Since BERT is based on the Transformer architecture, is there any reason to use BERT embeddings for an NMT model that is already a Transformer?
My take is that, because BERT embeddings are trained on a very large corpus, they may bring better information than embeddings trained jointly with my NMT model on my small parallel corpus.
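To make the idea concrete, here is a minimal sketch of what I mean by "using BERT embeddings": a frozen multilingual BERT produces contextual source-side embeddings, which would then be projected to the NMT model's hidden size and fed to its encoder. The projection layer and the overall wiring are hypothetical, just to illustrate the question; only the Hugging Face calls are standard.

```python
import torch
from transformers import BertModel, BertTokenizer

# Frozen BERT used as a source-side embedding layer for the NMT Transformer.
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = BertModel.from_pretrained("bert-base-multilingual-cased")
bert.eval()  # keep BERT frozen; only the NMT model trains on the parallel corpus

src_sentence = "Das ist ein Beispielsatz."
inputs = tokenizer(src_sentence, return_tensors="pt")

with torch.no_grad():
    # (1, seq_len, 768) contextual embeddings, instead of embeddings
    # learned from scratch on the small parallel corpus
    src_embeddings = bert(**inputs).last_hidden_state

# Hypothetical next step: project to the NMT model's hidden size d_model
# and pass the result to its encoder.
d_model = 512
proj = torch.nn.Linear(bert.config.hidden_size, d_model)
nmt_encoder_input = proj(src_embeddings)
```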