A masked language modeling objective that trains a model to predict any subset of the target words, conditioned on both the input text and a partially masked target translation.
Does the En-De experiment use the compound-split-bleu.sh? #14
I've trained a Transformer-base on WMT14 En-De following your settings, but it only reaches ~27.74 BLEU when scored with compound-split-bleu.sh. I am not sure this is the right metric, since the paper only says "evaluated with BLEU".
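For context, compound-split BLEU for WMT14 En-De is commonly computed by splitting hyphenated compounds in both the hypothesis and the reference before scoring, so each half counts as a separate token. A minimal Python sketch of that transformation (this mirrors the regex typically used in such scripts, but is an illustration, not the script itself):

```python
import re

def compound_split(line: str) -> str:
    # Insert a split marker inside hyphenated compounds,
    # e.g. "well-known" -> "well ##AT##-##AT## known",
    # so BLEU treats the two halves as separate tokens.
    return re.sub(r"(\S)-(\S)", r"\1 ##AT##-##AT## \2", line)

print(compound_split("a well-known model"))
# -> "a well ##AT##-##AT## known model"
```

Because this changes tokenization, compound-split BLEU is typically several tenths of a point higher than plain tokenized BLEU, which may account for part of the gap to the reported number.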