Open alayamanas opened 3 years ago
Can you reproduce our result on the WMT14 En-De dataset on your hardware and environment? https://github.com/bytedance/lightseq/blob/master/examples/training/fairseq/ls_fairseq_wmt14en2de.sh
Thanks for your reply, I'll give it a try.
Machine translation, English to Chinese: I use the same data and almost the same parameters, with only the following exceptions: LightSeq uses --arch ls_transformer --optimizer ls_adam --criterion ls_label_smoothed_cross_entropy, while fairseq uses --arch transformer --optimizer adam --criterion label_smoothed_cross_entropy. But I found that the LightSeq model performs worse than the fairseq model: the LightSeq model sometimes produces the same words repeated over and over, while the fairseq model works fine.
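For reference, the two runs look roughly like this (a minimal sketch: the data-bin path is a placeholder, and the remaining hyperparameters are the WMT14 defaults from the linked example script, not my exact En-Zh values; lightseq-train is the wrapper entry point used in that script):

```bash
# LightSeq run: swap in the fused LightSeq modules (placeholder data path)
lightseq-train data-bin/my_en_zh \
    --arch ls_transformer --share-decoder-input-output-embed \
    --optimizer ls_adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion ls_label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --fp16

# Fairseq baseline: identical command except for the three stock modules
fairseq-train data-bin/my_en_zh \
    --arch transformer --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --fp16
```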
What are the possible reasons?