Open alayamanas opened 3 years ago
Can you reproduce our result on the WMT14 En-De dataset on your hardware and environment? https://github.com/bytedance/lightseq/blob/master/examples/training/fairseq/ls_fairseq_wmt14en2de.sh
Thanks for your reply, I'll give it a try.
Machine translation, English to Chinese: I use the same data and almost the same parameters, with only the following exceptions: LightSeq uses --arch ls_transformer --optimizer ls_adam --criterion ls_label_smoothed_cross_entropy, while fairseq uses --arch transformer --optimizer adam --criterion label_smoothed_cross_entropy. But I found that the LightSeq model performs worse than the fairseq model: the LightSeq model sometimes produces the same words repeated over and over, while the fairseq model works fine.
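For reference, the two runs look roughly like this (a minimal sketch: the data-bin path is a placeholder, and the remaining hyperparameters are the WMT14 defaults from the linked example script, not my exact En-Zh values; lightseq-train is the wrapper entry point used in that script):

```bash
# LightSeq run: swap in the fused LightSeq modules (placeholder data path)
lightseq-train data-bin/my_en_zh \
    --arch ls_transformer --share-decoder-input-output-embed \
    --optimizer ls_adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion ls_label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --fp16

# Fairseq baseline: identical command except for the three stock modules
fairseq-train data-bin/my_en_zh \
    --arch transformer --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096 --fp16
```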
What are the possible reasons?