bytedance / lightseq

LightSeq: A High Performance Library for Sequence Processing and Generation

Model trained by lightseq performs worse than model trained by fairseq #187

Open · alayamanas opened this issue 3 years ago

alayamanas commented 3 years ago

Machine translation, English to Chinese. I use the same data and almost the same params; the only differences are the LightSeq-specific flags (lightseq: --arch ls_transformer --optimizer ls_adam --criterion ls_label_smoothed_cross_entropy; fairseq: --arch transformer --optimizer adam --criterion label_smoothed_cross_entropy). But I found that the performance of the lightseq model was worse than that of the fairseq model: the lightseq model sometimes produces the same word repeated again and again, while the fairseq model works fine. A sketch of the two runs is shown below.
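For concreteness, here is a minimal sketch of the two training invocations being compared. Only the three differing flags come from my report above; DATA_DIR, the use of LightSeq's lightseq-train entry point, and all the shared flags (--lr, --max-tokens, and so on) are placeholder assumptions, not my exact settings.

```sh
#!/bin/sh
# Hypothetical sketch of the comparison. Only --arch/--optimizer/--criterion
# differ between the two runs; everything else here is a stand-in assumption.
DATA_DIR=/path/to/en-zh-data-bin

# LightSeq run (assuming LightSeq's fairseq-compatible lightseq-train entry point):
lightseq-train $DATA_DIR \
    --task translation \
    --arch ls_transformer \
    --optimizer ls_adam --adam-betas '(0.9, 0.98)' \
    --criterion ls_label_smoothed_cross_entropy --label-smoothing 0.1 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --max-tokens 8192

# fairseq baseline: identical apart from the three flags noted above.
fairseq-train $DATA_DIR \
    --task translation \
    --arch transformer \
    --optimizer adam --adam-betas '(0.9, 0.98)' \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --max-tokens 8192
```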

What are the possible reasons?

neopro12 commented 3 years ago

Can you reproduce our result on the WMT14 En-De dataset on your hardware and environment? https://github.com/bytedance/lightseq/blob/master/examples/training/fairseq/ls_fairseq_wmt14en2de.sh
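For anyone following along, a minimal sketch of running that reference script; the script path comes from the link above, while the clone location and invoking it with sh from the repo root are assumptions:

```sh
# Sketch: reproduce the reference WMT14 En-De training run.
# The clone directory is an assumption; the script path is from the URL above.
git clone https://github.com/bytedance/lightseq.git
cd lightseq
sh examples/training/fairseq/ls_fairseq_wmt14en2de.sh
```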

alayamanas commented 3 years ago

Thanks for your reply, I'll give it a try.