Open QiyaoHuang opened 2 years ago
When I train on the WMT14 en-de dataset, I get a BLEU score of 24.5, which matches the score in the paper. But when I train the model the same way on WMT17 zh-en, the BLEU score is only 7.0.
The WMT17 zh-en dataset: http://data.statmt.org/wmt17/translation-task/training-parallel-nc-v12.tgz", ["training/news-commentary-v12.zh-en.en", "training/news-commentary-v12.zh-en.zh"]]] Why does this happen, and what can I do?
How do you tokenize the Chinese corpus?
I used the tokenization method provided in this project's template example, the same as I did for WMT14 en-de.
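For context on why the tokenization question may matter: written Chinese has no spaces between words, so a tokenizer that works for en-de by splitting on whitespace can leave a whole Chinese sentence as a single token. The sketch below only illustrates that effect. It is not this project's tokenizer, and the function names `whitespace_tokenize` and `char_tokenize_cjk` and the sample sentences are hypothetical. Whether this is the actual cause of the low BLEU score here is an assumption that would need checking against the preprocessed data.

```python
import re

# Hypothetical helpers for illustration only; not part of this project.

def whitespace_tokenize(text):
    """Split on whitespace, as is typical for space-delimited languages."""
    return text.split()

def char_tokenize_cjk(text):
    """Put spaces around each CJK ideograph, then split on whitespace.

    This is a crude character-level fallback, not a real word segmenter.
    """
    spaced = re.sub(r"([\u4e00-\u9fff])", r" \1 ", text)
    return spaced.split()

zh = "我喜欢机器翻译"              # sample sentence (assumption)
en = "I like machine translation"  # sample sentence (assumption)

# Number of tokens each approach produces for each sentence.
print(len(whitespace_tokenize(en)), len(whitespace_tokenize(zh)))
print(len(char_tokenize_cjk(en)), len(char_tokenize_cjk(zh)))
```

If the preprocessed `.zh` file shows whole sentences as single tokens, segmenting the Chinese side (at character level as above, or with a dedicated word segmenter) before building the vocabulary would be one thing to try.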