facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

English to Chinese translation training tutorials #274

Closed lucasjinreal closed 5 years ago

lucasjinreal commented 6 years ago

Hi, are there any resources or tutorials about translating Chinese to English?

edunov commented 6 years ago

Hi @jinfagang, I'm not aware of any tutorials, but in my experience it is not very different from other language pairs. Depending on dataset size, transformer_vaswani_wmt_en_de_big or transformer_iwslt_de_en work very well. The only big difference is preprocessing: I used the jieba tokenizer (https://github.com/fxsjy/jieba) to tokenize the data, then applied BPE encoding (learning separate vocabularies for En and Zh), and then ran the standard preprocess.py without --joined-dictionary.

Does it make sense?
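
For anyone who wants to try the jieba step described above, here is a minimal sketch of segmenting Chinese text into space-separated tokens before BPE. It is an illustration, not the exact script used in this thread: `segment_line` and `segment_file` are hypothetical helper names, and the character-level fallback exists only so the sketch runs when jieba is not installed (it is not equivalent to jieba's word segmentation).

```python
# Sketch: segment Chinese text into space-separated tokens before BPE.
try:
    import jieba  # third-party: pip install jieba

    def _cut(text):
        # jieba.cut yields tokens in its default (accurate) mode.
        return jieba.cut(text)
except ImportError:
    # Assumption for illustration only: fall back to one token per
    # character so the sketch still runs without jieba installed.
    def _cut(text):
        return iter(text)


def segment_line(line):
    """Return the line with its tokens separated by single spaces."""
    tokens = [tok for tok in _cut(line.strip()) if tok.strip()]
    return " ".join(tokens)


def segment_file(src_path, dst_path):
    """Segment a UTF-8 text file line by line (hypothetical helper)."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(segment_line(line) + "\n")


if __name__ == "__main__":
    print(segment_line("我爱北京天安门"))
```

The segmented Zh file and the tokenized En file would then each go through BPE with their own vocabulary, followed by preprocess.py without --joined-dictionary, as described in the comment above.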

lucasjinreal commented 6 years ago

@edunov Thanks, the big question is about hyperparameters. I'll take a dive into that.

benbijituo commented 5 years ago

> @edunov Thanks, the big question is about hyper parameters. I'll take a dive into that

I'm not sure whether my answer will be useful for anyone coming later, but for the hyperparameters of an en-zh translation model, you can refer to the setup in this paper: http://arxiv.org/abs/1803.05567.