Closed. soloice closed this issue 6 years ago.
Update: another model, trained on 2 GTX 1080 Ti cards with batch_size=64, achieved a BLEU score of 23.09. I suspect the result would improve with longer training.
The benchmark was trained on the De->En direction, which yields a higher BLEU score, so I think your result is reasonable. This basic RNNsearch model cannot match the results achieved by WMT winners; their systems usually use additional techniques such as back-translation, model ensembling, and deeper models.
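For readers unfamiliar with the back-translation technique mentioned above: the idea is to translate monolingual target-language text back into the source language with a reverse (target->source) model, then add the resulting synthetic pairs to the training data. A toy sketch; the `reverse_translate` argument is a hypothetical stand-in for a trained reverse model, and the dummy model below just reverses word order:

```python
def back_translate(monolingual_tgt, reverse_translate):
    """Build synthetic (source, target) training pairs from monolingual
    target-language sentences, using a target->source model function."""
    return [(reverse_translate(t), t) for t in monolingual_tgt]

# Toy demo with a dummy "model" that merely reverses word order.
dummy_model = lambda s: " ".join(reversed(s.split()))
pairs = back_translate(["das ist gut"], dummy_model)
# pairs[0] == ("gut ist das", "das ist gut")
```

In a real setup the synthetic pairs are mixed with the genuine parallel data before training the forward model.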
I see. I have a lot of experience training NMT models with Theano, and have tried out quite a few more modern techniques such as sequence-level knowledge distillation. I also implemented an encoder-decoder model with an attention mechanism in pure C++ and deployed it on cellphones. I'm just looking for an NMT framework in TensorFlow (because Theano is not maintained any more, lol~) and found this one. So I was shocked by the BLEU score reported in readme.md.
The instructions in the readme.md are for training English->German models, but the benchmark result is for German->English, which is inconsistent. I suggest updating it to make it less confusing, e.g. by including my result on En->De translation.
For my experiments, the RNNSearch model trained for 75k steps on 2 GTX 1080 Ti cards with batch_size=64 achieved a BLEU score of 0.2309 and an NIST score of 6.7258. Training for another 75k steps (150k in total) raised these to a BLEU of 0.2348 and an NIST of 6.7717.
The instructions in readme.md describe how to train an English-to-German translation model and apply it to test data. But how did you evaluate the result?
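For context on what is being measured: corpus-level BLEU combines clipped n-gram precisions (usually up to 4-grams) with a brevity penalty. A minimal pure-Python sketch of the metric, assuming one reference per sentence; this is not the repo's actual evaluation script, which is unknown to me:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU with brevity penalty, single reference per sentence.

    hypotheses, references: parallel lists of token lists.
    Returns a score in [0, 1]; multiply by 100 for the usual reported scale."""
    matches = [0] * max_n   # clipped n-gram matches per order
    totals = [0] * max_n    # candidate n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngrams(hyp, n)
            ref_ngrams = ngrams(ref, n)
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)
```

Note the two scales in use: a score of 0.2309 on this 0-1 scale is the same result as the "23.09" commonly reported. Scores also depend heavily on tokenization and casing, so comparisons are only meaningful when the same scoring pipeline is used.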
This is what I did:
I used the default hyper-parameters to train the model (except for batch_size=80), and got a BLEU of only 22.47.
What could be wrong? In my experience with NMT, a BLEU score of 30 is rather high for an English-to-German system on newstest2017 data. For example, in this work the English->German system got a BLEU below 26, and the winner of WMT'17 only got a BLEU of 28.3; see http://matrix.statmt.org/.