Kyubyong / transformer

A TensorFlow Implementation of the Transformer: Attention Is All You Need
Apache License 2.0

What Makes The Result Rise From 17.7 to 22.4 In Comparison with The Previous Version? #98

Open yaoyiran opened 5 years ago

yaoyiran commented 5 years ago

Do you know which key factors made the BLEU score rise from 17.7 (the previous version, in TF 1.2) to 22.5 (the current version on the master branch)? If I want to develop my model based on the previous version, how should I modify the code?

It seems that the author has clarified that the two major differences are (1) fixing known bugs (masking, positional encoding, ...) and (2) adding some missing components (BPE, shared weight matrix, ...). But I have checked, and the masking part seems to be the same as it was in the previous version. I believe that adding BPE is very helpful. Could anyone clarify what the other key factors behind the improvement are, if you have run experiments on this?
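
For concreteness, here is roughly what I understand by a key-padding-mask fix: masking on token ids rather than inferring padding from the embeddings themselves (if I recall correctly, the older code derived the mask from zero embedding sums, which breaks once embeddings are no longer exactly zero at pad positions). This is only a minimal sketch in TF 2 style; `PAD_ID`, `key_padding_bias`, and the shapes are my own illustrative names, not taken from this repo:

```python
import tensorflow as tf

PAD_ID = 0  # assumption for illustration: <pad> is token id 0

def key_padding_bias(token_ids):
    """Additive attention bias that hides <pad> keys.

    Returns 0.0 for real tokens and a large negative value for pad
    positions; added to the attention logits before the softmax so
    padded keys receive ~zero attention weight.
    """
    is_pad = tf.cast(tf.equal(token_ids, PAD_ID), tf.float32)
    # Shape [batch, 1, 1, key_len]; broadcasts over heads and queries.
    return is_pad[:, tf.newaxis, tf.newaxis, :] * -1e9

# Usage: attention_logits += key_padding_bias(src_token_ids)
```

And for the shared weight matrix, my understanding is the standard weight tying from the paper: one matrix serves as both the target embedding table and the pre-softmax output projection. Again a hedged sketch with made-up sizes, not the repo's actual code:

```python
vocab_size, d_model = 32000, 512  # hypothetical sizes

# Single matrix reused for embedding lookup and output projection.
shared = tf.Variable(
    tf.random.normal([vocab_size, d_model], stddev=d_model ** -0.5))

def embed(ids):
    # Scaled by sqrt(d_model), as in the paper, so embeddings and
    # positional encodings have comparable magnitudes.
    return tf.nn.embedding_lookup(shared, ids) * d_model ** 0.5

def output_logits(decoder_states):  # [batch, t, d_model]
    # Reuse the same matrix (transposed) as the output projection.
    return tf.einsum("btd,vd->btv", decoder_states, shared)
```

If the masking really is unchanged between the two versions, then BPE, weight tying, and the positional-encoding fix would seem to be the main candidates, but I would appreciate confirmation from anyone who has ablated them.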