Kyubyong / transformer

A TensorFlow Implementation of the Transformer: Attention Is All You Need
Apache License 2.0

Learning rate in the paper is 0.1, but the code uses 0.3? #112

Closed: xiongma closed this issue 5 years ago

xiongma commented 5 years ago

The paper suggests a learning rate of 0.1, but your code uses 0.3. Is there a difference between your implementation and the paper?

xiongma commented 5 years ago

@Kyubyong