Learning rate in the paper is 0.1, but the code uses 0.3?

Kyubyong / transformer

A TensorFlow Implementation of the Transformer: Attention Is All You Need

Apache License 2.0

Closed: xiongma closed this issue 5 years ago

xiongma commented 5 years ago

The paper suggests a learning rate of 0.1, but your code uses 0.3. Is there a difference between your implementation and the paper?

xiongma commented 5 years ago

@Kyubyong