Closed: xiaoda99 closed this issue 5 years ago
The original paper used (0.9, 0.98): https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
We never tuned this hyperparameter, so I'm not sure how much of a difference it makes compared to the PyTorch default of 0.999.
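For reference, a minimal PyTorch sketch of how these betas are passed to the optimizer (the model, learning rate, and eps here are placeholders, not fairseq's actual training setup):

```python
import torch

# Placeholder model; in practice this would be the Transformer being trained.
model = torch.nn.Linear(512, 512)

# betas=(0.9, 0.98) matches the Vaswani et al. 2017 setting used in the
# transformer-big examples; PyTorch's own default is betas=(0.9, 0.999).
optimizer = torch.optim.Adam(
    model.parameters(), lr=5e-4, betas=(0.9, 0.98), eps=1e-9
)
```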
Hi guys,
In the en-de and en-fr transformer-big examples, adam_betas is set to (0.9, 0.98), which is the setting used in the Vaswani et al. 2017 paper. However, in the latest tensor2tensor repo, the Adam betas are set to (0.9, 0.997), which is closer to Adam's default settings of (0.9, 0.999). https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py#L1489
Have you experimented with how different adam_betas settings affect the final result, especially with large-batch training (e.g. update_freq=16)?
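As a rough back-of-envelope intuition (my own reasoning, not anything from the fairseq or tensor2tensor docs): beta2 sets the averaging horizon of Adam's second-moment estimate, roughly 1 / (1 - beta2) optimizer steps, so with large effective batches (update_freq=16 means far fewer optimizer steps overall) a smaller beta2 like 0.98 forgets old gradient statistics much faster than 0.997 or 0.999:

```python
# Approximate averaging window of Adam's second-moment EMA: ~1 / (1 - beta2).
for beta2 in (0.98, 0.997, 0.999):
    window = 1.0 / (1.0 - beta2)
    print(f"beta2={beta2}: ~{window:.0f} optimizer steps")
# beta2=0.98: ~50 optimizer steps
# beta2=0.997: ~333 optimizer steps
# beta2=0.999: ~1000 optimizer steps
```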