facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License
30.57k stars 6.41k forks

Long-standing bug in Adafactor optimizer if beta1 > 0 #5561

Open dxqbYD opened 3 weeks ago

dxqbYD commented 3 weeks ago

There seems to be an issue with the Adafactor optimizer (linked here) when beta1 > 0: https://github.com/facebookresearch/fairseq/blob/ecbf110e1eb43861214b05fa001eff584954f65a/fairseq/optim/adafactor.py#L66

Please find a detailed description here: https://github.com/huggingface/transformers/issues/34506