facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

AdaFactor to save GPU memory? #281

Closed: AranKomat closed this issue 5 years ago

AranKomat commented 6 years ago

Tensor2Tensor includes the Adafactor optimizer, which drastically reduces the GPU memory used by optimizer state. I believe it would be helpful for fairseq to include it by default.
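Most of the savings come from how the second-moment estimate is stored. Adam keeps one running average of the squared gradient per parameter. For each weight matrix, Adafactor keeps only per-row and per-column running averages and rebuilds a rank-1 estimate from them. The pure-Python sketch below shows only that factored-statistics idea. It makes simplifying assumptions: a single 2-D parameter, and no update clipping, relative step size, or first moment. The helper names (`state_sizes`, `factored_second_moment`, `reconstruct`) are hypothetical, and this is not the code in fairseq's `adafactor.py`.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def state_sizes(n: int, m: int) -> Tuple[int, int]:
    """Second-moment entries stored for an n x m weight: (full, factored)."""
    return n * m, n + m


def factored_second_moment(
    grad: Matrix,
    row_acc: List[float],
    col_acc: List[float],
    beta2: float,
    eps: float = 1e-30,
) -> Tuple[List[float], List[float]]:
    """One exponential-moving-average update of row/column stats of grad**2."""
    n, m = len(grad), len(grad[0])
    # Squared gradient plus a tiny epsilon, as in the factored estimator.
    sq = [[g * g + eps for g in row] for row in grad]
    row_mean = [sum(row) / m for row in sq]                           # n values
    col_mean = [sum(sq[i][j] for i in range(n)) / n for j in range(m)]  # m values
    new_row = [beta2 * r + (1.0 - beta2) * x for r, x in zip(row_acc, row_mean)]
    new_col = [beta2 * c + (1.0 - beta2) * x for c, x in zip(col_acc, col_mean)]
    return new_row, new_col


def reconstruct(row_acc: List[float], col_acc: List[float]) -> Matrix:
    """Rank-1 estimate of the second moment: V_hat[i][j] = R[i] * C[j] / mean(R)."""
    r_mean = sum(row_acc) / len(row_acc)
    return [[r * c / r_mean for c in col_acc] for r in row_acc]


if __name__ == "__main__":
    # A 1024 x 4096 projection: full per-parameter state vs. row + column state.
    print(state_sizes(1024, 4096))  # (4194304, 5120)

    # With beta2 = 0 and a rank-1 squared gradient, the estimate is exact.
    grad = [[1.0, 2.0], [3.0, 6.0]]
    r, c = factored_second_moment(grad, [0.0, 0.0], [0.0, 0.0], beta2=0.0)
    print(reconstruct(r, c))
```

Storage for each matrix therefore drops from n·m numbers to n + m. In the 1024 × 4096 example that is a factor of about 800 for the second-moment state. The trade-off is that the estimate is exact only when the squared gradients really are rank-1, so the full optimizer adds further corrections, such as update clipping, on top of this.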

myleott commented 6 years ago

Good idea!

luciodery commented 5 years ago

Working on this

luciodery commented 5 years ago

See Adafactor here: https://github.com/pytorch/fairseq/blob/master/fairseq/optim/adafactor.py