Closed: AranKomat closed this issue 5 years ago
Tensor2Tensor includes the Adafactor optimizer, which drastically reduces GPU memory usage. I believe it would be helpful for fairseq to offer it by default.
Good idea!
Working on this
See Adafactor here: https://github.com/pytorch/fairseq/blob/master/fairseq/optim/adafactor.py
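For anyone wondering where the memory saving comes from, here is a rough sketch of the bookkeeping. It assumes the second-moment statistics are factored over the last two dimensions of each weight tensor, into per-row and per-column accumulators, and that tensors with fewer than two dimensions are not factored. The helper names are hypothetical and are not part of fairseq or Tensor2Tensor. The sketch only counts the optimizer state for the second moment and ignores everything else.

```python
from math import prod


def adam_second_moment_size(shape):
    """Number of second-moment values an Adam-style optimizer keeps:
    one per parameter."""
    return prod(shape)


def adafactor_second_moment_size(shape):
    """Number of second-moment values under the factored scheme assumed here:
    one row accumulator and one column accumulator per matrix, factoring
    over the last two dimensions. Tensors with fewer than two dimensions
    fall back to one value per parameter."""
    if len(shape) < 2:
        return prod(shape)
    *leading, rows, cols = shape
    return prod(leading) * (rows + cols)


# Hypothetical feed-forward weight matrix, chosen only for illustration.
shape = (1024, 4096)
print(adam_second_moment_size(shape))       # 4194304
print(adafactor_second_moment_size(shape))  # 5120
```

Under these assumptions the second-moment state for a 1024 x 4096 matrix shrinks from about 4.2 million values to about 5 thousand. The actual saving in a training run depends on the options used, for example whether a first-moment estimate is also kept.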