facebookresearch / dadaptation

D-Adaptation for SGD, Adam and AdaGrad
MIT License

Float 16? #6

Closed by TKassis 1 year ago

TKassis commented 1 year ago

Just to confirm, these optimizers don't support 16-bit precision training yet, correct?

adefazio commented 1 year ago

They don't have native support. I've used them within fairseq (which provides float16 support by wrapping the optimizer) in some of my experiments.
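A minimal sketch of the kind of setup this points to, using PyTorch's own mixed-precision utilities (torch.cuda.amp) rather than fairseq's wrapper; this pairing is an assumption, not something confirmed in the thread. The forward/backward pass runs in float16 while the D-Adaptation optimizer state stays in float32.

```python
# Sketch (assumed setup, not from the thread): AMP mixed precision around
# dadaptation.DAdaptAdam. The optimizer itself still steps in float32.
import torch
from dadaptation import DAdaptAdam

model = torch.nn.Linear(128, 10).cuda()
criterion = torch.nn.CrossEntropyLoss()
optimizer = DAdaptAdam(model.parameters(), lr=1.0)  # lr=1.0 is the usual D-Adaptation default
scaler = torch.cuda.amp.GradScaler()

for step in range(100):
    inputs = torch.randn(32, 128, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")

    optimizer.zero_grad()
    with torch.cuda.amp.autocast():   # forward pass in float16
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()     # scale the loss to avoid gradient underflow
    scaler.step(optimizer)            # unscales gradients, then steps in float32
    scaler.update()
```

This mirrors the comment's point: float16 support comes from wrapping the training loop or optimizer externally, not from the optimizers themselves.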