Open y-akbal opened 1 year ago

Motivation and description

Just wondering if there is a way to do mixed-precision training in Flux?

Possible Implementation

No response
With the new-style training, I think this should basically just work: `m16 = f16(m32)` makes a low-precision copy of the model, which you can use to compute the gradient `g16`, and then `update!(opt_state, m32, g16)` will apply that change to the original model.
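A minimal end-to-end sketch of that recipe (the model, loss function, and data below are placeholder assumptions, not from this thread; it uses the explicit-state Optimisers.jl API):

```julia
using Flux, Optimisers

m32 = Chain(Dense(4 => 8, relu), Dense(8 => 1))        # Float32 "master" weights
opt_state = Optimisers.setup(Optimisers.Adam(1f-3), m32)
x, y = randn(Float32, 4, 16), randn(Float32, 1, 16)    # dummy batch

for step in 1:100
    m16 = f16(m32)                                     # low-precision copy for this step
    x16, y16 = f16(x), f16(y)
    g16 = gradient(m -> Flux.Losses.mse(m(x16), y16), m16)[1]
    Optimisers.update!(opt_state, m32, g16)            # Float16 grads update Float32 weights
end
```

Note that serious Float16 training usually also needs loss scaling so small gradients don't underflow to zero; the sketch above omits that.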
One caveat: not all operations support Float16; I'm not sure about convolutions, for example. Maybe there are other unanticipated problems.
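If in doubt, a quick throwaway check (my suggestion, not something from the thread) is to run a single layer on Float16 data and see whether it errors, and what element type comes back:

```julia
using Flux

c16 = f16(Conv((3, 3), 3 => 8, relu))   # a Conv layer converted to Float16
x16 = randn(Float16, 32, 32, 3, 1)      # WHCN input
y = c16(x16)                            # an error here would reveal missing Float16 support
@show eltype(y) size(y)
```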
It would be super-nice to have an example of this, e.g. a model zoo page which uses it.
In https://github.com/FluxML/Optimisers.jl/pull/152 I introduce an optimiser that handles behind the scenes what @mcabbott described.
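Roughly, such a wrapper can slot into Optimisers.jl's rule interface like this (a hypothetical sketch; `CastGrads` is an invented name, and this is not necessarily what the PR actually does):

```julia
using Optimisers
import Optimisers: AbstractRule, init, apply!

# Hypothetical wrapper rule, NOT the API from the PR: casts incoming
# low-precision gradients to Float32 before delegating to an inner rule.
struct CastGrads{R<:AbstractRule} <: AbstractRule
    inner::R
end

init(o::CastGrads, x::AbstractArray) = init(o.inner, x)

function apply!(o::CastGrads, state, x, dx)
    # Called leaf-wise: x is one parameter array, dx its gradient.
    apply!(o.inner, state, x, Float32.(dx))
end
```

It would then be used like any other rule, e.g. `opt_state = Optimisers.setup(CastGrads(Optimisers.Adam(1f-3)), m32)`.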
Xref this example of trying this out:
https://discourse.julialang.org/t/mix-mode-training-of-large-languages-models-in-julia/102090
Great, thank you!