Open y-akbal opened 1 year ago

Motivation and description

Just wondering if there is a way to do mixed-precision training in Flux?

Possible Implementation

No response
With the new-style training, I think this should basically just work: `m16 = f16(m32)` makes a low-precision copy of the model, which you can use to compute the gradient `g16`, and then `update!(opt_state, m32, g16)` will apply that change to the original model.
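A minimal end-to-end sketch of that recipe (the model, loss function, and data below are placeholder assumptions, not from this thread; it uses the explicit-state Optimisers.jl API):

```julia
using Flux, Optimisers

m32 = Chain(Dense(4 => 8, relu), Dense(8 => 1))        # Float32 "master" weights
opt_state = Optimisers.setup(Optimisers.Adam(1f-3), m32)
x, y = randn(Float32, 4, 16), randn(Float32, 1, 16)    # dummy batch

for step in 1:100
    m16 = f16(m32)                                     # low-precision copy for this step
    x16, y16 = f16(x), f16(y)
    g16 = gradient(m -> Flux.Losses.mse(m(x16), y16), m16)[1]
    Optimisers.update!(opt_state, m32, g16)            # Float16 grads update Float32 weights
end
```

Note that serious Float16 training usually also needs loss scaling so small gradients don't underflow to zero; the sketch above omits that.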
One caveat: not all operations support Float16; I'm not sure about convolutions, for example. Maybe there are other unanticipated problems.
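If in doubt, a quick throwaway check (my suggestion, not something from the thread) is to run a single layer on Float16 data and see whether it errors, and what element type comes back:

```julia
using Flux

c16 = f16(Conv((3, 3), 3 => 8, relu))   # a Conv layer converted to Float16
x16 = randn(Float16, 32, 32, 3, 1)      # WHCN input
y = c16(x16)                            # an error here would reveal missing Float16 support
@show eltype(y) size(y)
```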
It would be super-nice to have an example of this, e.g. a model zoo page which uses it.
In https://github.com/FluxML/Optimisers.jl/pull/152 I introduce an optimiser that handles behind the scenes what @mcabbott described.
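Roughly, such a wrapper can slot into Optimisers.jl's rule interface like this (a hypothetical sketch; `CastGrads` is an invented name, and this is not necessarily what the PR actually does):

```julia
using Optimisers
import Optimisers: AbstractRule, init, apply!

# Hypothetical wrapper rule, NOT the API from the PR: casts incoming
# low-precision gradients to Float32 before delegating to an inner rule.
struct CastGrads{R<:AbstractRule} <: AbstractRule
    inner::R
end

init(o::CastGrads, x::AbstractArray) = init(o.inner, x)

function apply!(o::CastGrads, state, x, dx)
    # Called leaf-wise: x is one parameter array, dx its gradient.
    apply!(o.inner, state, x, Float32.(dx))
end
```

It would then be used like any other rule, e.g. `opt_state = Optimisers.setup(CastGrads(Optimisers.Adam(1f-3)), m32)`.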
Xref this example of trying this out:
https://discourse.julialang.org/t/mix-mode-training-of-large-languages-models-in-julia/102090
Great, thank you!