FluxML / FluxTraining.jl

A flexible neural net training library inspired by fast.ai

https://fluxml.ai/FluxTraining.jl

MIT License

Use Optimisers.jl #112

Closed: lorenzoh closed this issue 2 years ago

lorenzoh commented 2 years ago

With Flux.jl 0.13 moving to use the explicit optimisers in Optimisers.jl, I think FluxTraining.jl should also use those as a default.

This would also allow easier integration with alternative ADs, like PyCallChainRules.jl; see https://github.com/rejuvyesh/PyCallChainRules.jl/issues/19.

@ToucheSir can this be done in a backward-compatible way, i.e. while still supporting Flux v0.12 and below, or does Optimisers.jl depend on Flux v0.13?

ToucheSir commented 2 years ago

Yes it can! Optimisers.jl doesn't rely on Flux at all, just Functors.jl.

lorenzoh commented 2 years ago

Looking into what's necessary here, it seems like there is no great way to support both Optimisers.jl and Flux.Optimise in FluxTraining.jl. I'm fine with eventually dropping support for Flux.Optimise, but wanted to ask what the path forward is: will Flux.Optimise be deprecated and completely replaced by Optimisers.jl? Also, is there any functionality one would miss out on by dropping Flux.Optimise support?

ToucheSir commented 2 years ago

I think this is doable with some conditional logic, but certainly it will be a little messy. Happy to help with the implementation if you decide to support both.

> will Flux.Optimise be deprecated and completely replaced by Optimisers.jl?

That's the plan.

> Also, is there any functionality one would miss out on by dropping Flux.Optimise support?

Tied/shared weights aren't yet supported. Arguably they aren't supported in Flux either for any mildly complex optimizer, but going from "incorrect but still works" to "always fails an early validation check" is probably breaking.

darsnack commented 2 years ago

What's the issue you are running into when supporting both? We made all the Flux.Optimise ones subtype Flux.Optimise.AbstractOptimiser. The Optimisers.jl ones do not, and the purpose of this decision was to make it possible to distinguish between the two kinds. Does that help at all?
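
To make the subtype-based distinction concrete, here is a minimal sketch of how dispatch could route the two kinds of optimiser to different code paths. It deliberately uses local stand-in types so that it runs without Flux or Optimisers.jl installed: `LegacyOptimiser` plays the role of `Flux.Optimise.AbstractOptimiser`, and `LegacyDescent`, `ExplicitDescent` and `optimiser_kind` are hypothetical names invented for illustration, not part of either package or of FluxTraining.jl.

```julia
# Stand-in for Flux.Optimise.AbstractOptimiser (hypothetical local type,
# so the sketch runs without Flux installed).
abstract type LegacyOptimiser end

# Hypothetical old-style optimiser: it subtypes the abstract type.
struct LegacyDescent <: LegacyOptimiser
    eta::Float64
end

# Hypothetical Optimisers.jl-style rule: deliberately NOT a subtype.
struct ExplicitDescent
    eta::Float64
end

# Dispatch selects the code path from the optimiser's type alone.
# The more specific method wins for anything subtyping LegacyOptimiser;
# everything else falls through to the generic method.
optimiser_kind(::LegacyOptimiser) = :implicit  # Flux.Optimise-style path
optimiser_kind(::Any) = :explicit              # Optimisers.jl-style path

kinds = (optimiser_kind(LegacyDescent(0.1)), optimiser_kind(ExplicitDescent(0.1)))
```

In a real integration, the two methods would presumably wrap the respective update calls of each package rather than return a symbol; the point is only that no runtime flag is needed to tell the two kinds apart.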