Closed · lorenzoh closed this issue 2 years ago
Yes it can! Optimisers.jl doesn't rely on Flux at all, just Functors.jl.
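To illustrate the point that Optimisers.jl only needs Functors.jl, here is a minimal sketch (the `Layer` struct is a made-up example type, not part of either package; depending on the Optimisers.jl version the Adam rule may be spelled `ADAM`):

```julia
using Optimisers, Functors

# Any Functors.jl-compatible struct works; no Flux involved.
struct Layer
    W
    b
end
Functors.@functor Layer

m = Layer(rand(3, 3), zeros(3))

# Explicit-state API: setup once, then update with a matching gradient tree.
state = Optimisers.setup(Optimisers.Adam(1e-3), m)
grads = (W = ones(3, 3), b = ones(3))  # stand-in gradients with the same structure
state, m = Optimisers.update(state, m, grads)
```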
Looking into what's necessary here; it seems like there is no great way to support both Optimisers.jl and Flux.Optimise in FluxTraining.jl. I'm fine with eventually dropping support for the latter, but wanted to ask what the path forward is: will Flux.Optimise be deprecated and completely replaced by Optimisers.jl? Also, is there any functionality one would miss out on by dropping Flux.Optimise support?
I think this is doable with some conditional logic, but certainly it will be a little messy. Happy to help with the implementation if you decide to support both.
> will Flux.Optimise be deprecated and completely replaced by Optimisers.jl?
That's the plan.
> Also, is there any functionality one would miss out on by dropping Flux.Optimise support?
Tied/shared weights aren't yet supported. Arguably they aren't supported in Flux either for any mildly complex optimiser, but going from "incorrect but still works" to "always fails an early validation check" is probably breaking.
What's the issue you are running into supporting both? We made all the Flux.Optimise ones subtype Flux.Optimise.AbstractOptimiser. The Optimisers.jl ones do not, and the purpose of this decision was to be able to distinguish between the two kinds. Does that help at all?
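Building on that distinction, the conditional logic could be sketched roughly as follows. `_update!` is a hypothetical helper for illustration, not an actual FluxTraining.jl function; the dispatch targets (`Flux.Optimise.AbstractOptimiser` and `Optimisers.AbstractRule`) are the real supertypes in the two packages:

```julia
using Flux, Optimisers

# Old-style implicit optimisers: mutate parameters in place, keep internal state.
function _update!(opt::Flux.Optimise.AbstractOptimiser, model, grads, state)
    Flux.Optimise.update!(opt, Flux.params(model), grads)
    return model, state
end

# Optimisers.jl rules: explicit state threaded through each step.
function _update!(rule::Optimisers.AbstractRule, model, grads, state)
    state = isnothing(state) ? Optimisers.setup(rule, model) : state
    state, model = Optimisers.update(state, model, grads)
    return model, state
end
```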
With Flux.jl 0.13 moving to use the explicit optimisers in Optimisers.jl, I think FluxTraining.jl should also use those as a default.
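For reference, the explicit style referred to here looks roughly like the following sketch (assuming Flux v0.13-era APIs; the data and loss are placeholders):

```julia
using Flux, Optimisers

model = Flux.Dense(2 => 1)
state = Optimisers.setup(Optimisers.Descent(0.1), model)

x = rand(Float32, 2, 8)
y = rand(Float32, 1, 8)

# Explicit gradient with respect to the model itself, not implicit params.
grads = Flux.gradient(m -> Flux.mse(m(x), y), model)
state, model = Optimisers.update(state, model, grads[1])
```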
This would also allow easier integration with alternative ADs like PyCallChainRules.jl; see https://github.com/rejuvyesh/PyCallChainRules.jl/issues/19.
@ToucheSir can this be done in a backward-compatible way, i.e. supporting Flux v0.12 and below, or does Optimisers.jl depend on Flux v0.13?