JuliaFirstOrder / ProximalAlgorithms.jl

Proximal algorithms for nonsmooth optimization in Julia
Other
130 stars 21 forks source link

Remove Zygote dependency by switching to AbstractDifferentiation #80

Closed gdalle closed 7 months ago

gdalle commented 2 years ago

Hi there! I was wondering if you'd be interested in a PR that abstracts away the Zygote dependency by adopting the formalism from AbstractDifferentiation?

lostella commented 2 years ago

Hey! I think so, see also the discussion in #71, where this was suggested already. I guess one question is: how would this look from the user perspective? Some kind of wrapper around functions that specializes the backend used? Like passing ZygoteFunction(f) in place of f to have its gradient computed by Zygote?

gdalle commented 2 years ago

That would be one way of doing it, the other being to ask the user to specify the backend as an argument to every algorithm. In any case, I think we can't escape a breaking change

lostella commented 2 years ago

Maybe, maybe not: I see that AbstractDifferentiation depends on ReverseDiff, maybe we could simply have that as default backend? Meaning that, when given a generic function f the gradient operation falls back to using ReverseDiff instead of Zygote.

One option is also to add AbstractDifferentiation as dependency, introduce the mechanism to choose whatever AD backend one wants, but let the default behavior be the current (so, temporarily keep Zygote as dependency).

gdalle commented 2 years ago

AbstractDifferentiation depends on many AD packages in a conditional way, thanks to Requires. That is also something we could envision for ProximalAlgorithms: a conditional dependency on Zygote (only if the user has it in the environment), and a mechanism for selecting other backends

lostella commented 2 years ago

Yes. Then the package would probably not work out of the box in case no AD backend is installed (like in a fresh environment): maybe there is a way to detect this and display a warning when the package is imported.

(In addition to that, the documentation should make this very clear of course)

A similar but unrelated issue is with ProximalOperators: this is now not a hard dependency, but one has an old version installed (< 0.15) then it won’t work with the latest ProximalAlgorithms.

mohamed82008 commented 2 years ago

If you want a default AD backend to be present, I suggest depending on both AbstractDifferentiation and the default AD backend of choice, be it Zygote, ForwardDiff, ReverseDiff, etc. AbstractDifferentiation unifies the interface and allows easy switching between AD backends but you still need to choose and load the default backend you like.

gdalle commented 2 years ago

My initial motivation for this issue was being able to use ProximalAlgorithms without the very heavy Zygote dependency. Would it be possible to have it as a conditional dependency?

mohamed82008 commented 2 years ago

You can have as a conditional dependency if you always tell the users to load some AD package. But that's for the ProximalAlgorithms package authors to decide. They might want to make using ProximalAlgorithms automatically load some AD package.

gdalle commented 2 years ago

Yes that was precisely my question to them :wink:

lostella commented 2 years ago

I think it’s fine to have no AD-backend by default (which would make the package more lightweight on dependencies as @gdalle proposes). In this case, the fallback definition for gradients could raise an error suggesting to install one from a list of supported backends, or to explicitly implement a method for the given type (in case this is not Function).

When some backend is available, there could be a “ranking” to decide which one to use by default, maybe, so that nothing special needs to be wrapped around functions, or specified to algorithms.

(My personal taste leans towards something like ZygoteFunction(f) rather than algorithm(…, ad_backend = :zygote) or similar)

Does this make sense?

mohamed82008 commented 2 years ago

Sounds good! On the AD ranking, I think some users might want to have control over the package used even when multiple are loaded. Not sure if the ZygoteFunction approach will be as easy to use as the algorithm approach, perhaps a PR can make this clear.

lostella commented 1 year ago

@gdalle @mohamed82008 see #85, feedback is appreciated!