SciML / ADTypes.jl

Repository for automatic differentiation backend types
https://sciml.github.io/ADTypes.jl/
MIT License

What is the right way to specify the Enzyme mode? #24

Closed gdalle closed 3 months ago

gdalle commented 4 months ago

Question❓

The source file says nothing about what the mode should be, and the test file recommends one thing but does another:

# In practice, you would rather specify a
# `mode::Enzyme.Mode`, e.g. `Enzyme.Reverse` or `Enzyme.Forward`
adtype = AutoEnzyme(; mode = Val(:Reverse))
@test adtype isa ADTypes.AbstractADType
@test adtype isa AutoEnzyme{Val{:Reverse}}

What convention is used in the SciML ecosystem? Can it be documented?
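For concreteness, here is what the two conventions look like side by side. This is just a sketch for illustration; it assumes EnzymeCore is available, since that lightweight package is where mode objects like `Enzyme.Reverse` actually live:

using ADTypes, EnzymeCore

# Convention currently exercised in the test file: a `Val` wrapping a symbol
adtype_val = AutoEnzyme(; mode = Val(:Reverse))

# Convention recommended by the comment in the test file: an Enzyme mode object
adtype_mode = AutoEnzyme(; mode = EnzymeCore.Reverse)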

Vaibhavdixit02 commented 4 months ago

I haven't been using the mode kwarg; I don't know if @avik-pal uses it. For me in Optimization.jl it is just reverse mode for the gradient and forward over reverse for the Hessian, so it's not well suited to the mode argument.
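To illustrate what "forward over reverse" means here, a minimal sketch of the pattern, shown with ForwardDiff over Zygote rather than the exact backends Optimization.jl uses (`f`, `g`, and `H` are just example names):

using ForwardDiff, Zygote

f(x) = sum(abs2, x) / 2
x = rand(3)

g(x) = Zygote.gradient(f, x)[1]   # reverse mode for the gradient
H = ForwardDiff.jacobian(g, x)    # forward mode over the reverse-mode gradient
# H is the 3×3 identity matrix for this f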

avik-pal commented 4 months ago

Me neither; I kind of always assume it's reverse mode (which I probably shouldn't).

gdalle commented 4 months ago

Thanks! Since you're maintaining packages that make use of ADTypes.jl, I'd be glad to have your opinion on https://github.com/gdalle/DifferentiationInterface.jl. The gist is "AbstractDifferentiation.jl + ADTypes.jl"

gdalle commented 4 months ago

Another question is what to put inside the AutoEnzyme object, beyond the "mode". Do we need more information to make informed decisions?

wsmoses commented 4 months ago

I feel that here it would actually be better for ADTypes to just re-export Enzyme.Reverse etc. when packages have these (they are defined in EnzymeCore, which has no dependencies and just defines the Enzyme interfaces).

I think the fact that this is optional is a rather important misnomer. You wouldn't expect AutoForwardDiff to call reverse mode.

Also, just like forward mode has chunking, Enzyme does as well, so there's an ambiguous question of what chunk size you're using.
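To make the re-export idea concrete, a small sketch of what it would rely on, assuming only that EnzymeCore defines the mode singletons (as noted above, it has no dependencies):

using EnzymeCore

EnzymeCore.Reverse isa EnzymeCore.ReverseMode  # true
EnzymeCore.Forward isa EnzymeCore.ForwardMode  # true

# A hypothetical re-export from ADTypes would then let users write
# AutoEnzyme(; mode = Reverse) without loading Enzyme itself.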

Vaibhavdixit02 commented 4 months ago

Since you're maintaining packages that make use of ADTypes.jl, I'd be glad to have your opinion on https://github.com/gdalle/DifferentiationInterface.jl.

It looks great! Some other features already being discussed in https://github.com/gdalle/DifferentiationInterface.jl/issues/26, such as caching, as well as support for sparsity, will also be pretty essential for me to consider using it in Optimization.jl.

avik-pal commented 4 months ago

I think the fact that this is optional is a rather important misnomer

I feel `nothing` is not a bad default. Whenever we keep a field as `nothing` (at least in NonlinearSolve and similar repos), the assumption is: use the best possible heuristic. Say you have a nested situation, then we can do forward over reverse; for a small system we do forward mode, and so on...
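As a rough sketch of that heuristic idea (a hypothetical helper, not an actual NonlinearSolve or ADTypes API; the size cutoff is made up):

using EnzymeCore

# `nothing` means "let the package pick"; an explicit mode always wins.
pick_enzyme_mode(mode, n) = mode
pick_enzyme_mode(::Nothing, n) = n <= 16 ? EnzymeCore.Forward : EnzymeCore.Reverse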

Having said that, the current form does make it hard to subtype it as forward / reverse mode. I think we should split it up into `AutoForwardEnzyme` / `AutoReverseEnzyme`. My main issue with re-exporting EnzymeCore is not being able to subtype them as `Abstract(Reverse/Forward)Mode`.
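A minimal sketch of what that split could look like (hypothetical types, assuming the abstract mode types ADTypes currently provides):

using ADTypes: AbstractForwardMode, AbstractReverseMode

struct AutoForwardEnzyme <: AbstractForwardMode end
struct AutoReverseEnzyme <: AbstractReverseMode end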

gdalle commented 4 months ago

@avik-pal There's also another issue with using `Enzyme.Reverse` as the mode, which is that depending on the function (like `value_and_gradient`) you may want `Enzyme.ReverseWithPrimal` instead.
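A minimal sketch of the difference, assuming Enzyme's documented autodiff behavior (function and variable names are just for the example):

using Enzyme

f(x) = sum(abs2, x) / 2
x = rand(3)
dx = zero(x)

# With plain `Reverse`, the call only accumulates the gradient into dx.
# `ReverseWithPrimal` additionally returns the primal f(x), which is what a
# `value_and_gradient`-style function needs without a second evaluation.
derivs, primal = Enzyme.autodiff(ReverseWithPrimal, f, Active, Duplicated(x, dx))
# primal == f(x); dx now holds the gradient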

gdalle commented 4 months ago

@Vaibhavdixit02 indeed! Cache support is fully there for ForwardDiff and ReverseDiff (with and without compiled tape) already, with a unified interface. FiniteDiff and SparseDiffTools are next on my hit list
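For context, "compiled tape" refers to ReverseDiff's own tape caching, roughly like this (a sketch of the underlying backend feature, not of DifferentiationInterface's API):

using ReverseDiff

f(x) = sum(abs2, x) / 2
x = rand(3)

# Record and compile the tape once (this is the expensive, cacheable part)...
tape = ReverseDiff.compile(ReverseDiff.GradientTape(f, x))

# ...then reuse it for many gradient evaluations without re-recording.
grad = similar(x)
ReverseDiff.gradient!(grad, tape, x)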

avik-pal commented 4 months ago

There's also another issue with using `Enzyme.Reverse` as the mode, which is that depending on the function (like `value_and_gradient`) you may want `Enzyme.ReverseWithPrimal` instead.

Agreed