EnzymeAD / Enzyme.jl

Julia bindings for the Enzyme automatic differentiator
https://enzyme.mit.edu
MIT License

Tuple returns: forward vs reverse #1847

Closed · gdalle closed this 5 days ago

gdalle commented 5 days ago

Following our Slack conversation with @ExpandingMan, I just wanted to ask whether the return tuples will be modified or not. Let's take a function y = f(x1, x2, x3) and run autodiff on it.

Current behavior

|  | Forward | Reverse |
| --- | --- | --- |
| With primal | `(y, dy)` | `((dx1, dx2, dx3), y)` |
| Without primal | `(dy,)` | `((dx1, dx2, dx3),)` |
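
For concreteness, a minimal sketch of how those shapes show up in calls, using the autodiff syntax from before the change discussed below (the function f and the inputs are made up here, and the orderings in the comments are the ones from the table above, not something I have re-verified):

```julia
using Enzyme

f(x1, x2, x3) = x1 * x2 + x3^2  # hypothetical example function

# Reverse mode
autodiff(ReverseWithPrimal, f, Active(1.0), Active(2.0), Active(3.0))  # ((dx1, dx2, dx3), y)
autodiff(Reverse, f, Active(1.0), Active(2.0), Active(3.0))            # ((dx1, dx2, dx3),)

# Forward mode, seeding a tangent of 1.0 on x1 (0.12-era syntax with an explicit
# return activity to request the primal)
autodiff(Forward, f, Duplicated, Duplicated(1.0, 1.0), Const(2.0), Const(3.0))  # (y, dy)
autodiff(Forward, f, Duplicated(1.0, 1.0), Const(2.0), Const(3.0))              # (dy,)
```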

Suggestions

|  | Forward | Reverse |
| --- | --- | --- |
| With primal | `(y, dy)` | `(y, (dx1, dx2, dx3))` |
| Without primal | `dy` | `(dx1, dx2, dx3)` |
ExpandingMan commented 5 days ago

If these are going to be the same function, I would weakly prefer keeping the ordering the same (i.e. ((dx1, dx2), y)) and also keeping the wrapping without the primal, so that e.g. gradient(stuff...)[1] always gives the same result (a tuple of gradients of the arguments).

However, I'm still skeptical of the premise that these should all be the same function; it feels like it's trying a bit too hard to be generic without really achieving it. I favor what DI does in providing separate functions (value_and_gradient, or whatever it would be called) and, in that case, not wrapping the output in the bare gradient case.

Roughly speaking, I usually favor letting type arguments change the type of the output if, e.g., that output is a single struct, especially if that struct is of the type given as an argument, but separate functions if the tree structure of the output differs. An example would be giving a StaticArray or an Array as an argument: the outputs would have the same interface, so selecting between them via an argument seems fine, but incompatible user-facing nested structures are not. In my mind it therefore makes more sense for gradient and jacobian to be the same function than for gradient and value_and_gradient (I've called it valgradient) to be the same function. I acknowledge that this argument is ultimately a matter of personal preference, but I think it's roughly what most Julia packages do in most cases.

wsmoses commented 5 days ago

This behavior has already changed in main with https://github.com/EnzymeAD/Enzyme.jl/pull/1832

In particular, autodiff now has the following semantics:

|  | Forward | Reverse |
| --- | --- | --- |
| With primal | `(dy, y)` | `((dx1, dx2, dx3), y)` |
| Without primal | `(dy,)` | `((dx1, dx2, dx3),)` |
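
A hedged sketch of what that looks like in code, assuming the post-#1832 mode names ForwardWithPrimal/ReverseWithPrimal and a made-up three-argument f (the return shapes in the comments are taken from the table above):

```julia
using Enzyme

f(x1, x2, x3) = x1 * x2 + x3^2  # hypothetical example function

# Forward mode now puts the derivative first when the primal is requested,
# seeding a tangent of 1.0 on x1:
autodiff(ForwardWithPrimal, f, Duplicated(1.0, 1.0), Const(2.0), Const(3.0))  # (dy, y)
autodiff(Forward, f, Duplicated(1.0, 1.0), Const(2.0), Const(3.0))            # (dy,)

# Reverse mode is unchanged relative to the earlier table:
autodiff(ReverseWithPrimal, f, Active(1.0), Active(2.0), Active(3.0))  # ((dx1, dx2, dx3), y)
autodiff(Reverse, f, Active(1.0), Active(2.0), Active(3.0))            # ((dx1, dx2, dx3),)
```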
wsmoses commented 5 days ago

https://github.com/EnzymeAD/Enzyme.jl/pull/1844 changes gradient/jacobian, whose behavior is now:

|  | Forward | Reverse |
| --- | --- | --- |
| With primal | `((darg1, darg2, darg3), y)` | `((darg1, darg2, darg3), y)` |
| Without primal | `(darg1, darg2, darg3)` | `(darg1, darg2, darg3)` |
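
For illustration, a minimal sketch of the reverse-mode calls that table corresponds to, assuming the multi-argument gradient form that #1844 targets; the function and inputs are made up, forward mode follows the same nesting per the table, and the exact container returned by the WithPrimal variant may differ in later releases:

```julia
using Enzyme

f(x1, x2, x3) = sum(x1) * sum(x2) + sum(abs2, x3)  # hypothetical example function

Enzyme.gradient(Reverse, f, [1.0], [2.0], [3.0])
# -> (darg1, darg2, darg3), one gradient per argument, per the table above
Enzyme.gradient(ReverseWithPrimal, f, [1.0], [2.0], [3.0])
# -> ((darg1, darg2, darg3), y), per the table above
```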