Closed by gdalle 5 days ago
If these are going to be the same function, I would weakly prefer keeping the ordering the same (i.e. `((dx1, dx2), y)`) and also keeping the wrapping without the primal, so that e.g. `gradient(stuff...)[1]` always gives the same result (a tuple of gradients of the arguments).
However, I'm still skeptical of the premise that these should all be the same function; I feel it's trying a bit too hard to be generic and not really achieving it. I favor what DI does in making separate functions (`value_and_gradient`, or whatever it would be called) and, in that case, not wrapping the output in the bare `gradient` case. Roughly speaking, I usually favor allowing type arguments to change the type of the output if e.g. that output is a single struct, especially if that struct is of the type given as an argument, but separate functions if the tree structure of the output is different. An example would be giving `StaticArray` or `Array` as an argument: the outputs would have the same interface, so using them as an argument seems fine, but having incompatible user-facing nested structures not so much. In my mind it therefore makes more sense for `gradient` and `jacobian` to be the same function than for `gradient` and `value_and_gradient` (I've called it `valgradient`) to be the same function. I acknowledge that this argument is ultimately based on mere personal preference, but I think it's roughly what most Julia packages tend to do in most cases.
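To make the two designs in this comment concrete, here is a minimal sketch in plain Julia. It does not call Enzyme or DI: the toy function `f`, the hand-written `partials`, the marker types `WithPrimal`/`WithoutPrimal`, and every `toy_*` function are hypothetical names invented for illustration only.

```julia
# Toy function with hand-written partial derivatives; no AD library is used.
f(x1, x2) = x1^2 * x2
partials(x1, x2) = (2 * x1 * x2, x1^2)  # (∂f/∂x1, ∂f/∂x2)

# Hypothetical marker types standing in for "with primal" / "without primal" modes.
struct WithPrimal end
struct WithoutPrimal end

# Design A: one function; the mode argument changes the output's tree structure.
# The gradients stay wrapped in both cases, so `[1]` always returns the same thing.
toy_gradient(::WithPrimal, x1, x2) = (partials(x1, x2), f(x1, x2))  # ((dx1, dx2), y)
toy_gradient(::WithoutPrimal, x1, x2) = (partials(x1, x2),)         # ((dx1, dx2),)

# Design B: separate functions, each with one fixed output shape and no extra wrapping.
toy_bare_gradient(x1, x2) = partials(x1, x2)                        # (dx1, dx2)
toy_value_and_gradient(x1, x2) = (f(x1, x2), partials(x1, x2))      # (y, (dx1, dx2))

grads_a = toy_gradient(WithPrimal(), 3.0, 2.0)[1]     # tuple of gradients
grads_b = toy_gradient(WithoutPrimal(), 3.0, 2.0)[1]  # same tuple of gradients
grads_c = toy_bare_gradient(3.0, 2.0)                 # same tuple, no indexing needed
y, grads_d = toy_value_and_gradient(3.0, 2.0)         # primal first, then gradients
```

In Design A the caller indexes with `[1]` regardless of mode; in Design B the caller picks the function whose output shape they want and never has to unwrap a one-element tuple.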
This behavior has already changed in main with https://github.com/EnzymeAD/Enzyme.jl/pull/1832
In particular autodiff now has semantics of:
| | Forward | Reverse |
|---|---|---|
| With primal | `(dy, y)` | `((dx1, dx2, dx3), y)` |
| Without primal | `(dy,)` | `((dx1, dx2, dx3),)` |
https://github.com/EnzymeAD/Enzyme.jl/pull/1844 changes `gradient`/`jacobian`, whose behavior is now:
| | Forward | Reverse |
|---|---|---|
| With primal | `((darg1, darg2, darg3), y)` | `((darg1, darg2, darg3), y)` |
| Without primal | `(darg1, darg2, darg3)` | `(darg1, darg2, darg3)` |
Following our conversation on Slack with @ExpandingMan, just wanted to ask whether the return tuples would be modified or not. Let's take a function `y = f(x1, x2, x3)` and run `autodiff` on it.

Current behavior
| | Forward | Reverse |
|---|---|---|
| With primal | `(y, dy)` | `((dx1, dx2, dx3), y)` |
| Without primal | `(dy,)` | `((dx1, dx2, dx3),)` |
Suggestions
| | Forward | Reverse |
|---|---|---|
| With primal | `(y, dy)` | `(y, (dx1, dx2, dx3))` |
| Without primal | `dy` | `(dx1, dx2, dx3)` |
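As a rough illustration of what the "without primal" difference means at the call site, the sketch below unpacks both layouts using plain tuples. The values are hand-picked placeholders, not the output of any `autodiff` call, and all variable names are hypothetical.

```julia
# Placeholder results in each layout (hand-picked numbers, not AD output).
current_forward   = (6.0,)               # (dy,)
suggested_forward = 6.0                  # dy
current_reverse   = ((1.0, 2.0, 3.0),)   # ((dx1, dx2, dx3),)
suggested_reverse = (1.0, 2.0, 3.0)      # (dx1, dx2, dx3)

# Current layout: the one-element outer tuple has to be unwrapped first.
dy = only(current_forward)
dx1, dx2, dx3 = only(current_reverse)

# Suggested layout: the result can be used or destructured directly.
dy_s = suggested_forward
dx1_s, dx2_s, dx3_s = suggested_reverse
```

The current layout costs one extra unwrapping step but keeps the position of the derivatives the same with and without the primal; the suggested layout drops that step at the price of a different output shape per mode.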