EnzymeAD / Enzyme.jl

Julia bindings for the Enzyme automatic differentiator
https://enzyme.mit.edu
MIT License

Tuple returns: forward vs reverse #1847

Closed · gdalle closed this 5 days ago

gdalle commented 5 days ago

Following our Slack conversation with @ExpandingMan, I just wanted to ask whether the return tuples will be modified or not. Let's take a function y = f(x1, x2, x3) and run autodiff on it.

Current behavior

|  | Forward | Reverse |
| --- | --- | --- |
| With primal | `(y, dy)` | `((dx1, dx2, dx3), y)` |
| Without primal | `(dy,)` | `((dx1, dx2, dx3),)` |
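
For concreteness, a minimal sketch of how those shapes show up in calls, using the autodiff syntax from before the change discussed below (the function f and the inputs are made up here, and the orderings in the comments are the ones from the table above, not something I have re-verified):

```julia
using Enzyme

f(x1, x2, x3) = x1 * x2 + x3^2  # hypothetical example function

# Reverse mode
autodiff(ReverseWithPrimal, f, Active(1.0), Active(2.0), Active(3.0))  # ((dx1, dx2, dx3), y)
autodiff(Reverse, f, Active(1.0), Active(2.0), Active(3.0))            # ((dx1, dx2, dx3),)

# Forward mode, seeding a tangent of 1.0 on x1 (0.12-era syntax with an explicit
# return activity to request the primal)
autodiff(Forward, f, Duplicated, Duplicated(1.0, 1.0), Const(2.0), Const(3.0))  # (y, dy)
autodiff(Forward, f, Duplicated(1.0, 1.0), Const(2.0), Const(3.0))              # (dy,)
```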

Suggestions

|  | Forward | Reverse |
| --- | --- | --- |
| With primal | `(y, dy)` | `(y, (dx1, dx2, dx3))` |
| Without primal | `dy` | `(dx1, dx2, dx3)` |
ExpandingMan commented 5 days ago

If these are going to be the same function, I would weakly prefer keeping the ordering the same (i.e. ((dx1, dx2), y)) and also keeping the wrapping without the primal, so that e.g. gradient(stuff...)[1] always gives the same result (a tuple of gradients of the arguments).

However, I'm still skeptical of the premise that these should all be the same function; it feels like it's trying a bit too hard to be generic without really achieving it. I favor what DI does in providing separate functions (value_and_gradient, or whatever it would be called) and, in that case, not wrapping the output in the bare gradient case.

Roughly speaking, I usually favor letting type arguments change the type of the output if, e.g., that output is a single struct, especially if that struct is of the type given as an argument, but separate functions if the tree structure of the output differs. An example would be giving a StaticArray or an Array as an argument: the outputs would have the same interface, so selecting between them via an argument seems fine, but incompatible user-facing nested structures are not. In my mind it therefore makes more sense for gradient and jacobian to be the same function than for gradient and value_and_gradient (I've called it valgradient) to be the same function. I acknowledge that this argument is ultimately a matter of personal preference, but I think it's roughly what most Julia packages do in most cases.

wsmoses commented 5 days ago

This behavior has already changed in main with https://github.com/EnzymeAD/Enzyme.jl/pull/1832

In particular, autodiff now has the following semantics:

|  | Forward | Reverse |
| --- | --- | --- |
| With primal | `(dy, y)` | `((dx1, dx2, dx3), y)` |
| Without primal | `(dy,)` | `((dx1, dx2, dx3),)` |
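
A hedged sketch of what that looks like in code, assuming the post-#1832 mode names ForwardWithPrimal/ReverseWithPrimal and a made-up three-argument f (the return shapes in the comments are taken from the table above):

```julia
using Enzyme

f(x1, x2, x3) = x1 * x2 + x3^2  # hypothetical example function

# Forward mode now puts the derivative first when the primal is requested,
# seeding a tangent of 1.0 on x1:
autodiff(ForwardWithPrimal, f, Duplicated(1.0, 1.0), Const(2.0), Const(3.0))  # (dy, y)
autodiff(Forward, f, Duplicated(1.0, 1.0), Const(2.0), Const(3.0))            # (dy,)

# Reverse mode is unchanged relative to the earlier table:
autodiff(ReverseWithPrimal, f, Active(1.0), Active(2.0), Active(3.0))  # ((dx1, dx2, dx3), y)
autodiff(Reverse, f, Active(1.0), Active(2.0), Active(3.0))            # ((dx1, dx2, dx3),)
```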
wsmoses commented 5 days ago

https://github.com/EnzymeAD/Enzyme.jl/pull/1844 changes gradient/jacobian, whose behavior is now:

|  | Forward | Reverse |
| --- | --- | --- |
| With primal | `((darg1, darg2, darg3), y)` | `((darg1, darg2, darg3), y)` |
| Without primal | `(darg1, darg2, darg3)` | `(darg1, darg2, darg3)` |
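
For illustration, a minimal sketch of the reverse-mode calls that table corresponds to, assuming the multi-argument gradient form that #1844 targets; the function and inputs are made up, forward mode follows the same nesting per the table, and the exact container returned by the WithPrimal variant may differ in later releases:

```julia
using Enzyme

f(x1, x2, x3) = sum(x1) * sum(x2) + sum(abs2, x3)  # hypothetical example function

Enzyme.gradient(Reverse, f, [1.0], [2.0], [3.0])
# -> (darg1, darg2, darg3), one gradient per argument, per the table above
Enzyme.gradient(ReverseWithPrimal, f, [1.0], [2.0], [3.0])
# -> ((darg1, darg2, darg3), y), per the table above
```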