Inconsistency in value_and_pushforward_function vs value_and_pullback_function outputs

JuliaDiff / AbstractDifferentiation.jl

An abstract interface for automatic differentiation.

https://juliadiff.org/AbstractDifferentiation.jl/

MIT License

Inconsistency in value_and_pushforward_function vs value_and_pullback_function outputs #119

Closed: prbzrg closed this issue 8 months ago

prbzrg commented 11 months ago

value_and_pushforward_function returns a function that returns (value, pf):
https://github.com/JuliaDiff/AbstractDifferentiation.jl/blob/211b67528c5ed91971bb524c57adb63837163367/src/AbstractDifferentiation.jl#L259

but value_and_pullback_function returns (value, pullback_function):
https://github.com/JuliaDiff/AbstractDifferentiation.jl/blob/211b67528c5ed91971bb524c57adb63837163367/src/AbstractDifferentiation.jl#L319

devmotion commented 10 months ago

I'm not sure there's anything to be fixed here. In forward mode you usually get both the primal and the dual together, whereas in reverse mode you get the value and the pullback function (compare, e.g., ForwardDiff or ChainRulesCore.frule with Zygote or ChainRulesCore.rrule).

prbzrg commented 10 months ago

IMO, having a backend-agnostic interface means we shouldn't rely on backend APIs or mirror them. Moreover, every other value_and_AD API in AbstractDifferentiation.jl returns the value and AD.

prbzrg commented 10 months ago

And why do we return the same value every time?

devmotion commented 10 months ago

IMO the names are possibly confusing but not wrong per se: value_and_X_function can be read either as (value_and_X)_function or as value_and_(X_function). For forward mode we use the former interpretation and for reverse mode the latter.

AFAICT this difference between forward and reverse mode is not based on a specific package but a conceptual difference: In forward mode, you can (and usually do) compute both the primal value and the partial derivatives in a single forward pass whereas in reverse mode you compute the primal value in the forward pass, construct the pullback functions as you go, and then compute the derivatives in the backward pass.
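
To make the two return shapes concrete, here is a minimal sketch with hand-written derivative rules for one small function. It uses no AD package and is not the AbstractDifferentiation.jl implementation: the names sketch_value_and_pushforward_function and sketch_value_and_pullback_function, and the example function f, are hypothetical and chosen only to mirror the two readings of value_and_X_function described above.

```julia
# Hand-written derivative rules for f(x) = [x1 * x2, sin(x1)].
# No AD package is used; every function name below is hypothetical.

f(x) = [x[1] * x[2], sin(x[1])]

# Forward-mode shape: (value_and_pushforward)_function.
# The returned closure takes a tangent v and computes the primal value and
# the Jacobian-vector product J(x) * v together, in a single forward pass.
function sketch_value_and_pushforward_function(x)
    return function (v)
        value = [x[1] * x[2], sin(x[1])]
        pushforward = [x[2] * v[1] + x[1] * v[2], cos(x[1]) * v[1]]
        return value, pushforward
    end
end

# Reverse-mode shape: value_and_(pullback_function).
# The forward pass computes the primal value once and builds the pullback
# closure; each later call with a cotangent w computes J(x)' * w.
function sketch_value_and_pullback_function(x)
    value = [x[1] * x[2], sin(x[1])]
    pullback(w) = [x[2] * w[1] + cos(x[1]) * w[2], x[1] * w[1]]
    return value, pullback
end

x = [2.0, 3.0]

# Forward: the value comes back with every tangent that is pushed forward.
vpf = sketch_value_and_pushforward_function(x)
value_fwd, jvp = vpf([1.0, 0.0])

# Reverse: the value is returned once, next to a reusable pullback.
value_rev, pb = sketch_value_and_pullback_function(x)
vjp = pb([1.0, 0.0])
```

In this sketch the forward-shaped closure recomputes and returns the same primal value on every call, while the reverse-shaped function returns the value a single time alongside the pullback. Both follow directly from where the primal computation sits in each mode.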

prbzrg commented 10 months ago

Thanks for your response.

gdalle commented 8 months ago

The docstring is misleading then, see #124

That might be my fault from the recent docstring overhaul