sethaxen opened 2 years ago
Can you add Yota also?
Yota is ChainRules-compatible, so it should be covered with the others.
Make sure to add both Enzyme forward and reverse modes!
Will do! Should I start with the public API? @frankschae said you had mentioned we might want to use some internal functions (he pointed me to https://github.com/wsmoses/Enzyme.jl/blob/2ce81ffa8f56c5bf44a4d85234c2110fa9d6eb0a/src/compiler.jl#L1745).
I wouldn't go quite that low-level, to save yourself some common LLVM setup; instead I'd probably use the thunk level (https://github.com/wsmoses/Enzyme.jl/blob/2ce81ffa8f56c5bf44a4d85234c2110fa9d6eb0a/src/compiler.jl#L2700), which has options for a "combined" augmented forward pass + gradient, an augmented forward pass on its own (storing values from the original function that need to be preserved), a standalone gradient (running just the reverse pass, using the values stored by an augmented forward pass), and forward-mode AD.
This is used, for example, to generate the high-level autodiff/fwddiff routines (https://github.com/wsmoses/Enzyme.jl/blob/2ce81ffa8f56c5bf44a4d85234c2110fa9d6eb0a/src/Enzyme.jl#L173), and it is currently the highest-level entry point that exposes "split mode" (i.e. the separate augmented forward pass and standalone gradient).
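To make the four thunk options concrete, here is a hand-written Python sketch for the toy function f(x) = sin(x) * x. The names `augmented_forward`, `reverse`, `combined` and `forward` are hypothetical and only mirror the modes described above; they are not Enzyme.jl's API, and Enzyme derives all of this automatically rather than by hand.

```python
import math

# Hypothetical hand-written "thunks" for f(x) = sin(x) * x, mirroring the
# modes described above. These are NOT Enzyme.jl functions.

def f(x):
    return math.sin(x) * x

def augmented_forward(x):
    """Run the primal and store the intermediates the reverse pass will need."""
    s = math.sin(x)
    c = math.cos(x)
    y = s * x
    tape = (x, s, c)  # values from the original function that must be preserved
    return y, tape

def reverse(tape, dy):
    """Standalone gradient: uses only the stored tape, never re-runs the primal."""
    x, s, c = tape
    # d/dx [sin(x) * x] = cos(x) * x + sin(x)
    return dy * (c * x + s)

def combined(x, dy):
    """'Combined' mode: augmented forward pass immediately followed by the gradient."""
    y, tape = augmented_forward(x)
    return y, reverse(tape, dy)

def forward(x, dx):
    """Forward mode: propagate a tangent dx alongside the primal value."""
    s, c = math.sin(x), math.cos(x)
    return s * x, dx * (c * x + s)
```

Split mode is what lets a caller run `augmented_forward` now, keep the tape, and call `reverse` later, which is the shape an `rrule`-style pullback needs when Enzyme is called from inside another AD system.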
I would like to add a "batch" version of Zygote as a backend, which falls back on Zygote for everything except `jacobian`, where the pullback is called with all the basis vectors simultaneously (i.e. `pb(I)`, where `I` is the identity matrix). This can be useful for preserving the sparsity of Jacobians if all the rules are written in a way that preserves sparsity.
And a SparseDiffTools backend to optimise for sparsity structure.
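A SparseDiffTools-style backend exploits sparsity by column coloring: columns that never share a nonzero row can be probed with a single directional derivative, so a Jacobian with a known sparsity pattern costs one evaluation per color instead of one per column. The Python sketch below shows that compression idea with a greedy coloring and central differences; the function names are hypothetical and this is not SparseDiffTools.jl's API.

```python
# Hypothetical sketch of column coloring ("compression") for sparse Jacobians.
# Not SparseDiffTools.jl's API; uses central differences for the JVPs.

def greedy_column_coloring(pattern, n_cols):
    """pattern[i] is the set of columns that may be nonzero in row i.
    Returns colors[j] such that columns sharing a row get different colors."""
    neighbors = [set() for _ in range(n_cols)]
    for cols in pattern:
        for j in cols:
            neighbors[j] |= cols - {j}  # columns sharing row i conflict
    colors = [-1] * n_cols
    for j in range(n_cols):
        taken = {colors[k] for k in neighbors[j] if colors[k] >= 0}
        c = 0
        while c in taken:
            c += 1
        colors[j] = c
    return colors

def jvp_central(f, x, v, h=1e-5):
    """Directional derivative J(x) @ v via central differences."""
    xp = [xi + h * vi for xi, vi in zip(x, v)]
    xm = [xi - h * vi for xi, vi in zip(x, v)]
    return [(a - b) / (2 * h) for a, b in zip(f(xp), f(xm))]

def sparse_jacobian(f, x, pattern):
    """Returns (J, number_of_jvps) using one JVP per color."""
    n = len(x)
    colors = greedy_column_coloring(pattern, n)
    n_colors = max(colors) + 1
    J = [[0.0] * n for _ in pattern]
    for c in range(n_colors):
        # Seed with the sum of the unit vectors of every column of color c.
        seed = [1.0 if colors[j] == c else 0.0 for j in range(n)]
        compressed = jvp_central(f, x, seed)
        # Each row has at most one nonzero column of color c: decompress.
        for i, cols in enumerate(pattern):
            for j in cols:
                if colors[j] == c:
                    J[i][j] = compressed[i]
    return J, n_colors

# Usage: f_i(x) = x_i^2 + x_{i+1}, whose Jacobian is upper bidiagonal.
def f(x):
    n = len(x)
    return [x[i] ** 2 + (x[i + 1] if i + 1 < n else 0.0) for i in range(n)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
pattern = [{i, i + 1} if i + 1 < len(x) else {i} for i in range(len(x))]
J, n_evals = sparse_jacobian(f, x, pattern)  # 2 JVPs instead of 5
```

The same compression works with forward-mode dual numbers in place of finite differences; finite differences are used here only to keep the sketch dependency-free.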
> I would like to add a "batch" version of Zygote as a backend which falls back on Zygote except for `jacobian` where the pullback is called with all the basis vectors simultaneously (i.e. `pb(I)` where `I` is the identity matrix). This can be useful to preserve sparsity of Jacobians if all the rules are written in a way that preserves sparsity.
Is this a feature Zygote actually supports, or just something that sometimes works?
It requires that the function being differentiated acts independently on each column. For example, a neural network applied to a batch of inputs stored as columns satisfies this.
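The column-independence requirement can be seen in a small Python sketch: a hand-written pullback for a single layer y = tanh(W x) that also accepts a matrix of cotangents, one per column. Because it treats every column independently, one call with the identity matrix returns J^T, matching one call per output. The names `layer_with_pullback`, `jacobian_batched` and `jacobian_rowwise` are hypothetical; this is not Zygote's API.

```python
import math

# Hypothetical hand-written pullback for y = tanh(W x), written so that it
# also accepts a matrix of cotangents (one cotangent vector per column).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def identity(m):
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]

def layer_with_pullback(W, x):
    """Forward pass y = tanh(W x); returns y and a pullback closure."""
    y = [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in W]

    def pb(Ybar):
        # Ybar is m x k; each column is scaled and mapped back independently.
        scaled = [[Ybar[i][j] * (1.0 - y[i] ** 2) for j in range(len(Ybar[0]))]
                  for i in range(len(y))]
        return matmul(transpose(W), scaled)  # n x k: column j = J^T Ybar[:, j]

    return y, pb

def jacobian_batched(W, x):
    """One pullback call with all basis vectors at once: pb(I) = J^T."""
    y, pb = layer_with_pullback(W, x)
    return transpose(pb(identity(len(y))))

def jacobian_rowwise(W, x):
    """Reference: one pullback call per output, each giving one row of J."""
    y, pb = layer_with_pullback(W, x)
    rows = []
    for i in range(len(y)):
        e_i = [[1.0 if k == i else 0.0] for k in range(len(y))]  # m x 1
        rows.append([g[0] for g in pb(e_i)])  # J^T e_i = i-th row of J
    return rows
```

A pullback that mixed columns (for example by reducing over the whole cotangent array) would give a wrong `pb(I)`, which is why this only "sometimes works". Dense lists are used here for simplicity; with a rule that returns a sparse matrix type, that structure is what the batched call would hand back.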
> or just something that sometimes works?
Something that sometimes works. The goal is to make it easy to define a sparse Jacobian in an `rrule` and then get it back when calling `Zygote.jacobian`.
Is adding Symbolics.jl also planned?
In my field (control theory), symbolic differentiation is used almost exclusively: when derivatives need to be evaluated many times, it is faster because it avoids the AD packages' overhead of executing the forward-pass logic and allocating. The same holds for machine learning with a constant graph, which could also benefit once common subexpression elimination (`cse`) in Symbolics.jl is fully functional.
Calculating the derivative would work by symbolically tracing the function, generating the derivative/gradient/Jacobian function, and then passing the inputs to that generated function.
This fits well once caches are added to this package: for Symbolics.jl, the cache would simply be the generated derivative function, so computing a derivative would cost no more than calling that function.
I think it would be good to support this. As you say, this would require support for caching. See #41.
Via GitHub advanced search, I found some other AD packages as well:
gdalle/ImplicitDifferentiation.jl
Actually, ImplicitDifferentiation.jl now uses AbstractDifferentiation.jl under the hood to call any AD package as a backend. Could it be a backend itself? I don't think that's a good idea, so there's no need to include it on the list :)
We should add backends for the following AD/FD packages: