Planned backends to implement

JuliaDiff / AbstractDifferentiation.jl

An abstract interface for automatic differentiation.

https://juliadiff.org/AbstractDifferentiation.jl/

MIT License

Planned backends to implement #40

Open sethaxen opened 2 years ago

sethaxen commented 2 years ago

We should add backends for the following AD/FD packages:

[x] ForwardDiff
[x] ReverseDiff
[x] FiniteDifferences
[ ] all ChainRules-supporting ADs (see #11, #39)
[ ] FiniteDiff
[x] Tracker
[ ] Enzyme (#84)
[ ] Batched Zygote (https://github.com/JuliaDiff/AbstractDifferentiation.jl/issues/40#issuecomment-1029987127)
[ ] SparseDiffTools
[ ] Symbolics

AriMKatz commented 2 years ago

Can you also add Yota?

sethaxen commented 2 years ago

Yota is ChainRules-compatible, so it should be covered with the others.

wsmoses commented 2 years ago

Make sure to add both Enzyme forward and reverse modes!

sethaxen commented 2 years ago

Will do! Should I start with the public API? @frankschae said you had mentioned that we might want to use some internal functions (he pointed me to https://github.com/wsmoses/Enzyme.jl/blob/2ce81ffa8f56c5bf44a4d85234c2110fa9d6eb0a/src/compiler.jl#L1745).

wsmoses commented 2 years ago

I wouldn't go quite that low-level, which saves you some common LLVM setup. I'd probably use the thunk level (https://github.com/wsmoses/Enzyme.jl/blob/2ce81ffa8f56c5bf44a4d85234c2110fa9d6eb0a/src/compiler.jl#L2700) instead. It has options for a "combined" augmented forward pass + gradient, an augmented forward pass (storing values from the original function that need to be preserved), a standalone gradient (running only the reverse pass, using the values stored by an augmented forward pass), and forward-mode AD.

This is used, for example, to generate the high-level autodiff/fwddiff routines (https://github.com/wsmoses/Enzyme.jl/blob/2ce81ffa8f56c5bf44a4d85234c2110fa9d6eb0a/src/Enzyme.jl#L173), and it is currently the highest-level point that exposes "split mode" (i.e. the separate augmented forward pass and standalone gradient).
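To make the four modes described above concrete, here is a minimal, language-neutral sketch in plain Python. It is not Enzyme's API: the function names (`augmented_forward`, `reverse`, `combined`, `forward_mode`) and the example function f(x) = sin(x) * x are hypothetical, and the derivative rules are written by hand rather than generated.

```python
import math

# Hypothetical example function: f(x) = sin(x) * x.

def augmented_forward(x):
    """Run the primal and store the values the reverse pass will need."""
    s = math.sin(x)
    y = s * x
    tape = (x, s, math.cos(x))  # values preserved from the original function
    return y, tape

def reverse(tape, dy):
    """Standalone gradient: runs only the reverse pass, using the stored values."""
    x, s, c = tape
    return dy * (c * x + s)  # d/dx [sin(x) * x] = cos(x) * x + sin(x)

def combined(x, dy=1.0):
    """Augmented forward pass and gradient fused into one call."""
    y, tape = augmented_forward(x)
    return y, reverse(tape, dy)

def forward_mode(x, dx=1.0):
    """Forward mode: propagate the tangent dx alongside the primal."""
    s, c = math.sin(x), math.cos(x)
    return s * x, dx * (c * x + s)
```

In split mode a caller can keep the `tape` from `augmented_forward` and invoke `reverse` later, possibly more than once with different `dy`; `combined` is the convenience path when both passes run back to back.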

mohamed82008 commented 2 years ago

I would like to add a "batch" version of Zygote as a backend which falls back on Zygote except for jacobian where the pullback is called with all the bases simultaneously (i.e. pb(I) where I is the identity matrix). This can be useful to preserve sparsity of Jacobians if all the rules are written in a way that preserves sparsity.

mohamed82008 commented 2 years ago

And a SparseDiffTools backend to optimise for sparsity structure.

sethaxen commented 2 years ago

> I would like to add a "batch" version of Zygote as a backend which falls back on Zygote except for jacobian where the pullback is called with all the bases simultaneously (i.e. pb(I) where I is the identity matrix). This can be useful to preserve sparsity of Jacobians if all the rules are written in a way that preserves sparsity.

Is this a feature Zygote actually supports, or just something that sometimes works?

ChrisRackauckas commented 2 years ago

It requires that the function being differentiated has independent actions on each column. For example, a neural network satisfies this.
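The pb(I) idea under discussion can be sketched as follows. This is plain Python, not Zygote: the function `f`, its hand-written `pullback`, and `jacobian_via_batched_pullback` are hypothetical names for illustration. The pullback here takes a batch of cotangents (one per row) in a single call and treats each row independently, which is the independence condition mentioned above.

```python
def f(x):
    x0, x1, x2 = x
    return [x0 * x1, x1 ** 2, x0 + x2]

def pullback(x):
    """Return pb(V), mapping a batch of cotangent rows V to the rows of V @ J."""
    x0, x1, x2 = x

    def pb(V):
        # Each row v is handled independently of the others.
        # J = [[x1, x0, 0], [0, 2*x1, 0], [1, 0, 1]], so v @ J is:
        return [[v0 * x1 + v2, v0 * x0 + 2 * v1 * x1, v2] for v0, v1, v2 in V]

    return pb

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def jacobian_via_batched_pullback(x):
    """One pullback call on the identity matrix yields every row of J at once."""
    return pullback(x)(identity(len(f(x))))
```

For x = [2.0, 3.0, 5.0] this returns the rows [3.0, 2.0, 0.0], [0.0, 6.0, 0.0] and [1.0, 0.0, 1.0]. The structural zeros of the Jacobian come straight out of the rule, which is the sparsity-preserving behaviour the proposal is after.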

mohamed82008 commented 2 years ago

> or just something that sometimes works?

Something that sometimes works. The goal is to make it easy to define a sparse Jacobian in a rrule and then get it back when calling Zygote.jacobian.

JTaets commented 2 years ago

Is adding Symbolics.jl also planned?

In my field (control theory), symbolic differentiation is used almost exclusively. It is fast when derivatives need to be calculated many times, because it avoids the overhead an AD incurs on every call: the logic of running the forward pass, and allocations. The same holds for machine learning with a constant graph, which could also benefit once common subexpression elimination (CSE) in Symbolics.jl is fully functional.

Calculating the derivative would happen by symbolically tracing the function, generating the derivative/gradient/Jacobian function, and then passing the inputs to that generated function.

This would be useful once caches are added to this package: for Symbolics.jl, the cache would just be the generated derivative function, so calculating the derivative would incur no extra overhead.
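The trace-once, generate, then cache workflow described above can be illustrated with a toy sketch in plain Python. It is not Symbolics.jl and does no common subexpression elimination; the `Sym` class, `lift`, `derivative_function`, and the example `poly` are all hypothetical, and only `+` and `*` on a single scalar variable are supported.

```python
class Sym:
    """A traced expression as Python source, paired with its derivative's source."""

    def __init__(self, expr, deriv):
        self.expr, self.deriv = expr, deriv

    def __add__(self, other):
        o = lift(other)
        return Sym(f"({self.expr} + {o.expr})", f"({self.deriv} + {o.deriv})")

    __radd__ = __add__  # addition is commutative

    def __mul__(self, other):
        o = lift(other)
        # Product rule, emitted as source code.
        return Sym(f"({self.expr} * {o.expr})",
                   f"({self.deriv} * {o.expr} + {self.expr} * {o.deriv})")

    __rmul__ = __mul__  # multiplication is commutative


def lift(value):
    """Wrap plain numbers as constants with zero derivative."""
    return value if isinstance(value, Sym) else Sym(repr(float(value)), "0.0")


_cache = {}


def derivative_function(f):
    """Trace f once, generate its derivative as a plain function, and cache it."""
    if f not in _cache:
        traced = f(Sym("x", "1.0"))
        _cache[f] = eval(f"lambda x: {traced.deriv}")
    return _cache[f]


def poly(x):
    return 3 * x * x + 2 * x + 1


dpoly = derivative_function(poly)  # tracing and code generation happen here
```

After the first call, `derivative_function(poly)` returns the same generated function from the cache, so repeated derivative evaluations only pay for running the generated arithmetic.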

sethaxen commented 2 years ago

I think it would be good to support this. As you say, this would require support for caching. See #41.

prbzrg commented 1 year ago

Via GitHub advanced search, I found some other AD packages as well:

gdalle/ImplicitDifferentiation.jl
avigliotti/AD4SM.jl
JuliaDiff/TaylorDiff.jl
abap34/JITrench.jl
sshin23/MadDiff.jl

https://github.com/search?l=&o=desc&q=Automatic+Differentiation+stars%3A%3E10+pushed%3A%3E2022-01-01+language%3AJulia&s=stars&type=Repositories

gdalle commented 1 year ago

> gdalle/ImplicitDifferentiation.jl

Actually, ImplicitDifferentiation.jl now uses AbstractDifferentiation.jl under the hood, to call any AD package as a backend. Can it be a backend itself? I don't think it's a good idea, so no need to include it on the list :)