SciML / ExponentialUtilities.jl

Fast and differentiable implementations of matrix exponentials, Krylov exponential matrix-vector multiplications ("expmv"), KIOPS, ExpoKit functions, and more. All your exponential needs in SciML form.
https://docs.sciml.ai/ExponentialUtilities/stable/

Add ChainRules rules #40

sethaxen opened this issue 4 years ago

sethaxen commented 4 years ago

From Slack: @sethaxen:

Does ExponentialUtilities.jl play well with AD packages, in particular Zygote?

@ChrisRackauckas:

Not fully with Zygote; it'll need adjoints, since it's doing a lot of scalar stuff (it's writing the kernels directly). The adjoints are easy, though.

I in particular need adjoints for expv. Zygote currently has an adjoint rule for exp(::AbstractMatrix) and exp(::Hermitian) using the eigendecomposition. I imagine, though, that there's a better way to implement the adjoints for expv by looking at the underlying algorithm (which I have not yet done).
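For context, the existing Zygote rule mentioned above can be exercised directly on a small dense matrix. A minimal sketch, assuming the eigendecomposition-based adjoint for exp(::AbstractMatrix) that this thread refers to is available:

```julia
using Zygote, LinearAlgebra

A = Matrix(Symmetric(randn(3, 3)))  # symmetric A keeps the eigendecomposition well behaved
Y, back = Zygote.pullback(exp, A)   # forward pass through the matrix exponential
ΔY = randn(3, 3)                    # incoming cotangent for exp(A)
(∂A,) = back(ΔY)                    # adjoint of A via Zygote's exp rule
```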

ChrisRackauckas commented 4 years ago

There should be ways to do this without defining the Jacobian. expv is the solution of a linear ODE, so the adjoint of the ODE should be able to be used to derive the expression for the adjoint, which IIRC should just be:

du/dt = A u  →  dλ/dt = A' λ

which means the adjoint should just be expv(t, A', Δλ).

sethaxen commented 4 years ago

For λ = expv(t, A, u) with Δλ the adjoint of λ, the adjoint of u should, I think, be ∂u = expv(t, A', Δλ), which is quite nice. We also need the adjoints ∂t and ∂A, which will take more thought.
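A quick numerical sanity check of that formula against the dense exponential; a sketch with a small random A, using the fact that expv(t, A, b) computes exp(tA)b:

```julia
using ExponentialUtilities, LinearAlgebra

n = 4
A = randn(n, n); u = randn(n); Δλ = randn(n); t = 0.7

∂u = expv(t, A', Δλ)        # proposed pullback w.r.t. u
∂u_ref = exp(t * A)' * Δλ   # reference: the Jacobian of exp(tA)u w.r.t. u is exp(tA)
@assert ∂u ≈ ∂u_ref
```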

sethaxen commented 4 years ago

Especially since A doesn't even need to be a matrix, right? I don't think we'll be able to support all types of A in a custom adjoint, just AbstractMatrix subtypes.

ChrisRackauckas commented 4 years ago

Yeah, the difficult thing will be supporting something that's not a concrete matrix, since then it can't be adjointed directly. But then that's just defined as the reverse mode of the function f(u) = A*u, so I think it can work out; it'll just be more complicated in code.

Those again would come from this derivation. You might want to read https://diffeq.sciml.ai/stable/extras/sensitivity_math/ or the supplemental of https://arxiv.org/abs/2001.04385. Specifically, the ∂A term is given by an integral over the Lagrange multiplier term. Coincidentally, the phiv values used in the exponential integrators are these integrals, so the adjoint can probably be written as just a calculation of phi_1. I think it's like phiv(t, A', Δλ) + reversemode(A) kind of thing (in pseudocode, off the top of my head, so maybe missing a detail somewhere).

∂t is easy in this interpretation: λ = expv(t, A, u) = exp(tA)u is the solution of λ′ = Aλ with λ(0) = u, solved out to time t, so the derivative of the solution w.r.t. t is just Aλ (or in reverse mode, maybe with A').

Again, all might be missing a detail since I'm doing it quickly, but that should be the gist of it.
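Concretely, that interpretation can be checked by finite differences. A sketch, assuming the reverse-mode sensitivity implied above is ∂t = ⟨Δλ, Aλ⟩:

```julia
using ExponentialUtilities, LinearAlgebra

A = randn(4, 4); u = randn(4); Δλ = randn(4); t = 0.3

λ = expv(t, A, u)
∂t = dot(Δλ, A * λ)           # reverse-mode sensitivity w.r.t. t

h = 1e-6                      # central finite-difference comparison
∂t_fd = dot(Δλ, (expv(t + h, A, u) - expv(t - h, A, u)) / (2h))
@assert isapprox(∂t, ∂t_fd; rtol=1e-5)
```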

sethaxen commented 4 years ago

Thanks! That should be enough to get me started. I'll probably tackle this in a few months if no one else does before then (unless I find some time early).

sethaxen commented 3 years ago

Working on this now and have some follow-up questions.

Specifically, the ∂A term is given by an integral over the Lagrange multiplier term. Coincidentally, the phiv values used in the exponential integrators are these integrals, so the adjoint can probably be written as just a calculation of phi_1. I think it's like phiv(t, A', Δλ) + reversemode(A) kind of thing (in pseudocode, off the top of my head, so maybe missing a detail somewhere).

I've spent some time working through the provided references and still haven't comprehended this comment. What is reversemode(A) here? By phiv(t, A', Δλ) do you mean phiv(t, A', Δλ, 1)[:, 2], which I believe computes φ₁(tA') Δλ? This would compute an adjoint of the same dimension as v, not a matrix.
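For reference, the column layout of phiv can be checked directly against dense formulas. A sketch with small dense A; the reference φ₁ is reconstructed here from its definition φ₁(z) = (eᶻ − 1)/z:

```julia
using ExponentialUtilities, LinearAlgebra

A = randn(4, 4); b = randn(4); t = 0.5

P = phiv(t, A, b, 1)                  # columns: [φ₀(tA)b  φ₁(tA)b]
@assert P[:, 1] ≈ exp(t * A) * b      # φ₀(tA) b = exp(tA) b
φ₁(M) = (exp(M) - I) / M              # dense reference for the matrix φ₁
@assert P[:, 2] ≈ φ₁(t * A) * b
```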

ChrisRackauckas commented 3 years ago

Hmm, I guess it doesn't use the φ₁. It is the first integral of the term, so I'm a little surprised it doesn't show up.

sethaxen commented 3 years ago

Okay, I think I worked something out for forward mode at least. Using slide 5 of http://www1.maths.leeds.ac.uk/~jitse/scicade09.pdf, the pushforward of u = expv(t, A, u_0) is

Δu = A φ₀(tA) u₀ Δt + (φ₀(tA) Δu₀ + ∑_{i=1}^∞ t^i φᵢ(tA) ΔA A^{i-1} u₀),

where the part in parentheses is the solution of the ODE Δu′ = A Δu + ΔA u with Δu(0) = Δu₀. Perhaps there's some way to simplify that hideous sum term. Still need to work out the corresponding reverse mode.
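The parenthesized part (the solution of Δu′ = A Δu + ΔA u) can be computed without the infinite sum via a single expv call on a block upper-triangular augmentation. A sketch; the block construction is Van Loan's trick, not something exported by this package:

```julia
using ExponentialUtilities, LinearAlgebra

n = 4
A, ΔA = randn(n, n), randn(n, n)
u0, Δu0 = randn(n), randn(n)
t, Δt = 0.5, 0.1

M = [A ΔA; zero(A) A]              # d/ds [Δu; u] = M [Δu; u]
z = expv(t, M, [Δu0; u0])[1:n]     # = φ₀(tA)Δu₀ + ∑ᵢ tⁱ φᵢ(tA) ΔA A^{i-1} u₀
Δu = A * expv(t, A, u0) * Δt + z   # full pushforward, including the Δt term

# finite-difference check of the ΔA direction alone
h = 1e-6
fd = (exp(t * (A + h * ΔA)) - exp(t * (A - h * ΔA))) * u0 / (2h)
@assert isapprox(expv(t, M, [zero(u0); u0])[1:n], fd; rtol=1e-4)
```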

sethaxen commented 3 years ago

Following up on @ChrisRackauckas's point, we can indeed compute the adjoint of A by solving an ODE in reverse. A working prototype is here: https://gist.github.com/sethaxen/4071b401b9b4ff4f5421136cec2fa7da/dd914b79d465d8653b1674cbc466f5a29d95fbae#file-expv_chainrules-jl-L64-L77

I haven't worked out how to solve this ODE using just the functions in this package; currently I require OrdinaryDiffEq. This does what I need right now, so I'll put #51 on hold until I work out something efficient using just this package.
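The idea, roughly, is ∂A = ∫₀ᵗ λ(s) u(s)' ds with u′ = Au, u(0) = v and λ(s) = exp((t−s)A')Δw, accumulated as one augmented ODE. A sketch of that shape (my own reconstruction, not the gist's exact code):

```julia
using OrdinaryDiffEq, ExponentialUtilities, LinearAlgebra

n = 4
A = randn(n, n); v = randn(n); Δw = randn(n); t = 0.5

λ0 = expv(t, A', Δw)    # λ(0) = exp(tA')Δw

# state z = [u; λ; vec(G)]: u' = A u, λ' = -A' λ, G' = λ u'
function f!(dz, z, p, s)
    u = @view z[1:n]
    λ = @view z[n+1:2n]
    mul!(view(dz, 1:n), A, u)                         # u' = A u
    mul!(view(dz, n+1:2n), A', λ, -1, 0)              # λ' = -A' λ
    reshape(view(dz, 2n+1:2n+n^2), n, n) .= λ .* u'   # G' = λ u' (outer product)
    return nothing
end

prob = ODEProblem(f!, vcat(v, λ0, zeros(n^2)), (0.0, t))
sol = solve(prob, Tsit5(); abstol=1e-10, reltol=1e-10)
∂A = reshape(sol.u[end][2n+1:end], n, n)   # = ∫₀ᵗ λ(s) u(s)' ds
```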

sethaxen commented 3 years ago

Another way to compute the adjoint of A comes from https://doi.org/10.1109/TAC.1978.1101743. Let w = expv(t, A, v), let Δw be the adjoint of w, and let ∂v = expv(t, A', Δw) be the pulled-back adjoint of v. The adjoint of A is the integral ∫₀ᵗ exp(s A') Δw w' exp(-s A') ds. Define the block-triangular matrix D = [-A' ∂v*w'; zero(A) -A']. Then the upper-right block of exp(t * D) is the adjoint of A. This is fine for small dense A but is otherwise very inefficient, so it doesn't seem useful.
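A sketch of that construction with a directional finite-difference check; it only makes sense for small dense A, and the variable names are mine:

```julia
using ExponentialUtilities, LinearAlgebra

n = 4
A = randn(n, n); v = randn(n); t = 0.5
w = expv(t, A, v)
Δw = randn(n)                      # incoming adjoint of w

∂v = expv(t, A', Δw)               # pulled-back adjoint of v
D = [-A' ∂v*w'; zero(A) -A']       # block-triangular matrix from above
∂A = exp(t * D)[1:n, n+1:2n]       # upper-right block of exp(tD)

# directional check: ⟨∂A, ΔA⟩ ≈ ⟨Δw, directional derivative of expv in ΔA⟩
ΔA = randn(n, n); h = 1e-6
fd = dot(Δw, (expv(t, A + h*ΔA, v) - expv(t, A - h*ΔA, v)) / (2h))
@assert isapprox(dot(∂A, ΔA), fd; rtol=1e-4)
```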

sethaxen commented 3 years ago

Here's where I landed on this. The adjoint for A will be computed by hand-deriving the pullback through exp and through arnoldi/lanczos. The former will be added to ChainRules (https://github.com/JuliaDiff/ChainRules.jl/issues/331). I locally have an implementation of the latter that requires no checkpointing.

For an n × n matrix A, the final step of the pullback for arnoldi is the product of an n × m matrix and the adjoint of another n × m matrix, where m is the dimension of the Krylov subspace. For dense A this is just a matmul, but for huge sparse A we would need to know its sparsity pattern to avoid creating a huge dense matrix, and instead compute only the dot products corresponding to the stored entries.

We need a function like outer_sparse!(∂A, x::AbstractVecOrMat, y::AbstractVecOrMat) that does this, where ∂A is a differential type for A (either a Composite{typeof(A)} or an AbstractMatrix). We can implement such a function for all AbstractMatrix types in base Julia and define the rrule only for those types, wrapping an expv_rev that has no type constraints. Then an implementer of a custom operator can overload outer_sparse! for their operator and define an rrule wrapping expv_rev. Unfortunately, this would require the array package to depend on ExponentialUtilities, or a user to commit type piracy. A sketch of such a function follows.
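A minimal sketch of what outer_sparse! could look like for a SparseMatrixCSC; the name and signature are proposals from this thread, not an existing API:

```julia
using SparseArrays, LinearAlgebra

# Accumulate only the entries of X*Y' that lie on the sparsity pattern of ∂A,
# avoiding the dense n × n outer product.
function outer_sparse!(∂A::SparseMatrixCSC, X::AbstractMatrix, Y::AbstractMatrix)
    rows = rowvals(∂A); vals = nonzeros(∂A)
    for j in axes(∂A, 2), k in nzrange(∂A, j)
        i = rows[k]
        # (X*Y')[i, j] = Σₘ X[i, m] conj(Y[j, m]) = dot(Y[j, :], X[i, :])
        vals[k] += dot(view(Y, j, :), view(X, i, :))
    end
    return ∂A
end
```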

stevengj commented 8 months ago

(Note that this block-triangular rule is a special case of an algorithm by Mathias (1996) for differentiating matrix functions, as discussed in https://github.com/JuliaDiff/ChainRules.jl/issues/764.)