JuliaDiff / ForwardDiff.jl

Forward Mode Automatic Differentiation for Julia

support for `Base.expm` (need advice implementing) #174

Open tpapp opened 7 years ago

tpapp commented 7 years ago

It would be very nice to have support for AD in Base.expm. I would like to contribute to it, but since I am unfamiliar with the AD architecture in Julia I would appreciate some advice.

For Fréchet derivatives, the recommended approach is Al-Mohy and Higham (2009), but I am unsure whether that is the best approach for AD.

I came across an article by Mike Giles on AD and matrix functions. Section 2.3.5 of that paper suggests that if one uses the scaling and squaring algorithm, the AD calculations just amount to solving linear systems and computing matrix powers, both of which go through AD nicely.
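
To make that concrete, here is a minimal sketch (the helper naive_expm is hypothetical, not from any package): a pure-Julia scaling-and-squaring exponential built on a truncated Taylor series, with a fixed scaling for simplicity, which ForwardDiff can differentiate without any custom rules.

using ForwardDiff, LinearAlgebra

# Scaling and squaring with a truncated Taylor series. A fixed scaling s is
# used for simplicity; a real implementation chooses s from a norm of A.
function naive_expm(A::AbstractMatrix; s::Int = 8, nterms::Int = 12)
    B = A / 2^s            # scale down so the series converges quickly
    E = I + B              # running sum I + B + B^2/2! + ...
    term = B
    for k in 2:nterms
        term = term * B / k
        E += term
    end
    for _ in 1:s           # exp(A) = exp(A / 2^s)^(2^s): repeated squaring
        E = E * E
    end
    return E
end

J = ForwardDiff.jacobian(naive_expm, randn(4, 4)) # 16×16 Jacobian of vec(exp(A))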

Base.expm! uses Julia functions, except for two LAPACK ones, gebal! and gesv!, for balancing and for solving linear systems, respectively. I am under the impression that AD already goes through linear solves, so gesv! could be replaced. All one would need to do, then, is implement gebal in Julia and replace gesv! by \, perhaps at the cost of some extra allocation; the replacement is sketched below.
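
Something like the following (a sketch; the actual call sites in Base.expm! differ in their in-place details):

using LinearAlgebra
using LinearAlgebra: LAPACK

A = randn(4, 4); B = randn(4, 2)

X1 = LAPACK.gesv!(copy(A), copy(B))[1] # LAPACK path: BLAS element types only
X2 = A \ B                             # generic path: also works for Dual element types
X1 ≈ X2                                # true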

Am I on the right track with this?

jrevels commented 7 years ago

I've not explored implementing AD for expm, so I'm not sure how much help I can be, but it looks like you're on the right track. The ability to implement matrix derivatives in ForwardDiff will probably require #165.

It might be a better idea to start with ReverseDiff, whose implementation is currently more natural for implementing matrix derivatives. For example, many of the linear algebra derivative definitions in ReverseDiff stem from the Giles paper you linked.

tpapp commented 7 years ago

@jrevels: Thanks. Looking at ReverseDiff, the key to supporting a calculation again seems to be having a pure-Julia implementation of it. So by providing an alternative implementation of expm that does not use LAPACK, I could get both ReverseDiff and ForwardDiff to work, is that correct?

tpapp commented 7 years ago

update: I confirm that if one removes the balancing/rebalancing and replaces LAPACK.gesv! by \, one gets a version of expm that works fine with ForwardDiff. It probably works with ReverseDiff too, but I still need to verify that.

Balancing simply takes an input matrix A and computes a scaling/permutation matrix D such that A = DBD^{-1}, where B is better balanced. The scaling depends on A continuously, while the permutation of course depends on A discontinuously.
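
A quick numerical check of that identity with the LAPACK wrapper (scale-only balancing, job = 'S', so the permutation part is left out):

using LinearAlgebra
using LinearAlgebra: LAPACK

A = [0.0 1e4; 1e-4 0.0]
B = copy(A)
ilo, ihi, scale = LAPACK.gebal!('S', B) # in place: B = inv(D) * A * D; scale = diag(D)
D = Diagonal(scale)
A ≈ D * B * inv(D) # true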

My hunch would be to shut down the derivatives for D, i.e. compute it from the value part of A (discarding the derivative components), and then proceed with the rescaling.

Two questions:

  1. What's the recommended way of extracting the value part? Is it map(x->x.value, matrix)?
  2. Does ForwardDiff always replace a matrix with AbstractMatrix{ForwardDiff.Dual}, so that it would be OK for me to define a method for expm with that signature, or can it be something else?

jrevels commented 7 years ago

So by providing an alternative implementation for expm that does not use LAPACK, I could get both ReverseDiff and ForwardDiff to work, is that correct?

That is correct. If you have a generic, pure Julia implementation of expm, ReverseDiff and ForwardDiff will hopefully both be able to differentiate it.

What's the recommended way of extracting the value part? Is it map(x->x.value, matrix)?

map(ForwardDiff.value, matrix) should work.
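
For instance, with a toy Dual constructed by hand:

using ForwardDiff

d = ForwardDiff.Dual(1.5, 1.0) # value 1.5, a single partial
M = [d 2d; 3d 4d]
map(ForwardDiff.value, M)      # 2×2 Matrix{Float64}: [1.5 3.0; 4.5 6.0]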

Does ForwardDiff always replace a matrix with AbstractMatrix{ForwardDiff.Dual}, so that it would be OK for me to define a method for expm with that signature, or can it be something else?

Currently, it's advisable to only overload scalar functions on Dual numbers, not matrix functions on arrays of Dual numbers. #165 is attempting to provide an interface that enables the latter; it can be pretty tricky to manage memory correctly because of ForwardDiff's chunk mode. Thus, #165 has to be figured out before it will be possible to manually implement the Jacobian of expm on Dual arrays.

If you'd like to work on this in the meantime, it will be significantly easier to implement expm in ReverseDiff than in ForwardDiff. Additionally, a ReverseDiff implementation should be much faster for medium-to-large matrices (probably for about size(m, i) > 100).

jrevels commented 7 years ago

I might be confused about which thing you're trying to do here: are you seeking to implement a manual Jacobian for expm that can be "plugged into" ForwardDiff/ReverseDiff, or are you seeking to implement a generic expm such that ForwardDiff/ReverseDiff will be able to differentiate it?

My previous advice was assuming the former; for the latter, I believe it's sufficient just to implement a method of expm for AbstractMatrix{<:Real}.
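
This works because dual numbers are themselves subtypes of Real, so a single generic method also catches the Dual-valued matrices that ForwardDiff passes in:

using ForwardDiff

# Dual <: Real, so a method with signature AbstractMatrix{<:Real} dispatches
# for Dual matrices as well as for Float64 ones.
ForwardDiff.Dual{Nothing,Float64,1} <: Real # true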

jebej commented 7 years ago

Hi, I am also interested in automatic differentiation of matrix exponentials. How can replacing only gebal! and gesv! give pure Julia code? What about all the matrix multiplications?

sdewaele commented 5 years ago

For this purpose I have converted a pure Julia implementation to Julia 1.0, see here. It passes all the matrix exponential tests, but I have not yet tested it with autodiff. Would it be useful to have a pure Julia implementation in Base for this type of application?

JeffFessler commented 2 years ago

A PhD student in my group adapted the gist code provided by @sdewaele into this package, where we use ForwardDiff to autodifferentiate matrix exponentials: https://github.com/StevenWhitaker/BlochSim.jl/blob/main/src/expm.jl

However, the original code was not quite directly suitable for autodiff, because it uses the frexp function, which is discontinuous (it is piecewise linear). This function arises from the power-of-2 scaling used for exp(Matrix). So he overloaded its derivative to provide the correct derivatives in the linear regions (see the end of the expm.jl file); a sketch of that kind of overload follows.
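
Roughly (a sketch of such an overload assuming Float64 values, not the exact code from expm.jl): within each binade, x = f * 2^e with e locally constant, so the fraction f has derivative 2^(-e) with respect to x.

using ForwardDiff

# Sketch: propagate derivatives through frexp by treating the exponent e as
# locally constant, so d(f)/d(x) = 2^(-e) in each linear region.
function Base.frexp(x::ForwardDiff.Dual{T}) where {T}
    f, e = frexp(ForwardDiff.value(x))
    return (ForwardDiff.Dual{T}(f, ForwardDiff.partials(x) * 2.0^(-e)), e)
end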

Trying to autodifferentiate the code for exp(A) is analogous to trying to autodifferentiate the code for sin(x) instead of using the fact that the derivative of sin(x) is cos(x). Autodiff packages use cos(x) instead, knowing that cos(x) is computed approximately. ForwardDiff would benefit from a similar analytical form for the Jacobian of exp(A). A recent paper https://doi.org/10.1016/j.jedc.2021.104122 gives (in Thm. 2) a convenient expression for that Jacobian for the case where A is not defective, which is probably the main use case. (Thm. 1 gives the more general case.)

Here is the remarkably simple Julia code:

using LinearAlgebra
"""
    jacobian_expm(A)
Jacobian of non-defective (e.g., normal) matrix A
Method of Theorem 2 of Magnus et al. 2021 (magnus:21:tjo)
See https://doi.org/10.1016/j.jedc.2021.104122
"""
function jacobian_expm(A)
    (d, T) = eigen(A)                 # eigenvalues d, eigenvector matrix T
    Tinv = inv(T)
    A ≈ T * Diagonal(d) * Tinv || throw(ArgumentError("matrix is defective"))
    # divided differences of exp over all eigenvalue pairs
    deltafun(a, b) = a == b ? exp(a) : (exp(a) - exp(b)) / (a - b)
    delta = deltafun.(d, transpose(d))
    S = kron(transpose(Tinv), T)      # transpose (not adjoint), also for complex A
    return S * Diagonal(vec(delta)) * inv(S)
end

And here is test code that validates the analytical Jacobian against an empirical finite-difference approximation (I'm not sure the validation is complete in the complex case, BTW):

using Random: seed!

# empirical jacobian via finite differences
# todo: what about complex case?  real perturbation seems incomplete...
function jacobian_empirical(A::Matrix{<:Number}; ep::Float64=1e-6)
    n = size(A,1)
    T = promote_type(eltype(A), Float64)
    A = T.(A)
    expA = exp(A)
    Je = zeros(T, n^2, n^2)
    for i=1:n, j=1:n
        B = copy(A)
        B[i,j] += ep
        Je[:,(j-1)*n+i] = vec((exp(B) - expA) / ep) # column for the perturbation of A[i,j]
    end
    return Je
end

n = 6
T = ComplexF64 # or Float64 for the real case

for trial = 1:10^4
    seed!(trial)
    A = rand(T, n, n)

    Ja = jacobian_expm(A) # analytical
    if Ja ≈ real(Ja) # drop a numerically negligible imaginary part
        Ja = real(Ja)
    end
    Je = jacobian_empirical(A; ep=1e-6) # empirical

    err = maximum(abs.(Je - Ja))
    if err > 3e-6 # report only the worst cases
        err = round(err / 1e-5, digits=2) # in units of 1e-5
        @show trial, err, cond(A)
    end
end

I have not had time yet to compare the run time of this approach vs. the approach in expm.jl mentioned above. Would there be interest in adding this functionality to ForwardDiff.jl?

Or does it belong somewhere more general like https://github.com/JuliaDiff/DiffRules.jl ?

gdalle commented 2 years ago

For future reference, @sethaxen implemented a ForwardDiff-compatible matrix exponential in https://github.com/sethaxen/ExponentialAction.jl

sethaxen commented 2 years ago

The paper linked in the OP gives the Fréchet derivative as the same derivative you would get by naively running forward-mode AD on the matrix exponential function implemented in Base. We've implemented this function in ChainRules: https://github.com/JuliaDiff/ChainRules.jl/blob/2cc27e2124fc7746afdd7753fbc9f19c1c24ac13/src/rulesets/LinearAlgebra/matfun.jl#L231:L274

But in answer to the OP's comment: that paper also addressed whether this naive algorithm is still good for computing the derivative (due to truncation error, it might not have been), and they concluded that it is. So right now it is still the recommended approach for derivatives of the matrix exponential.

ExponentialAction is doing something different: if you need the derivative of exp(t*A)*B for a B whose first dimension is much larger than the second, then that package has you covered.
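
For instance (illustrative sizes only):

using ExponentialAction

A = randn(100, 100)
B = randn(100, 3)
W = expv(1.0, A, B) # computes exp(1.0 * A) * B without ever forming exp(A)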

gdalle commented 2 years ago

So what would you recommend for computing the Jacobian of A -> exp(A) using ForwardDiff? For lack of a better option, I used ExponentialAction with B = diagm(ones(d)) (it doesn't work with B = I).

sethaxen commented 2 years ago

I would first try ForwardDiff with ExponentialUtilities.exponential! using the method ExpMethodHigham2005Base. If that doesn't work, there are other options. ExponentialAction will be really slow for computing the full matrix exponential; it tries hard not to do exactly that.

gdalle commented 2 years ago

Thanks!

sethaxen commented 2 years ago

Coming back to this, here's an example with a benchmark:

julia> using ExponentialUtilities, ExponentialAction, ForwardDiff, LinearAlgebra, BenchmarkTools

julia> myexp(A) = exponential!(copyto!(similar(A), A), ExpMethodGeneric());

julia> myexp2(A) = ExponentialAction.expv(1, A, I(size(A, 1)));

julia> A = randn(10, 10);

julia> A2 = randn(30, 30);

julia> @btime $(ForwardDiff.jacobian)($myexp, $A);
  568.646 μs (145 allocations: 559.45 KiB)

julia> @btime $(ForwardDiff.jacobian)($myexp2, $A);
  7.440 ms (2260 allocations: 15.74 MiB)

julia> @btime $(ForwardDiff.jacobian)($myexp, $A2);
  136.034 ms (1577 allocations: 39.82 MiB)

julia> @btime $(ForwardDiff.jacobian)($myexp2, $A2);
  1.907 s (32702 allocations: 888.83 MiB)

I didn't use ExpMethodHigham2005Base because its type constraints wouldn't work with ForwardDiff, and ExpMethodGeneric was faster than ExpMethodHigham2005. So we see a ~13x speed-up using the exponential instead of the action of the exponential for matrices of these sizes.