Open tpapp opened 7 years ago
It would be very nice to have support for AD in Base.expm. I would like to contribute to it, but since I am unfamiliar with the AD architecture in Julia, I would appreciate some advice.

For Frechet derivatives, the recommended approach is Al-Mohy and Higham (2009). But I am unsure that this is the best approach for AD. I came across an article by Mike Giles on AD and matrix functions. Section 2.3.5 in this paper suggests that if one uses the squaring and scaling algorithm, AD calculations just amount to the solution of linear systems and matrix powers, both of which go through nicely.

Base.expm! uses Julia functions, except for two LAPACK ones, gebal! and gesv!, for balancing and solving linear systems, respectively. I am under the impression that AD already goes through linear systems, so it could be replaced. So all one would need to do is implement gebal in Julia, and replace gesv! by \, perhaps at the cost of some extra allocation. Am I on the right track with this?
I've not explored implementing AD for expm, so I'm not sure how much help I can be, but it looks like you're on the right track. The ability to implement matrix derivatives in ForwardDiff will probably require #165.

It might be a better idea to start with ReverseDiff, whose implementation is currently more natural for implementing matrix derivatives. For example, many of the linear algebra derivative definitions in ReverseDiff stem from the Giles paper you linked.
@jrevels: Thanks. Looking at ReverseDiff, the key to supporting a calculation again seems to be having it in pure Julia. So by providing an alternative implementation for expm that does not use LAPACK, I could get both ReverseDiff and ForwardDiff to work, is that correct?
Update: I confirm that if one removes balancing/rebalancing and replaces LAPACK.gesv! by \, you get a version of expm that works fine with ForwardDiff. Probably works with ReverseDiff, but I need to figure that out.
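For concreteness, here is a minimal sketch of the kind of generic, LAPACK-free routine being discussed. The truncated Taylor series and the name generic_expm are illustrative placeholders, not the modified Base code or its scaling-and-squaring algorithm:

using LinearAlgebra, ForwardDiff

# Naive truncated Taylor series for exp(A); purely illustrative, not
# production code. It works for any Real element type, including
# ForwardDiff.Dual, because it avoids LAPACK entirely.
function generic_expm(A::AbstractMatrix{T}) where {T<:Real}
    E = Matrix{T}(I, size(A)...)
    term = copy(E)
    for k in 1:30
        term = term * A / k  # k-th Taylor term A^k / k!
        E += term
    end
    return E
end

A = [0.0 1.0; -1.0 0.0]
J = ForwardDiff.jacobian(generic_expm, A)  # 4x4 Jacobian of vec(exp(A))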
Balancing is simply taking an input matrix A and coming up with a scaling/permutation matrix D such that A = D B D^{-1}, where B is more balanced. The scaling would depend on A continuously, while of course the permutation depends on A discontinuously.

My hunch would be to shut down the derivatives for D, i.e. compute it using the real part of A, and then proceed with the rescaling.
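A minimal sketch of that hunch, where balance_matrix is a hypothetical stand-in (here just a diagonal scaling by row norms, not a port of LAPACK's gebal!):

using LinearAlgebra, ForwardDiff

# Hypothetical stand-in for the balancing computation.
balance_matrix(A::AbstractMatrix{<:Real}) =
    Diagonal([max(norm(r), 1.0) for r in eachrow(A)])

# Compute D from the value parts only, so D is constant with respect to
# the Dual perturbations, then rebalance the Dual-valued matrix.
function balanced_form(A::AbstractMatrix{<:ForwardDiff.Dual})
    D = balance_matrix(map(ForwardDiff.value, A))
    B = D \ A * D  # so that A = D * B * D^{-1}
    return D, B
end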
Two questions:

1. What's the recommended way of extracting the real part? Is it map(x->x.value, matrix)?
2. Does ForwardDiff always replace a matrix with AbstractMatrix{ForwardDiff.Dual}, so it would be OK if I defined a method for expm with that signature, or can it be something else?

> So by providing an alternative implementation for expm that does not use LAPACK, I could get both ReverseDiff and ForwardDiff to work, is that correct?
That is correct. If you have a generic, pure Julia implementation of expm, ReverseDiff and ForwardDiff will hopefully both be able to differentiate it.
> what's the recommended way of extracting the real part? is it map(x->x.value, matrix)?
map(ForwardDiff.value, matrix) should work.
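For concreteness, a tiny example with arbitrary values:

using ForwardDiff
d = ForwardDiff.Dual(1.5, 1.0)
M = [d 2d; 3d 4d]
map(ForwardDiff.value, M)  # 2x2 Matrix{Float64}: [1.5 3.0; 4.5 6.0]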
> does ForwardDiff always replace a matrix with AbstractMatrix{ForwardDiff.Dual}, so it would be OK if I defined a method for expm with that signature, or can it be something else?
Currently, it's advisable to only overload scalar functions on Dual numbers, not matrix functions on arrays of Dual numbers. #165 is attempting to provide an interface that enables the latter; it can be pretty tricky to manage memory correctly because of ForwardDiff's chunk mode. Thus, #165 has to be figured out before it will be possible to manually implement the Jacobian for expm on Dual arrays.
If you'd like to work on this in the meantime, it will be significantly easier to implement expm in ReverseDiff than in ForwardDiff. Additionally, a ReverseDiff implementation should be much faster for medium-to-large matrices (probably for about size(m, i) > 100).
I might be confused about which thing you're trying to do here: are you seeking to implement a manual Jacobian for expm that can be "plugged into" ForwardDiff/ReverseDiff, or are you seeking to implement a generic expm such that ForwardDiff/ReverseDiff will be able to differentiate it?

My previous advice was assuming the former; for the latter, I believe it's sufficient just to implement a method of expm for AbstractMatrix{T<:Real}.
Hi, I am also interested in autodifferentiation with matrix exponentials. How can only replacing gebal! and gesv! give pure Julia code? What about all the matrix multiplications?
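For context, matrix multiplication is not the obstacle: for non-BLAS element types such as ForwardDiff.Dual, Julia's * falls back to a generic pure-Julia matmul, so it differentiates out of the box. A quick check:

using ForwardDiff
# The generic matmul fallback handles Dual element types without BLAS:
J = ForwardDiff.jacobian(X -> X * X, rand(3, 3))  # Jacobian of vec(X * X)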
For this purpose I have converted a pure Julia implementation to Julia 1.0, see here. It passes all the matrix exponential tests, but I have not yet tested it in autodiff. Would it be useful to have a pure Julia implementation in Base for this type of application?
A PhD student in my group adapted the gist code provided by @sdewaele into this package, where we use ForwardDiff to autodifferentiate matrix exponentials:
https://github.com/StevenWhitaker/BlochSim.jl/blob/main/src/expm.jl
However, the original code was not directly suitable for autodiff, because it uses the frexp function, which is discontinuous (it is piecewise linear). This function arises from the power-of-2 scaling used for exp(Matrix). So he overloaded its derivative to provide the correct derivatives in the linear regions. (See the end of the expm.jl file.)
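A minimal sketch of the kind of frexp overload described, assuming ForwardDiff's Dual/partials API (this is illustrative, not the actual BlochSim code):

using ForwardDiff
import Base: frexp

# Within each binade, frexp's significand f satisfies f = x * 2^(-e), so in
# the linear regions df/dx = 2^(-e); propagate that slope through the Dual.
function frexp(d::ForwardDiff.Dual{T}) where {T}
    f, e = frexp(ForwardDiff.value(d))
    return ForwardDiff.Dual{T}(f, ForwardDiff.partials(d) * 2.0^(-e)), e
end

frexp(ForwardDiff.Dual(6.0, 1.0))  # (Dual(0.75, 0.125), 3): 6.0 == 0.75 * 2^3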
Trying to autodifferentiate the code for exp(A) is analogous to trying to autodifferentiate the code for sin(x) instead of using the fact that the derivative of sin(x) is cos(x). Autodiff packages use cos(x) instead, knowing that cos(x) is computed approximately. ForwardDiff would benefit from a similar analytical form for the Jacobian of exp(A). A recent paper https://doi.org/10.1016/j.jedc.2021.104122 gives (in Thm. 2) a convenient expression for that Jacobian for the case where A is not defective, which is probably the main use case. (Thm. 1 gives the more general case.)
Here is the remarkably simple Julia code:
using LinearAlgebra

"""
    jacobian_expm(A)

Jacobian of exp(A) for a non-defective (e.g., normal) matrix A.
Method of Theorem 2 of Magnus et al. 2021 (magnus:21:tjo).
See https://doi.org/10.1016/j.jedc.2021.104122
"""
function jacobian_expm(A)
    (d, T) = eigen(A)
    Tinv = inv(T)
    A ≈ T * Diagonal(d) * Tinv || throw("defective")
    # divided difference of exp, with the confluent (a == b) case handled
    deltafun(a, b) = a == b ? exp(a) : (exp(a) - exp(b)) / (a - b)
    delta = deltafun.(d, transpose(d))
    S = kron(transpose(Tinv), T)
    return S * Diagonal(vec(delta)) * inv(S)
end
And here is test code that validates the analytical Jacobian versus the empirical finite difference perturbation (not sure if the validation is complete in the complex case BTW):
using Random: seed!

# empirical Jacobian via finite differences
# todo: what about complex case? real perturbation seems incomplete...
function jacobian_empirical(A::Matrix{<:Number}; ep::Float64=1e-6)
    n = size(A, 1)
    T = promote_type(eltype(A), Float64)
    A = T.(A)
    expA = exp(A)
    Je = zeros(T, n^2, n^2)
    for i = 1:n, j = 1:n
        B = copy(A)
        B[i, j] += ep  # perturb one entry at a time
        Je[:, (j-1)*n+i] = vec((exp(B) - expA) / ep)
    end
    return Je
end

n = 6
T = Float64
T = ComplexF64  # overrides the line above; pick the element type to test
for trial = 1:10^4
    seed!(trial)
    A = rand(T, n, n)
    Ja = jacobian_expm(A) # analytical
    if Ja ≈ real(Ja)
        Ja = real(Ja)
    end
    Je = jacobian_empirical(A; ep=1e-6) # empirical
    err = maximum(abs.(Je - Ja))
    if err > 3e-6
        err = round(err / 1e-5, digits=2)  # report in units of 1e-5
        @show trial, err, cond(A)
    end
end
I have not had time yet to compare the run time of this approach vs. the approach in expm.jl mentioned above. Would there be interest in adding this functionality to ForwardDiff.jl? Or does it belong somewhere more general, like https://github.com/JuliaDiff/DiffRules.jl?
For future reference, @sethaxen implemented an FD-compatible matrix exponential in https://github.com/sethaxen/ExponentialAction.jl
The paper linked in the OP gives the Frechet derivative as the same derivative you would get if you naively ran forward-mode AD on the same matrix exponential function implemented in Base. We've implemented this function in ChainRules: https://github.com/JuliaDiff/ChainRules.jl/blob/2cc27e2124fc7746afdd7753fbc9f19c1c24ac13/src/rulesets/LinearAlgebra/matfun.jl#L231:L274
But in answer to the OP's comment, that paper also addressed whether this naive algorithm is still good for computing the derivative (due to truncation error, it might not be), and concluded that it is, so right now it is still the recommended approach for derivatives of the matrix exponential.
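Relatedly, a classical way to obtain (or sanity-check) the Frechet derivative of exp at A in a direction E, without any AD machinery, is the block-triangular identity exp([A E; 0 A]) = [exp(A) L(A,E); 0 exp(A)] (see, e.g., Al-Mohy and Higham 2009). A minimal sketch as a reference check, not the ChainRules implementation:

using LinearAlgebra

# The (1,2) block of the exponential of the block-triangular matrix is the
# Frechet derivative L(A, E) of exp at A in direction E.
function expm_frechet(A::AbstractMatrix, E::AbstractMatrix)
    n = size(A, 1)
    M = [A E; zero(A) A]
    return exp(M)[1:n, (n+1):(2n)]
end

A, E = randn(4, 4), randn(4, 4)
L = expm_frechet(A, E)  # directional derivative of exp at A along E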
ExponentialAction is doing something different. If you need the derivative of exp(t*A)*B for B whose first dimension is much larger than the second, then that package has you covered.
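For example, using the expv(t, A, B) signature that appears later in this thread (the sizes here are arbitrary):

using ExponentialAction
A = randn(100, 100)
B = randn(100, 3)
w = ExponentialAction.expv(1.0, A, B)  # ≈ exp(A) * B, without forming exp(A)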
So what would you recommend to compute the Jacobian of A -> exp(A) using ForwardDiff? For lack of a better option I used ExponentialAction with B = diagm(ones(d)) (it doesn't work with B = I).
I would try using ForwardDiff with ExponentialUtilities.exponential! with method ExpMethodHigham2005Base first. If that doesn't work, there are other options. ExponentialAction will be really slow for computing the full matrix exponential. It tries hard not to do that.
Thanks!
Coming back to this, here's an example with a benchmark:
julia> using ExponentialUtilities, ExponentialAction, ForwardDiff, LinearAlgebra, BenchmarkTools
julia> myexp(A) = exponential!(copyto!(similar(A), A), ExpMethodGeneric());
julia> myexp2(A) = ExponentialAction.expv(1, A, I(size(A, 1)));
julia> A = randn(10, 10);
julia> A2 = randn(30, 30);
julia> @btime $(ForwardDiff.jacobian)($myexp, $A);
568.646 μs (145 allocations: 559.45 KiB)
julia> @btime $(ForwardDiff.jacobian)($myexp2, $A);
7.440 ms (2260 allocations: 15.74 MiB)
julia> @btime $(ForwardDiff.jacobian)($myexp, $A2);
136.034 ms (1577 allocations: 39.82 MiB)
julia> @btime $(ForwardDiff.jacobian)($myexp2, $A2);
1.907 s (32702 allocations: 888.83 MiB)
I didn't use ExpMethodHigham2005Base because its type constraints wouldn't work with ForwardDiff, and ExpMethodGeneric was faster than ExpMethodHigham2005. So we see a ~13x speed-up using the exponential instead of the action of the exponential for matrices of these sizes.