Could you please quickly check which one seems to be wrong: `log` or `exp`?
Reproduced on x86 Linux. The issue appears to be in `log` (although it's somewhat hard to prove, because matrix logarithms are not unique).
A simpler example:
N = [0.0 -0.8 1.1 0.2 -0.2; 0.8 0.0 0.8 -0.4 -0.1; -1.1 -0.8 0.0 0.4 -0.1; -0.2 0.4 -0.4 0.0 0.0; 0.0 0.0 0.0 0.0 0.0]
k = 10 # also bad for k = 100 or 1000
sdiff = log(exp(N/k)) - N/k
@show -log10(maximum(abs.(sdiff))) # 2.9
I agree with @oscardssmith; I suspect the matrix logarithm is at fault.
Edit: for completeness, the Python code:
import numpy as np
import scipy as sp
import scipy.linalg  # make sure the submodule is loaded

N = np.array([[0.0, -0.8, 1.1, 0.2, -0.2], [0.8, 0.0, 0.8, -0.4, -0.1], [-1.1, -0.8, 0.0, 0.4, -0.1], [-0.2, 0.4, -0.4, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]])
k = 10
sdiff = sp.linalg.logm(sp.linalg.expm(N/k)) - N/k
print(-np.log10(np.max(np.abs(sdiff))))  # 15.0
This second example isn't as good because, IIUC, `log(exp(M)) == M` isn't expected for matrices due to branch cuts.
On v1.6.7 I get
julia> -log10(maximum(abs.(sdiff))) # < 3
15.175451669208671
on v1.9.4 and v1.8.5
julia> -log10(maximum(abs.(sdiff))) # < 3
2.233375639545474
so this must be an "old" bug (it is already present in v1.8, so not a recent regression).
> This second example isn't as good because, IIUC, `log(exp(M)) == M` isn't expected for matrices due to branch cuts.
That's why I included the parameter `k`. If `M` is small enough, I would expect `log(exp(M)) == M` to be true no matter what, right?
> so this must be an "old" bug.
I suspect the problem to be in `log_quasitriu`, which was introduced in 6913f9ccb156230f54830c8b7b8ef82b41cac27e, in 2021.
> I suspect the problem to be in `log_quasitriu`, which was introduced in https://github.com/JuliaLang/julia/commit/6913f9ccb156230f54830c8b7b8ef82b41cac27e, in 2021.
That's more recent than Julia v1.6, though.
More evidence that the problem is the logarithm:
N = [0.0 -0.8 1.1 0.2 -0.2; 0.8 0.0 0.8 -0.4 -0.1; -1.1 -0.8 0.0 0.4 -0.1; -0.2 0.4 -0.4 0.0 0.0; 0.0 0.0 0.0 0.0 0.0]
k = 10
M = exp(N/k)
sdiff = log(M*M) - 2*log(M)
@show -log10(maximum(abs.(sdiff))) # 2.6
The equivalent code in Python gives 14.7.
> I suspect the problem to be in `log_quasitriu`, which was introduced in 6913f9c, in 2021.
>
> That's more recent than Julia v1.6, though.
If it were the case that PR #39973 is to blame (wild guess), then the bug would have appeared first in v1.7.0.
For completeness, I checked the code above (i.e., with `log(M*M) - 2*log(M)`) on various versions, and indeed, the bug appears between v1.6.7 and v1.7.3. My money is still on PR #39973.
An even simpler test:
function test_log(N, k=10)
    M = exp(N / k)
    sdiff = log(M*M) - 2*log(M)
    return -log10(maximum(abs.(sdiff)))
end
Now simply with
N = [0.0 -1.0 1.0; 1.0 0.0 0.0; 0.0 0.0 0.0]
run
test_log(N) # 2.1
and the same in Python returns 15.4.
It's not clear to me what the cause is. I tried a specific test from scipy's test suite, and this is the result:
julia> VERSION
v"1.10.4"
julia> A = [3.2346e-01 3.0000e+04 3.0000e+04 3.0000e+04;
0.0000e+00 3.0089e-01 3.0000e+04 3.0000e+04;
0.0000e+00 0.0000e+00 3.2210e-01 3.0000e+04;
0.0000e+00 0.0000e+00 0.0000e+00 3.0744e-01];
julia> logA = [ -1.12867982029050462e+00 9.61418377142025565e+04 -4.52485573953179264e+09 2.92496941103871812e+14;
0.00000000000000000e+00 -1.20101052953082288e+00 9.63469687211303099e+04 -4.68104828911105442e+09;
0.00000000000000000e+00 0.00000000000000000e+00 -1.13289322264498393e+00 9.53249183094775653e+04;
0.00000000000000000e+00 0.00000000000000000e+00 0.00000000000000000e+00 -1.17947533272554850e+00];
julia> (logA - log(A)) / norm(logA)
4×4 Matrix{Float64}:
0.0 5.97008e-25 9.78138e-21 -1.06839e-15
0.0 0.0 4.97507e-25 1.30418e-20
0.0 0.0 0.0 2.98504e-25
0.0 0.0 0.0 0.0
julia> (exp(logA) - A) / norm(A)
4×4 Matrix{Float64}:
-1.81534e-8 0.000297088 50.5783 -2.79971e6
0.0 2.33977e-8 0.00116877 8.26782
0.0 0.0 3.53834e-10 -0.00160443
0.0 0.0 0.0 -3.34387e-8
julia> (exp(log(A)) - A) / norm(A)
4×4 Matrix{Float64}:
-1.81534e-8 0.000297088 50.5783 -2.79971e6
0.0 2.33977e-8 0.00116877 8.26782
0.0 0.0 3.53834e-10 -0.00160443
0.0 0.0 0.0 -3.34387e-8
julia> (exp(log(A)) - A)
4×4 Matrix{Float64}:
-0.001334 21.8314 3.71673e6 -2.05736e11
0.0 0.00171937 85.8868 6.07558e5
0.0 0.0 2.60014e-5 -117.901
0.0 0.0 0.0 -0.00245723
OTOH, we have:
>>> (sp.linalg.expm(A_logm) - A) / sp.linalg.norm(A)
array([[ 0.00000000e+00, 1.33667877e-15, 3.49612787e-10,
-5.04994630e-07],
[ 0.00000000e+00, 0.00000000e+00, 1.23766552e-15,
4.78558722e-11],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
1.38618539e-15],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00]])
So, the computed log is really close to the "known" solution, but the exp of it is completely wrong in the upper right corner.
ADDENDUM: It seems, however, that this specific example fails on all versions starting from 1.6.x.
> I tried a specific test from scipy's test suite
It's worth noting that the condition number of that matrix is awful:
julia> cond(A)
1.8884133342622014e20
julia> cond(logA)
4.086839610043594e28
I have an independent `myexp` implementation (our actual `exp` implementation is better, but has limited type support, see #51008), and it similarly struggles with this:
using LinearAlgebra  # for norm and UniformScaling

function myexp(A)
    # compute exp(A) == exp(A/2^n)^(2^n)
    iszero(A) && return float(one(A))
    # TODO: decide the best norm to use here: op-1, op-Inf, or frobenius are probably good candidates - maybe take the min of all three
    nA = float(norm(A,2))
    fr,ex = frexp(nA)
    logscale = max(4 + ex,0) # ensure nA^(4+1)/factorial(4+1) < eps for 4 or whatever power our approximation is
    p = ntuple(i->inv(prod(1.0:i-1)),Val(9))
    scale = exp2(-logscale)
    sA = A * scale
    if sA isa AbstractMatrix
        expA = evalpoly(sA,UniformScaling.(p))
    else
        expA = evalpoly(sA,p)
    end
    for _ in 1:logscale
        expA = expA*expA
    end
    return expA
end
And with this and `BigFloat`:
julia> norm(exp(logA) - A) / norm(A)
2.799714043033207e6
julia> norm(myexp(logA) - A) / norm(A) # myexp is generally worse
5.134964610902018e6
julia> norm(myexp(big.(logA)) - A) / norm(A) |> Float64 # despite being worse, BigFloat makes this version work
8.141481015780612e-7
Note that the squaring part of our `exp` routine is probably very much to blame, given that `BigFloat` resolves the problem even though the `exp` kernel is not expanded beyond what is roughly needed for `Float64`. I don't think we should use that example for this particular issue.
However, `myexp` does nothing to resolve the original example:
julia> cond(M) # extremely well conditioned
1.1906431494194285
julia> norm(exp(log(M)) - M) / norm(M) |> Float64
0.003052024023511132
julia> norm(myexp(big.(log(M))) - M) / norm(M) |> Float64
0.0030520240235111305
Given that the matrix exponential is reasonably "simple" to compute (modulo stability issues, which even the literature has not fully settled), I am much more inclined to suspect that `log` is the culprit.
Confirmation that `log` is the culprit:
function eigapply(fun, A)
    D, V = eigen(A)                  # requires A to be diagonalizable
    return V * Diagonal(fun.(D)) / V
end
julia> norm(exp(log(M)) - M) / norm(M)
0.003052024023511132
julia> norm(exp(eigapply(log,M)) - M) / norm(M)
6.499342713344915e-16
I think the problem is in `log_quasitriu`. From the example above (i.e., `qtriu = schur(exp(N/10)).T`):
qtriu = [0.9950041652780259 -0.09983341664682815 0.07153667745275474; 0.09983341664682804 0.9950041652780259 0.06981527929449967; 0.0 0.0 1.0]
Now `LinearAlgebra.log_quasitriu(qtriu)[:,end]` gives
0.07240564253650691
0.06891743457599916
0.0
and in Python, the same vector is
0.07496781758158656
0.06618025632357401
0.0
Pretty big discrepancy starting at the second decimal.
So this code is a very direct Matlab translation, with all the oddness that entails, but what I've found is that if we make it so that `s, m = 1, 4` rather than `0, 5`, we get the right answer. See https://github.com/JuliaLang/julia/pull/54867.
Just one more pointer to add here for further analysis. It's kinda good that the differences are localised.
julia> using LinearAlgebra
julia> M = [0.9949357359852791 -0.015567763143324862 -0.09091193493947397 -0.03994428739762443 0.07338356301650806; 0.011813655598647289 0.9968988574699793 -0.06204555000202496 0.04694097614450692 0.09028834462782365; 0.092737943594701 0.059546719185135925 0.9935850721633324 0.025348893985651405 -0.018530261590167685; 0.0369187299165628 -0.04903571106913449 -0.025962938675946543 0.9977767446862031 0.12901494726320517; 0.0 0.0 0.0 0.0 1.0]
5×5 Matrix{Float64}:
0.994936 -0.0155678 -0.0909119 -0.0399443 0.0733836
0.0118137 0.996899 -0.0620456 0.046941 0.0902883
0.0927379 0.0595467 0.993585 0.0253489 -0.0185303
0.0369187 -0.0490357 -0.0259629 0.997777 0.129015
0.0 0.0 0.0 0.0 1.0
julia> S = schur(complex(M));
julia> norm(exp(S.Z*LinearAlgebra.log_quasitriu(S.T)*S.Z') - M)
1.5337045591657124e-15
julia> S = schur(M);
julia> norm(exp(S.Z*LinearAlgebra.log_quasitriu(S.T)*S.Z') - M)
0.0068453336198366545
julia> exp(S.Z*LinearAlgebra.log_quasitriu(S.T)*S.Z') - M
5×5 Matrix{Float64}:
-6.66134e-16 -3.72966e-16 2.498e-16 -1.38778e-17 -0.00185147
-3.34802e-16 5.55112e-16 -2.15106e-16 4.85723e-17 0.00300777
-6.93889e-17 -2.70617e-16 -2.22045e-16 -2.08167e-17 0.00584284
-1.249e-16 -6.245e-17 -7.63278e-17 0.0 -0.000495109
0.0 0.0 0.0 0.0 0.0
Eigenvectors and eigenvalues of `M` and `exp(log(M))`:
julia> using LinearAlgebra
julia> M = [0.9949357359852791 -0.015567763143324862 -0.09091193493947397 -0.03994428739762443 0.07338356301650806; 0.011813655598647289 0.9968988574699793 -0.06204555000202496 0.04694097614450692 0.09028834462782365; 0.092737943594701 0.059546719185135925 0.9935850721633324 0.025348893985651405 -0.018530261590167685; 0.0369187299165628 -0.04903571106913449 -0.025962938675946543 0.9977767446862031 0.12901494726320517; 0.0 0.0 0.0 0.0 1.0]
5×5 Matrix{Float64}:
0.994936 -0.0155678 -0.0909119 -0.0399443 0.0733836
0.0118137 0.996899 -0.0620456 0.046941 0.0902883
0.0927379 0.0595467 0.993585 0.0253489 -0.0185303
0.0369187 -0.0490357 -0.0259629 0.997777 0.129015
0.0 0.0 0.0 0.0 1.0
julia> E1 = eigen(M);
julia> E2 = eigen(exp(log(M)));
julia> norm(E1.values - E2.values)
1.2787159664228656e-15
julia> norm(E1.vectors - E2.vectors)
0.028242416081603217
julia> abs.(E1.vectors - E2.vectors)
5×5 Matrix{Float64}:
3.33356e-16 3.33356e-16 4.54082e-15 4.54082e-15 0.0120785
1.34378e-15 1.34378e-15 4.87741e-15 4.87741e-15 0.0141492
6.66134e-16 6.66134e-16 4.26807e-15 4.26807e-15 0.00359654
3.34869e-16 3.34869e-16 0.0 0.0 0.0209428
0.0 0.0 0.0 0.0 9.55047e-5
The likely "correct" answer based on Schur-Parlett recurrence algorithm (unique roots required):
julia> S = schur(M);
julia> norm(exp(S.Z*real(MatrixFunctions.fm_schur_parlett_recurrence(log, S.T))*S.Z') - M)
1.1184124283538868e-15
julia> norm(exp(S.Z*LinearAlgebra.log_quasitriu(S.T)*S.Z') - M)
0.0068453336198366545
julia> real(MatrixFunctions.fm_schur_parlett_recurrence(log, S.T))
5×5 Matrix{Float64}:
-1.09981e-15 0.117707 4.04932e-15 2.60264e-15 0.141422
-0.117707 -1.09981e-15 1.77951e-15 3.03452e-16 0.0482574
0.0 0.0 2.99457e-16 0.0544547 -0.0213389
0.0 0.0 -0.0544547 2.99457e-16 0.0881419
0.0 0.0 0.0 0.0 0.0
julia> LinearAlgebra.log_quasitriu(S.T)
5×5 Matrix{Float64}:
-1.10068e-15 0.117707 4.06711e-15 2.69458e-15 0.143336
-0.117707 -1.10068e-15 1.59252e-15 2.22818e-16 0.0419473
0.0 0.0 2.9989e-16 0.0544547 -0.0195324
0.0 0.0 -0.0544547 2.9989e-16 0.0885489
0.0 0.0 0.0 0.0 0.0
Now, looking a bit more into the code, I'm confused.
The function `_log_quasitriu!` (https://github.com/JuliaLang/julia/blob/b6d2155e65bbbf511a78aadf8e21fcec51e525c4/stdlib/LinearAlgebra/src/triangular.jl#L1947) utilizes `_find_params_log_quasitriu!` (https://github.com/JuliaLang/julia/blob/b6d2155e65bbbf511a78aadf8e21fcec51e525c4/stdlib/LinearAlgebra/src/triangular.jl#L1949) to compute the Padé approximation parameters `m` and `s`. However, that function (https://github.com/JuliaLang/julia/blob/b6d2155e65bbbf511a78aadf8e21fcec51e525c4/stdlib/LinearAlgebra/src/triangular.jl#L1998) is based on [R1], which covers only triangular matrices, not quasi-triangular ones. Nevertheless, it is applied to both triangular and quasi-triangular matrices. Furthermore, the algorithm suggested in [R1] for T^(1/2) ([16, Alg. 6.3] there) is replaced with `_sqrt_quasitriu!` from [R2] (according to the function header).
Question:
- Has anyone studied whether the algorithm in [R1] works with quasi-triangular matrices?

Note: I've taken note of @oscardssmith's fix, where using `s=1, m=5` results in a good solution instead of the `s=0, m=6` obtained from `_find_params_log_quasitriu!`. However, I'm not sure if forcing a square root in every case is the right approach.

[R1]: Al-Mohy, Awad H. and Higham, Nicholas J., "Improved Inverse Scaling and Squaring Algorithms for the Matrix Logarithm", 2011, url: https://eprints.maths.manchester.ac.uk/1687/1/paper11.pdf
[R2]: Deadman, Edvin, Higham, Nicholas J., and Ralha, Rui, "Blocked Schur Algorithms for Computing the Matrix Square Root", 2012, url: https://eprints.maths.manchester.ac.uk/1926/1/EdvinDeadmanPARA2012.pdf
This does not seem to be the immediate problem. However, using the diagonal elements in the following seems wrong. https://github.com/JuliaLang/julia/blob/34b81fbc90a96c1db7f235a465d2cfdf5937e563/stdlib/LinearAlgebra/src/triangular.jl#L2030
Choosing `d` as the eigenvalues of `A` seems to be a better choice for quasi-triangular matrices; the rest of the algorithm seems OK for them. (A small illustration follows below.)
Note: nevertheless, for the given `M` in this issue, using eigenvalues instead of diagonal values does not change the computed parameters `m = 6` and `s = 0`. This will need to be taken up separately.
Next stop: double-checking the Padé approximation for `m=6, s=0` as computed by the current implementation of `_find_params_log_quasitriu!`.
The issue is in the function https://github.com/JuliaLang/julia/blob/34b81fbc90a96c1db7f235a465d2cfdf5937e563/stdlib/LinearAlgebra/src/triangular.jl#L2169, where the case $s = 0$ is not handled. In fact, the function faithfully implements Algorithm 5.1 in [R1]. Unfortunately, however, Algorithm 5.1 in [R1] seems incomplete and only applicable for $s \geq 1$.
The following is a quick fix that works for this case (changes are made to the code in the above commit). I will have to think a bit more before suggesting a permanent solution.
 Base.@propagate_inbounds function _sqrt_pow_diag_block_2x2!(A, A0, s)
+    if iszero(s)
+        A[1,1] -= 1
+        A[2,2] -= 1
+        return
+    end
     _sqrt_real_2x2!(A, A0)
     if isone(s)
         A[1,1] -= 1
         A[2,2] -= 1
     else
With this change, we have
julia> norm(exp(S.Z*log_quasitriu(S.T)*S.Z') - M)
1.1192892392171394e-15
This code also looks good for the long term. @oscardssmith, would you like to update your PR's title/code and provide this fix instead? That would avoid duplication of effort, and you got started with this...
Update 24-Oct-2024: I will create a PR with this fix.