Closed ztultrebor closed 5 years ago
This works:
mylog(X) = let S = schur(complex(X)); S.Z * log(UpperTriangular(S.T)) * S.Z'; end
so the problem seems to come from the combination of two Schur factorizations here?
It looks like the schur(complex(schur(C).T)).T
that is computed here has a certain "unlucky" structure that is handled very inaccurately:
julia> U = UpperTriangular(schur(complex(schur(C).T)).T)
2×2 UpperTriangular{Complex{Float64},Array{Complex{Float64},2}}:
0.0+10.0im -64.3428-18.9738im
⋅ -2.44249e-15-10.0im
julia> exp(Matrix(log(U))) - U
2×2 Array{Complex{Float64},2}:
1.77636e-15-3.55271e-15im 128.686+37.9475im
0.0+0.0im -1.9984e-15-1.77636e-15im
Here is a simpler example:
julia> log(UpperTriangular([10.0im 0; 0 -10.0im]))
2×2 UpperTriangular{Complex{Float64},Array{Complex{Float64},2}}:
2.30259+1.5708im NaN+NaN*im
⋅ 2.30259-1.5708im
Not sure where these NaNs are coming from, but I'm guessing that's related…
I've verified that the original code at http://eprints.ma.man.ac.uk/1851/02/logm.zip does not seem to have this problem (the result for logm_new(C)
seems accurate).
However, it does have some division-by-zero warnings:
octave:10> U = [10.0i 0; 0 -10.0i]
U =
0 + 10i 0 + 0i
0 + 0i -0 - 10i
octave:11> logm_new(U)
warning: division by zero
warning: called from
logm_new>powerm2by2 at line 179 column 6
logm_new at line 87 column 20
warning: division by zero
warning: called from
logm_new>logm2by2 at line 228 column 7
logm_new at line 100 column 20
ans =
2.30259 + 1.57080i 0.00000 + 0.00000i
0.00000 + 0.00000i 2.30259 - 1.57080i
cc @iamnapo, since this code is from #21571
The NaN
in my log(UpperTriangular([10.0im 0; 0 -10.0im]))
example above seems to be coming from this line.
In particular, Akp1 == -10im
and Ak == 10im
, so (Akp1 - Ak) / (Akp1 + Ak)
gives NaN + NaN*im
Is there a reason to do the first Schur factorization as real?
@ztultrebor, it's just a performance optimization, I think, since a real Schur factorization is faster than the complex one (but may only be quasi-upper-triangular, leading to the second Schur factorization for cases with complex eigenvalues). In any case, it looks like the underlying problem comes elsewhere, from the fact that log(::UpperTriangular)
is inaccurate in some cases.
@stevengj There's definitely a problem (or two) with the log(::UpperTriangular)
method.
While the improved mylog
function works for a small random matrix:
julia> mylog(X) = let S = schur(complex(X)); S.Z * log(UpperTriangular(S.T)) * S.Z'; end
mylog (generic function with 1 method)
julia> D2 = randn(2,2)
2×2 Array{Float64,2}:
-0.0532145 1.41135
-0.03215 1.4111
julia> exp(mylog(D2)) - D2
2×2 Array{Complex{Float64},2}:
-2.77556e-17+1.56125e-17im -6.66134e-16+1.38778e-16im
-6.93889e-18+6.07153e-18im -2.22045e-16-1.04083e-17im
it still fails for a slightly larger random matrix:
julia> D6 = randn(6,6)
6×6 Array{Float64,2}:
0.90881 0.336947 0.843843 0.918692 -1.05808 -0.179687
1.49618 0.375474 -0.437477 1.98801 -1.0208 -0.381434
1.09977 -0.595175 -0.718849 0.597049 -1.90258 0.0568476
-1.6915 -0.601828 0.52197 -0.692272 -0.473441 1.38306
-1.27279 1.69424 0.24387 0.498871 -2.3394 0.286934
0.457298 -0.0873094 -0.199842 0.851316 -0.371774 -0.050287
julia> exp(mylog(D6)) - D6
6×6 Array{Complex{Float64},2}:
-0.0373332-0.0413267im 0.00640153+0.00270801im … 0.0674095-0.0659298im 0.0314105+0.0050294im
0.0733027-0.171132im -0.110669-0.229787im -0.015317+0.265689im -0.0314504+0.158948im
0.0851588-0.915255im -0.407165-0.90443im 0.31459+0.695568im 0.0492944+0.664101im
-0.070744-0.283477im -0.06765-0.17742im 0.22292-0.0141363im 0.0841001+0.146827im
-0.0921727-0.97041im -0.321872-0.765978im 0.5693+0.306179im 0.181584+0.593536im
0.0366271-0.057668im -0.0444716-0.0900447im … -0.02057+0.117721im -0.0190502+0.0607899im
The Schur factorization is fine:
julia> D6schur = schur(D6);
julia> D6schur.Z*D6schur.T*D6schur.Z' - D6
6×6 Array{Float64,2}:
2.77556e-15 5.55112e-17 0.0 1.22125e-15 -1.33227e-15 6.93889e-16
1.11022e-15 9.99201e-16 -6.10623e-16 -4.44089e-16 0.0 0.0
-2.22045e-16 5.55112e-16 -1.44329e-15 -2.88658e-15 -3.55271e-15 4.02456e-16
-1.77636e-15 3.33067e-16 -6.66134e-16 5.55112e-16 -3.4972e-15 6.66134e-16
-2.44249e-15 6.66134e-16 -2.41474e-15 -7.21645e-16 -7.54952e-15 -1.11022e-16
-9.4369e-16 4.57967e-16 -2.77556e-17 -1.11022e-15 1.66533e-16 4.51028e-16
Also, this is broken
julia> G = [1 1; 0 1]
2×2 Array{Int64,2}:
1 1
0 1
julia> log(G)
ERROR: MethodError: no method matching log(::UpperTriangular{Complex{Int64},Array{Complex{Int64},2}})
Closest candidates are:
log(::Float16) at math.jl:1018
log(::Complex{Float16}) at math.jl:1019
log(::Float64) at special/log.jl:254
...
The Compute accurate superdiagonal of T
code that is giving NaN in my example is from Higham's Functions of Matrices, chapter 11:
In particular, note the comment about it depending on the definition of the atanh
function — since our code was ported from from Matlab code, we have to be careful that the atanh
definition is the same.
I found a typo in the code that seems to fix the problem. I also think we should update to use (11.27) from Higham's book since our atanh
follows the IEEE 754 definition. Will have a PR soon, I hope.
I also get
julia> norm(exp(log(D6)) - D6) / norm(D6)
1.9415387506094363e-15
for your bigger example above.
(There has got to be a better way to compute log(x/y) / (x-y)
than this crazy complicated formula in Higham's book.)
@ztultrebor, the issue will be closed when the bugfix PR is merged.
Higham's formulae seem to rely on arcane features of Matlab floating point operations, which differ from the ISO C99ff standards followed (faithfully, AFAICT) by Julia. I propose a different sign based on analysis of the ISO-style branch cut:
# this is the fragile thing we are trying to represent
function f_baseline(a,b)
log(b) - log(a)
end
# this is Higham's version (bottom of p.279):
function f_higham1(a,b)
z = (b-a)/(b+a)
log((1.0 + z)/(1.0 - z)) + 2.0*im*pi*(unw(log(b)-log(a)))
end
# this is the form in 11.27
function f_higham2(a,b)
z = (b-a)/(b+a)
u = unw(log1p(z) - log1p(-z))
2.0 * (atanh(z) + im*pi*(unw(log(b)-log(a)) + u))
end
# this is the form in 11.27, with changed sign
function f_proposed(a,b)
z = (b-a)/(b+a)
u = -unw(log1p(z) - log1p(-z))
2.0 * (atanh(z) + im*pi*(unw(log(b)-log(a)) + u))
end
function atest(n=1024)
a1 = 10.0 * (rand(n) .- 0.5) .+ 0im
a2 = 10.0 * (rand(n) .- 0.5) .+ 0im
y0 = f_baseline.(a1,a2)
y1 = f_higham1.(a1,a2)
y2 = f_higham2.(a1,a2)
y3 = f_proposed.(a1,a2)
println("Higham 1: ",norm(y1-y0))
println("Higham 2: ",norm(y2-y0))
println("Proposed: ",norm(y3-y0))
end
Typical results:
julia-1.1> atest()
Higham 1: 99.74247458479297
Higham 2: 201.06192982974676
Proposed: 7.368633527567163e-12
This is because Higham's formulae (in Julia) sometimes pick strange sheets of the Riemann surface:
┌─────────────┬───────────────┬────────────────┬────────────────┬───────────────┐
│ z │ baseline │ higham1 │ higham2 │ proposed │
├─────────────┼───────────────┼────────────────┼────────────────┼───────────────┤
│ -4.19+0im │ -0.486+3.14im │ -0.486+3.14im │ -0.486+3.14im │ -0.486+3.14im │
│ -122-0im │ -0.0163-3.14im │ -0.0163-3.14im │ -0.0163-15.7im │ -0.0163-3.14im │
│ 0.236-0im │ 0.48+0im │ 0.48-0im │ 0.48-0im │ 0.48+0im │
│ 0.594-0im │ 1.37+0im │ 1.37-0im │ 1.37-0im │ 1.37+0im │
│ 1.35-0im │ 1.9+3.14im │ 1.9-3.14im │ 1.9-9.42im │ 1.9+3.14im │
│ -3.28+0im │ -0.629+3.14im │ -0.629+3.14im │ -0.629+3.14im │ -0.629+3.14im │
│ 2.25+0im │ 0.955-3.14im │ 0.955-9.42im │ 0.955-3.14im │ 0.955-3.14im │
│ -40.9-0im │ -0.0489-3.14im │ -0.0489-3.14im │ -0.0489-15.7im │ -0.0489-3.14im │
│ 4.59+0im │ 0.443-3.14im │ 0.443-9.42im │ 0.443-3.14im │ 0.443-3.14im │
│ 0.334+0im │ 0.694+0im │ 0.694+0im │ 0.694+0im │ 0.694+0im │
│ -0.885-0im │ -2.8+0im │ -2.8-0im │ -2.8-0im │ -2.8+0im │
│ -0.522-0im │ -1.16+0im │ -1.16-0im │ -1.16-0im │ -1.16+0im │
│ 49.8+0im │ 0.0401-3.14im │ 0.0401-9.42im │ 0.0401-3.14im │ 0.0401-3.14im │
│ 1.94-0im │ 1.14+3.14im │ 1.14-3.14im │ 1.14-9.42im │ 1.14+3.14im │
│ 1.34+0im │ 1.94-3.14im │ 1.94-9.42im │ 1.94-3.14im │ 1.94-3.14im │
│ 13+0im │ 0.154-3.14im │ 0.154-9.42im │ 0.154-3.14im │ 0.154-3.14im │
└────────────┴────────────────┴────────────────┴────────────────┴────────────────┘
Arguments like the above with signed zeros are important for the matrix functions. Those extra multiples of 2 * pi * im
put the terms out of the range of validity of the Pade forms, at least for the logarithm.
@stevengj I tried the formulation in your PR with the above sign change and see no trouble (for matrices which should be well-conditioned for log
).
This took me a second, but it's key to reading the table above that the "baseline" and "proposed" columns always match.
@RalphAS are there values of z
for which f_baseline
gives the wrong answer?
What about this?
function matrix_log_taylor(A, i=256)
B0 = A - I
R = copy(B0)
B = copy(B0)
for n=2:i
B *= B0
R += (-1)^(n+1)*B/n
end
return R
end
function logmat(A)
Λ, S = eigen(A)
if length(Set(Λ))==length(Λ)
# no repeated eigenvalues
return S*(log.(complex(Λ)).*inv(S))
else
# repeated eigenvalues
n = 0
while norm(A-I)>1
A = sqrt(A)
n += 1
end
return 2^n*matrix_log_taylor(A)
end
end
See here p.33. This approach seems to be more correct and faster than the default method.
Refering to the example of the beginning of this issue:
C = [30 20; -50 -30]
log(exp(C))
seems to work now fine, but
exp(log(C))
shows a rather strange behavior:
2×2 Matrix{Float64}: -7.69911 -5.13274 12.8319 7.69911
@dorn-gerhard, I think you mean that exp(log(C))
works as expected log(exp(C))
gives the surprising result you quoted?
julia> C = [30 20; -50 -30];
julia> log(exp(C))
2×2 Matrix{Float64}:
-7.69911 -5.13274
12.8319 7.69911
This is just because log
is multi-valued — the result above is correct:
julia> exp(log(exp(C))) ≈ exp(C)
true
In particular, if you look at A = C - log(exp(C))
, it is a matrix whose exponential is the identity:
julia> A = C - log(exp(C))
2×2 Matrix{Float64}:
37.6991 25.1327
-62.8319 -37.6991
julia> exp(A)
2×2 Matrix{Float64}:
1.0 3.55271e-14
-8.52651e-14 1.0
because the eigenvalues of A
are ±2πi:
julia> eigvals(A) / 2pi
2-element Vector{ComplexF64}:
-2.297067269945873e-16 - 1.9999999999999998im
-2.297067269945873e-16 + 1.9999999999999998im
Did we ever incorporate @RalphAS's improvement above? Looks like it: https://github.com/JuliaLang/julia/blob/635449dabee81bba315ab066627a98f856141969/stdlib/LinearAlgebra/src/triangular.jl#L2110
A similar issue was discussed in #21179. That issue has been fixed, as the example given calculates correctly
But if we let
C=10 B
then we encounter a problemThis approach gives the correct answer