JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.47k stars 5.46k forks source link

log(matrix) issue #32313

Closed ztultrebor closed 5 years ago

ztultrebor commented 5 years ago

A similar issue was discussed in #21179. That issue has been fixed, as the example given calculates correctly

julia> B = [3 2; -5 -3]
2×2 Array{Int64,2}:
  3   2
 -5  -3

julia> exp(log(B))
2×2 Array{Complex{Float64},2}:
  3.0+3.9276e-16im    2.0+1.30473e-16im
 -5.0-1.26787e-15im  -3.0-4.20143e-16im

But if we let C=10 B then we encounter a problem

julia> C = [30 20; -50 -30]
2×2 Array{Int64,2}:
  30   20
 -50  -30

julia> exp(log(C))
2×2 Array{Complex{Float64},2}:
   -30.0-8.57143im  -14.2857-17.1429im
 44.2857-17.1429im      30.0+8.57143im

This approach gives the correct answer

julia> using LinearAlgebra

julia> function logmat(A)
       Λ, S = eigen(A)
       return S*(log.(Complex.(Λ)).*inv(S))
   end
logmat (generic function with 1 method)

julia> exp(logmat(C))
2×2 Array{Complex{Float64},2}:
  30.0+0.0im   20.0+0.0im
 -50.0+0.0im  -30.0+0.0im
stevengj commented 5 years ago

This works:

mylog(X) = let S = schur(complex(X)); S.Z * log(UpperTriangular(S.T)) * S.Z'; end

so the problem seems to come from the combination of two Schur factorizations here?

stevengj commented 5 years ago

It looks like the schur(complex(schur(C).T)).T that is computed here has a certain "unlucky" structure that is handled very inaccurately:

julia> U = UpperTriangular(schur(complex(schur(C).T)).T)
2×2 UpperTriangular{Complex{Float64},Array{Complex{Float64},2}}:
 0.0+10.0im      -64.3428-18.9738im
     ⋅       -2.44249e-15-10.0im   

julia> exp(Matrix(log(U))) - U
2×2 Array{Complex{Float64},2}:
 1.77636e-15-3.55271e-15im      128.686+37.9475im    
         0.0+0.0im          -1.9984e-15-1.77636e-15im

Here is a simpler example:

julia> log(UpperTriangular([10.0im 0; 0 -10.0im]))
2×2 UpperTriangular{Complex{Float64},Array{Complex{Float64},2}}:
 2.30259+1.5708im      NaN+NaN*im  
         ⋅         2.30259-1.5708im

Not sure where these NaNs are coming from, but I'm guessing that's related…

stevengj commented 5 years ago

I've verified that the original code at http://eprints.ma.man.ac.uk/1851/02/logm.zip does not seem to have this problem (the result for logm_new(C) seems accurate).

However, it does have some division-by-zero warnings:

octave:10> U = [10.0i 0; 0 -10.0i]
U =
    0 + 10i    0 +  0i
    0 +  0i   -0 - 10i

octave:11> logm_new(U)
warning: division by zero
warning: called from
    logm_new>powerm2by2 at line 179 column 6
    logm_new at line 87 column 20
warning: division by zero
warning: called from
    logm_new>logm2by2 at line 228 column 7
    logm_new at line 100 column 20
ans =
   2.30259 + 1.57080i   0.00000 + 0.00000i
   0.00000 + 0.00000i   2.30259 - 1.57080i
stevengj commented 5 years ago

cc @iamnapo, since this code is from #21571

stevengj commented 5 years ago

The NaN in my log(UpperTriangular([10.0im 0; 0 -10.0im])) example above seems to be coming from this line.

In particular, Akp1 == -10im and Ak == 10im, so (Akp1 - Ak) / (Akp1 + Ak) gives NaN + NaN*im

ztultrebor commented 5 years ago

Is there a reason to do the first Schur factorization as real?

stevengj commented 5 years ago

@ztultrebor, it's just a performance optimization, I think, since a real Schur factorization is faster than the complex one (but may only be quasi-upper-triangular, leading to the second Schur factorization for cases with complex eigenvalues). In any case, it looks like the underlying problem comes elsewhere, from the fact that log(::UpperTriangular) is inaccurate in some cases.

ztultrebor commented 5 years ago

@stevengj There's definitely a problem (or two) with the log(::UpperTriangular) method.

While the improved mylog function works for a small random matrix:

julia> mylog(X) = let S = schur(complex(X)); S.Z * log(UpperTriangular(S.T)) * S.Z'; end
mylog (generic function with 1 method)

julia> D2 = randn(2,2)
2×2 Array{Float64,2}:
 -0.0532145  1.41135
 -0.03215    1.4111

julia> exp(mylog(D2)) - D2
2×2 Array{Complex{Float64},2}:
 -2.77556e-17+1.56125e-17im  -6.66134e-16+1.38778e-16im
 -6.93889e-18+6.07153e-18im  -2.22045e-16-1.04083e-17im

it still fails for a slightly larger random matrix:

julia> D6 = randn(6,6)
6×6 Array{Float64,2}:
  0.90881    0.336947    0.843843   0.918692  -1.05808   -0.179687
  1.49618    0.375474   -0.437477   1.98801   -1.0208    -0.381434
  1.09977   -0.595175   -0.718849   0.597049  -1.90258    0.0568476
 -1.6915    -0.601828    0.52197   -0.692272  -0.473441   1.38306
 -1.27279    1.69424     0.24387    0.498871  -2.3394     0.286934
  0.457298  -0.0873094  -0.199842   0.851316  -0.371774  -0.050287

julia> exp(mylog(D6)) - D6
6×6 Array{Complex{Float64},2}:
 -0.0373332-0.0413267im  0.00640153+0.00270801im  …  0.0674095-0.0659298im   0.0314105+0.0050294im
  0.0733027-0.171132im    -0.110669-0.229787im       -0.015317+0.265689im   -0.0314504+0.158948im
  0.0851588-0.915255im    -0.407165-0.90443im          0.31459+0.695568im    0.0492944+0.664101im
  -0.070744-0.283477im     -0.06765-0.17742im          0.22292-0.0141363im   0.0841001+0.146827im
 -0.0921727-0.97041im     -0.321872-0.765978im          0.5693+0.306179im     0.181584+0.593536im
  0.0366271-0.057668im   -0.0444716-0.0900447im   …   -0.02057+0.117721im   -0.0190502+0.0607899im

The Schur factorization is fine:

julia> D6schur = schur(D6);

julia> D6schur.Z*D6schur.T*D6schur.Z' - D6
6×6 Array{Float64,2}:
  2.77556e-15  5.55112e-17   0.0           1.22125e-15  -1.33227e-15   6.93889e-16
  1.11022e-15  9.99201e-16  -6.10623e-16  -4.44089e-16   0.0           0.0
 -2.22045e-16  5.55112e-16  -1.44329e-15  -2.88658e-15  -3.55271e-15   4.02456e-16
 -1.77636e-15  3.33067e-16  -6.66134e-16   5.55112e-16  -3.4972e-15    6.66134e-16
 -2.44249e-15  6.66134e-16  -2.41474e-15  -7.21645e-16  -7.54952e-15  -1.11022e-16
 -9.4369e-16   4.57967e-16  -2.77556e-17  -1.11022e-15   1.66533e-16   4.51028e-16
ztultrebor commented 5 years ago

Also, this is broken

julia> G = [1 1; 0 1]
2×2 Array{Int64,2}:
 1  1
 0  1

julia> log(G)
ERROR: MethodError: no method matching log(::UpperTriangular{Complex{Int64},Array{Complex{Int64},2}})
Closest candidates are:
  log(::Float16) at math.jl:1018
  log(::Complex{Float16}) at math.jl:1019
  log(::Float64) at special/log.jl:254
  ...
stevengj commented 5 years ago

The Compute accurate superdiagonal of T code that is giving NaN in my example is from Higham's Functions of Matrices, chapter 11: image image

In particular, note the comment about it depending on the definition of the atanh function — since our code was ported from from Matlab code, we have to be careful that the atanh definition is the same.

stevengj commented 5 years ago

I found a typo in the code that seems to fix the problem. I also think we should update to use (11.27) from Higham's book since our atanh follows the IEEE 754 definition. Will have a PR soon, I hope.

I also get

julia> norm(exp(log(D6)) - D6) / norm(D6)
1.9415387506094363e-15

for your bigger example above.

(There has got to be a better way to compute log(x/y) / (x-y) than this crazy complicated formula in Higham's book.)

stevengj commented 5 years ago

@ztultrebor, the issue will be closed when the bugfix PR is merged.

RalphAS commented 5 years ago

Higham's formulae seem to rely on arcane features of Matlab floating point operations, which differ from the ISO C99ff standards followed (faithfully, AFAICT) by Julia. I propose a different sign based on analysis of the ISO-style branch cut:

# this is the fragile thing we are trying to represent
function f_baseline(a,b)
    log(b) - log(a)
end
# this is Higham's version (bottom of p.279):
function f_higham1(a,b)
    z = (b-a)/(b+a)
    log((1.0 + z)/(1.0 - z)) + 2.0*im*pi*(unw(log(b)-log(a)))
end
# this is the form in 11.27
function f_higham2(a,b)
    z = (b-a)/(b+a)
    u = unw(log1p(z) - log1p(-z))
    2.0 * (atanh(z) + im*pi*(unw(log(b)-log(a)) + u))
end

# this is the form in 11.27, with changed sign
function f_proposed(a,b)
    z = (b-a)/(b+a)
    u = -unw(log1p(z) - log1p(-z))
    2.0 * (atanh(z) + im*pi*(unw(log(b)-log(a)) + u))
end
function atest(n=1024)
    a1 = 10.0 * (rand(n) .- 0.5) .+ 0im
    a2 = 10.0 * (rand(n) .- 0.5) .+ 0im
    y0 = f_baseline.(a1,a2)
    y1 = f_higham1.(a1,a2)
    y2 = f_higham2.(a1,a2)
    y3 = f_proposed.(a1,a2)
    println("Higham 1: ",norm(y1-y0))
    println("Higham 2: ",norm(y2-y0))
    println("Proposed: ",norm(y3-y0))
end

Typical results:

julia-1.1> atest()
Higham 1: 99.74247458479297
Higham 2: 201.06192982974676
Proposed: 7.368633527567163e-12

This is because Higham's formulae (in Julia) sometimes pick strange sheets of the Riemann surface:

┌─────────────┬───────────────┬────────────────┬────────────────┬───────────────┐
│           z │      baseline │       higham1  │       higham2  │     proposed  │
├─────────────┼───────────────┼────────────────┼────────────────┼───────────────┤
│  -4.19+0im │  -0.486+3.14im │  -0.486+3.14im │  -0.486+3.14im │  -0.486+3.14im │
│   -122-0im │ -0.0163-3.14im │ -0.0163-3.14im │ -0.0163-15.7im │ -0.0163-3.14im │
│  0.236-0im │       0.48+0im │       0.48-0im │       0.48-0im │       0.48+0im │
│  0.594-0im │       1.37+0im │       1.37-0im │       1.37-0im │       1.37+0im │
│   1.35-0im │     1.9+3.14im │     1.9-3.14im │     1.9-9.42im │     1.9+3.14im │
│  -3.28+0im │  -0.629+3.14im │  -0.629+3.14im │  -0.629+3.14im │  -0.629+3.14im │
│   2.25+0im │   0.955-3.14im │   0.955-9.42im │   0.955-3.14im │   0.955-3.14im │
│  -40.9-0im │ -0.0489-3.14im │ -0.0489-3.14im │ -0.0489-15.7im │ -0.0489-3.14im │
│   4.59+0im │   0.443-3.14im │   0.443-9.42im │   0.443-3.14im │   0.443-3.14im │
│  0.334+0im │      0.694+0im │      0.694+0im │      0.694+0im │      0.694+0im │
│ -0.885-0im │       -2.8+0im │       -2.8-0im │       -2.8-0im │       -2.8+0im │
│ -0.522-0im │      -1.16+0im │      -1.16-0im │      -1.16-0im │      -1.16+0im │
│   49.8+0im │  0.0401-3.14im │  0.0401-9.42im │  0.0401-3.14im │  0.0401-3.14im │
│   1.94-0im │    1.14+3.14im │    1.14-3.14im │    1.14-9.42im │    1.14+3.14im │
│   1.34+0im │    1.94-3.14im │    1.94-9.42im │    1.94-3.14im │    1.94-3.14im │
│     13+0im │   0.154-3.14im │   0.154-9.42im │   0.154-3.14im │   0.154-3.14im │
└────────────┴────────────────┴────────────────┴────────────────┴────────────────┘

Arguments like the above with signed zeros are important for the matrix functions. Those extra multiples of 2 * pi * im put the terms out of the range of validity of the Pade forms, at least for the logarithm.

@stevengj I tried the formulation in your PR with the above sign change and see no trouble (for matrices which should be well-conditioned for log).

StefanKarpinski commented 5 years ago

This took me a second, but it's key to reading the table above that the "baseline" and "proposed" columns always match.

ztultrebor commented 5 years ago

@RalphAS are there values of z for which f_baseline gives the wrong answer?

ztultrebor commented 5 years ago

What about this?

function matrix_log_taylor(A, i=256)
    B0 = A - I
    R = copy(B0)
    B = copy(B0)
    for n=2:i
        B *= B0
        R += (-1)^(n+1)*B/n
    end
    return R
end

function logmat(A)
    Λ, S = eigen(A)
    if length(Set(Λ))==length(Λ) 
        # no repeated eigenvalues
        return S*(log.(complex(Λ)).*inv(S))
    else   
        # repeated eigenvalues
        n = 0
        while norm(A-I)>1
            A = sqrt(A)
            n += 1
        end
        return 2^n*matrix_log_taylor(A)
    end
end

See here p.33. This approach seems to be more correct and faster than the default method.

dorn-gerhard commented 2 years ago

Refering to the example of the beginning of this issue: C = [30 20; -50 -30]

log(exp(C)) seems to work now fine, but exp(log(C)) shows a rather strange behavior: 2×2 Matrix{Float64}: -7.69911 -5.13274 12.8319 7.69911

stevengj commented 2 years ago

@dorn-gerhard, I think you mean that exp(log(C)) works as expected log(exp(C)) gives the surprising result you quoted?

julia> C = [30 20; -50 -30];

julia> log(exp(C))
2×2 Matrix{Float64}:
 -7.69911  -5.13274
 12.8319    7.69911

This is just because log is multi-valued — the result above is correct:

julia> exp(log(exp(C))) ≈ exp(C)
true

In particular, if you look at A = C - log(exp(C)), it is a matrix whose exponential is the identity:

julia> A = C - log(exp(C))
2×2 Matrix{Float64}:
  37.6991   25.1327
 -62.8319  -37.6991

julia> exp(A)
2×2 Matrix{Float64}:
  1.0          3.55271e-14
 -8.52651e-14  1.0

because the eigenvalues of A are ±2πi:

julia> eigvals(A) / 2pi
2-element Vector{ComplexF64}:
 -2.297067269945873e-16 - 1.9999999999999998im
 -2.297067269945873e-16 + 1.9999999999999998im
stevengj commented 2 years ago

Did we ever incorporate @RalphAS's improvement above? Looks like it: https://github.com/JuliaLang/julia/blob/635449dabee81bba315ab066627a98f856141969/stdlib/LinearAlgebra/src/triangular.jl#L2110