JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

sqrtm is not type-stable #4006

Closed. johnmyleswhite closed this issue 10 years ago.

johnmyleswhite commented 11 years ago

Maybe we should change sqrtm so that it always returns a complex array?

julia> X = sqrtm([1.0 -0.9; -0.9 1.0])
2x2 Float64 Array:
  0.847316  -0.531089
 -0.531089   0.847316

julia> X * X
2x2 Float64 Array:
  1.0  -0.9
 -0.9   1.0

julia> X = sqrtm([1.0 -0.9; 0.9 1.0])
2x2 Complex{Float64} Array:
   1.0829+0.0im  -0.415549+1.11022e-16im
 0.415549+0.0im     1.0829+1.38778e-16im

julia> X * X
2x2 Complex{Float64} Array:
 1.0+4.61352e-17im  -0.9+1.82784e-16im
  0.9+5.7669e-17im   1.0+3.46701e-16im
ViralBShah commented 11 years ago

This is also the case with sqrt. This is what dynamic languages are meant for. The types also look fine. We can keep it the way it is IMO.

johnmyleswhite commented 11 years ago

That's not how sqrt behaves:

julia> sqrt(1.0)
1.0

julia> sqrt(-1.0)
ERROR: DomainError()
 in sqrt at math.jl:118

julia> sqrt(1.0 + 0.0 * im)
1.0 + 0.0im

julia> sqrt(-1.0 + 0.0 * im)
0.0 + 1.0im
JeffBezanson commented 11 years ago

eig behaves the same as sqrtm.

ViralBShah commented 11 years ago

Oops, sqrt was a bad example. But yes, eig does this.

johnmyleswhite commented 11 years ago

Ok. This behavior seems undesirable to me, but I suppose this is traditional enough that I should just learn to accept it. I would have thought we'd want behavior like fft has.
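
For reference, the fft comparison means a return type that does not depend on the input values: fft always produces a complex array, even for purely real input (using the Base fft of this era):

y = fft([1.0, 2.0, 3.0, 4.0])     # eltype(y) == Complex{Float64}, whatever the input values are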

StefanKarpinski commented 11 years ago

I agree that this does not seem great from the type-stability perspective.

StefanKarpinski commented 10 years ago

One option would be to return a real matrix when the input is a matrix of Symmetric type and a complex matrix otherwise. That also has the advantage of avoiding the symmetry check when you've already constructed the matrix to be symmetric. A big disadvantage of the current arrangement is that there isn't any way to call sqrtm type-stably. We could do a similar scheme for eig too. The nice part: for a plain matrix the returned array is always complex and always correct, though it might happen to be a purely real-valued complex array. If you know the matrix is symmetric and you want a real matrix back, you can wrap it in Symmetric first and get exactly what you want.
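
A minimal sketch of what the proposed scheme might look like, in the syntax of this era (hypothetical; sqrtm_proposed is an illustrative name, not how sqrtm currently dispatches):

# Hypothetical sketch of the proposal above, not current behavior:
# a plain Matrix always yields a complex result, and the caller opts into
# a real result by asserting symmetry with the Symmetric wrapper.
sqrtm_proposed(A::Matrix{Float64}) = convert(Matrix{Complex{Float64}}, sqrtm(A))

function sqrtm_proposed(A::Symmetric)
    D, V = eig(full(A))              # real symmetric eigendecomposition
    V * diagm(map(sqrt, D)) * V'     # real result; sqrt throws a DomainError if A is indefinite
end

Under this scheme both methods would be type-stable: the Matrix method always returns Complex{Float64}, and the Symmetric method always returns Float64 (or errors), mirroring how sqrt behaves for scalars.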

jiahao commented 10 years ago

The underlying issue is that special symmetries of the matrix (Hermitian, real symmetric, etc.) allow for special-cased algorithms that are significantly faster than the generic ones. Functions like sqrtm can be computed more stably and quickly for complex Hermitian matrices than for general unsymmetric complex ones, so the current methods check for special symmetries and explicitly dispatch to appropriate specialized methods. If we turn off these symmetry detection checks, we compromise performance. If we explicitly cast the return value into a Complex matrix type, we also potentially compromise performance, since we may have to force memory copies to create the desired type.

If a user wants to explicitly cast the return value into a Complex matrix type, they should feel free to do so. But there seems to be no way to force the issue without compromising performance.
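
For example, a caller who wants type stability at the call site can force the conversion explicitly, at the cost of a copy (just an illustration, not a recommendation from this thread):

A = [1.0 -0.9; -0.9 1.0]
X = convert(Matrix{Complex{Float64}}, sqrtm(A))   # always Complex{Float64}, even when the root is purely real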

ivarne commented 10 years ago

It might be nice to mention the type instability issue in the documentation for sqrtm?

jiahao commented 10 years ago

Yeah, we should document this. This behavior isn't specific to sqrtm, however; it is also a feature of other functions, most notably linear solve and eig.

Reopening as documentation issue.

johnmyleswhite commented 10 years ago

We should really be documenting the return type of every function.

jiahao commented 10 years ago

My earlier analysis of the sqrtm polyalgorithm turns out to be not quite right. This is what actually happens when sqrtm(A) is called:

  1. Check whether A is Hermitian or symmetric.
  2. If A is neither Hermitian nor symmetric, triangularize A by computing its Schur factorization. Compute the matrix square root of the triangular factor, then rotate the new matrix back into the original basis.
  3. If A is either Hermitian or symmetric, diagonalize A by computing its eigendecomposition. Take the square roots of the eigenvalues, converting them as necessary to handle the possibility of complex values being produced, then rotate the new matrix back into the original basis.

A matrix that is neither Hermitian nor symmetric will in general have a square root with complex elements. A matrix that is Hermitian or symmetric will also have a square root with complex elements unless it is positive definite, in which case we can specialize to a square root with real elements. Since you can determine whether a Hermitian/symmetric matrix is positive definite essentially for free after computing its eigenvalues, and recomposing a real matrix as a product of real factors is cheaper than in the complex case, I still think that the current polyalgorithm is optimally performant.
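
A rough schematic of that dispatch, in the syntax of this era (illustrative only; the actual Base implementation differs in detail, e.g. in how it handles the triangular factor):

function sqrtm_sketch(A::Matrix{Float64})
    if ishermitian(A)                    # step 1: detect symmetry (real symmetric == Hermitian here)
        D, V = eig(A)                    # step 3: real symmetric eigendecomposition
        if all(D .>= 0)                  # positive (semi)definite: a real square root exists
            return V * diagm(map(sqrt, D)) * V'
        else                             # negative eigenvalues: fall back to a complex result
            return V * diagm(map(sqrt, complex(D))) * V'
        end
    else                                 # step 2: general case via the Schur factorization
        S = schurfact(complex(A))
        R = sqrtm(S[:T])                 # square root of the triangular factor
        return S[:Z] * R * S[:Z]'        # rotate back into the original basis
    end
end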

The current implementation of the individual components, however, suggests some beneficial refactoring.

GunnarFarneback commented 10 years ago

A matrix that is neither Hermitian nor symmetric will in general have a square root with complex elements.

In general yes, but there are special cases; e.g. real matrices with no eigenvalues on the non-positive real axis (positive eigenvalues and complex conjugate pairs are fine) have real square roots. I wrote about some of the peculiarities of the matrix square root in a mailing list thread about sqrtm when it was implemented:

Like the scalar square root, the matrix square root often has multiple solutions, and even more so than in the scalar case. E.g. a positive definite n×n Hermitian matrix with distinct eigenvalues has 2^n different square roots, whereas repeated eigenvalues can allow infinitely many square roots. E.g. [-1 0;0 -1] has the obvious square roots [i 0;0 i], [i 0;0 -i], [-i 0;0 i], and [-i 0;0 -i], but also non-diagonal solutions like [1 2;-1 -1], which has the property of being real. In fact real matrices often have real square roots, and it could be interesting to have a real variation of sqrtm. This is thoroughly investigated, at least in the non-singular case, in

N. J. Higham, Computing real square roots of a real matrix, Linear Algebra Appl. 88/89 (1987) 405–430.
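
The non-diagonal real square root of -I mentioned above is easy to verify directly:

R = [1 2; -1 -1]
R^2               # == [-1 0; 0 -1], a real square root of -I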

As a special case, I believe that the implemented algorithm does compute a real solution, up to numerical noise, for real matrices with no eigenvalues on the non-positive real axis. This is easily checked in the Schur decomposition and the imaginary part could be dropped in that case. It would be good if someone could verify that this is correct.
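
A hedged sketch of what that check could look like (illustrative only; whether dropping the imaginary part is always safe here is exactly the point that needs verifying):

function real_sqrtm_if_possible(A::Matrix{Float64})
    # a real square root is expected when no eigenvalue lies on the closed non-positive real axis
    ok = all(λ -> imag(λ) != 0 || real(λ) > 0, eigvals(A))
    X = sqrtm(A)
    return (ok && !isreal(X)) ? real(X) : X    # discard roundoff-level imaginary noise in that case
end

Note that this is still value-dependent in its return type; it only drops the numerical noise when a real root is expected.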

The singular non-diagonalizable case can be quite tricky. E.g. [0 1;0 0] has no square root at all, neither real nor complex, but if we add another dimension we have [0 0 1;0 0 0;0 1 0]^2 = [0 1 0;0 0 0;0 0 0]. The implemented method returns NaNs for both of these.
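
The 3×3 construction can be checked with plain integer arithmetic:

N = [0 0 1; 0 0 0; 0 1 0]
N^2               # == [0 1 0; 0 0 0; 0 0 0], even though the 2x2 block [0 1; 0 0] has no square root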

I did write a Julia implementation of Higham's real square root method (starting from the real Schur decomposition instead of the complex Schur decomposition) but it's really not worth the code complexity.

jiahao commented 10 years ago

@GunnarFarneback thanks for the back history. I'll have to think for a bit about how easy it would be to check for the special case you mentioned in the triangular sqrtm routine.

The mailing list discussion points to a much more complicated issue, namely that of defining a principal square root. It bears repeating that matrices don't have unique square roots. I think most people would expect matrix-valued functions to produce the "canonical" version that is implemented, which is to factorize, take square roots of the factor, and rotate back into the original basis. But perhaps this is worth mentioning in the docs.