JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License
45.69k stars 5.48k forks source link

eigs gives wrong eigenvalues #6965

Closed ViralBShah closed 10 years ago

ViralBShah commented 10 years ago

It has been cautioned in ARPACK docs that one must link to the LAPACK 2.0 that ships with ARPACK. Specifically, it is the dlahqr routine that we need from LAPACK 2.0. Since arpack-ng does not include this patch yet, we should probably do it ourselves.

http://forge.scilab.org/index.php/p/arpack-ng/issues/1315/

[Edit: Although this was the original issue, it boiled down to fixing the small eigenvalues case and the LAPACK2 patch seems not to be necessary for now.]

andreasnoack commented 10 years ago

It would be great to have our own iterative eigensolver.

ViralBShah commented 10 years ago

I agree, but it will take a fair amount of effort to get it right. Until then, incremental work on arpack is ok, I feel. The two can coexist now providing for a smooth transition.

ViralBShah commented 10 years ago

For now, all that has to be done is to replace calls to xLAHQR with the xLAHQR from LAPACK 2.0 (included in the original ARPACK distribution, not arpack-ng). We can have xLAHQR2 or AR_xLAHQR linked into ARPACK and have the ARPACK routines call it. Then, check that the solutions are correct for problems mentioned in the arpack-ng issue above, and we should be good to go.

Cc: @staticfloat

staticfloat commented 10 years ago

Seems reasonable to me. I've got a fair bit on my plate at the moment, so I won't be able to author this anytime soon, but if someone else submits a patch I will gladly test.

ViralBShah commented 10 years ago

I will probably be able to do it in a few days. Just want to make sure we include it in 0.3.

ViralBShah commented 10 years ago

I have used DLAHQR from LAPACK 2.0. In these tests I am using the full LAPACK 2.0 with ARPACK - but I can't get the right answer. Moreover, something else seems amiss too:

julia> A = [
         1.0   1.0   1.0   1.0   1.0   1.0   1.0  1.0
        -1.0   2.0   0.0   0.0   0.0   0.0   0.0  1.0
        -1.0   0.0   3.0   0.0   0.0   0.0   0.0  1.0
        -1.0   0.0   0.0   4.0   0.0   0.0   0.0  1.0
        -1.0   0.0   0.0   0.0   5.0   0.0   0.0  1.0
        -1.0   0.0   0.0   0.0   0.0   6.0   0.0  1.0
        -1.0   0.0   0.0   0.0   0.0   0.0   7.0  1.0
        -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  8.0
       ];

Computing all the eigenvalues:

julia> (d,v) = eigs(A, which="SM", nev=8)
(Complex{Float64}[2.53469+0.0im,6.384+1.29179im,6.384-1.29179im,6.46531+0.0im,5.11324+0.0im,3.88676+0.0im],

Computing only the two smallest eigenvalues, we get these, which are wrong, since 2.53469 is smaller.

julia> eigs(A, which="SM", nev=2)
(Complex{Float64}[2.616+1.29179im,2.616-1.29179im],

More interestingly, as I play around with maxiter and tol, I get different values:

julia> eigs(A, which="SM", nev=2, maxiter=100000)
(Complex{Float64}[2.53469+0.0im,6.384+1.29179im],

julia> eigs(A, which="SM", nev=2, maxiter=100000, tol=1e-9)
(Complex{Float64}[6.384+1.29179im,6.384-1.29179im],
ViralBShah commented 10 years ago

cc: @jutho

Jutho commented 10 years ago

I am not too experienced with Arpack, especially not with the implementation of it and its dependence on Lapack. Also, I never really had to use smallest magnitude in any application. I can see that what you are trying to do is pretty difficult for Arpack. Firstly, for smallest magnitude, the normal Arnoldi approach based on a Krylov space of vectors [x,A_x, ...] is not expected to work to well; this only works for extremal eigenvalues whereas the smallest magnitude eigenvalues are in the interior of the spectrum. In fact, I believe Matlab will always use the A\x approach in eigs when requesting the eigenvalues of smallest magnitude, and for function handles it requires them to implement A\x instead of A_x. Secondly, you are requesting two eigenvalues of smallest magnitude, whereas the second and third smallest one are 'degenerate' in the sense of their magnitude. It is well known that with iterative eigensolvers, if there is a degeneracy in the eigenvalues, it is best practice (if you know about this degeneracy) to either ask for all or them, or none of them.

Jutho commented 10 years ago

All of this should are of course considerations for larger realistic problems for eigs, but should not pose an issue for such a small matrix, where the Krylov space should span the whole matrix. I would need to dig into the implementation again to see how big the Krylov space is chosen for such a small matrix?

ViralBShah commented 10 years ago

MATLAB and octave that use arpack too produce the right answer.

Jutho commented 10 years ago

I don't know about octave but I've ran through Matlab for this example step by step and it is certainly using the shift and invert strategy with shift zero, and using the LU decomposition of A to apply its inverse. I don't think the julia eigs implementation does this currently, but I haven't checked so please correct me if I am wrong.

ViralBShah commented 10 years ago

We require sigma to be zero for that. Good point. I will try that out.

ViralBShah commented 10 years ago

Moving this to 0.4. I have no idea what needs to happen here.

Jutho commented 10 years ago

Does it not work using shift and invert, without the shift part then, for smallest magnitude. Matlab certainly uses LU decomposition to apply A inverse if A is a matrix, and if A is a function, the documentation clearly stipulates

    eigs(AFUN,N) accepts the function AFUN instead of the matrix A. AFUN is
    a function handle and Y = AFUN(X) should return
       A*X            if SIGMA is unspecified, or a string other than 'SM'
       A\X            if SIGMA is 0 or 'SM'

I think it is just impossible to find the smallest magnitude eigenvalues using a Krylov method that multiplies vectors with A.

ViralBShah commented 10 years ago

Nope. It did not work right with shift invert too. I will try it once again. Can you try the example in this issue ?

vtjnash commented 10 years ago

what is holding up closing this issue?

jey commented 10 years ago

FYI I believe there is a bug in the computation of lworkl in aupd_wrapper: https://github.com/jey/julia/commit/f4b8faf6e99bc5f02371739bf307359b1380f71f

Jutho commented 10 years ago

I think the original version corresponds with what is in the Arpack User Guide for DNAUPD and ZNAUPD: http://www.caam.rice.edu/software/ARPACK/UG/node137.html http://www.caam.rice.edu/software/ARPACK/UG/node138.html

What I find strange (though this cannot be the cause of the issue under investigation) is the default choice of mcv. In the named argument, the default is ncv=20, which is ok, but then there is the line ncv = blas_int(min(max(2*nev + 2, ncv), n)) The only requirement is that ncv>= nev+2 for nonsymmetric or complex problems, and ncv>=nev+1 for real symmetric.

max(2*nev+1,20) seems to be M****_’s default choice for non symmetric or complex problems and max(2_nev,20)for real symmetric. But if a user specifies a valuencvsatisfying the lower bound and smaller than the default (e.g`nev+2 <= ncv <2nev+1` ) then this line will change the user’s request.

For the current matrix, ncv should always default to n=8, so the Krylov subspace spans the whole matrix and Arpack should exactly know all eigenvalues after a single iteration, so not returning the correct eigenvalue is really indicating a bug. I was really surprised to see that even for something as stable as ‘LM’ the current eigs seems to return the wrong result occasionally.

On 10 Jul 2014, at 22:04, Jey Kottalam notifications@github.com wrote:

FYI I believe there is a bug in the computation of lworkl in aupd_wrapper: jey@f4b8faf

— Reply to this email directly or view it on GitHub.

Jutho commented 10 years ago

I received the message below on my mailbox, but cannot find it on Github. If you give me some instructions on what configuration you are using exactly, or how to enable the LAPACK 2.0, I can try to take a look tomorrow. I will also try to contact a former colleague of mine, who built arpack-ng for a project of his and was also troubled by a bug that might be related/identical.

On 10 Jul 2014, at 20:17, Viral B. Shah notifications@github.com wrote:

I am out of ideas on this one. Either there is some sort of corruption, or there is a bug in ARPACK. I do have an ARPACK with LAPACK 2.0 mangled names. The last I tried, it didn't help fix this issue, but I will revisit it again in a couple of days. If you can take a look, I can send it to you.

— Reply to this email directly or view it on GitHub.

Jutho commented 10 years ago

Finally clicked on the link in @ViralBShah very first post. This issue was reported by Valentin (Zauner), another colleague of mine. I remember him telling me about this, but since I was still using Matlab I didn't care too much :-). I will take a deeper look through the whole thread tomorrow. From what I understand from quickly skimming through it, replacing DLAHQR is easier then patching dneupd? Well, certainly for people like me that do not read Fortran, but it sounds terrible that there is no fix for this in arpack-ng after 9 months.

ViralBShah commented 10 years ago

@jey I looked at the docs again, and what is in place is correct. The patch seems to over-allocate. For example, for dnaupd, LWORKL should be 3*NCV**2 + 6*NCV. If NCV=5, this is 105, but your formula evaluates it to 165.

Jutho commented 10 years ago

That was indeed what I was trying to see three posts above with the two URLs. But I should have been more specific as to what this was a response.

ViralBShah commented 10 years ago

This is my patched arpack-ng-3.1.5 to use LAPACK2: https://github.com/ViralBShah/arpack-ng-lapack2

ViralBShah commented 10 years ago

On the ncv requirement, the suggestion is in the ARPACK docs:

c  4. At present there is no a-priori analysis to guide the selection
c     of NCV relative to NEV.  The only formal requrement is that NCV > NEV + 2.
c     However, it is recommended that NCV .ge. 2*NEV+1.  If many problems of
c     the same type are to be solved, one should experiment with increasing
c     NCV while keeping NEV fixed for a given test problem.  This will 
c     usually decrease the required number of OP*x operations but it
c     also increases the work and storage required to maintain the orthogonal
c     basis vectors.  The optimal "cross-over" with respect to CPU time
c     is problem dependent and must be determined empirically. 
c     See Chapter 8 of Reference 2 for further information.
Jutho commented 10 years ago

I am not opposing against the default value, but the current behaviour overrides the user setting, which might not be very nice. Some things interfered with my plan to look into the arpack issue of this thread, but I will try to do so this evening or tomorrow.

ViralBShah commented 10 years ago

That is a good point. We should certainly not override the user value. That explains many things.

ViralBShah commented 10 years ago

It does seem reasonable to use the higher of the user-defined or the internal defined ncv.

Jutho commented 10 years ago

Why not set ncv=max(20,2*nev+1) in the function definition, and then set ncv=min(n,max(ncv,nev+(issym(A) && isreal(A) ? 1 : 2)) on line 19 of arnoldi.jl. Or better yet, warn explictly that ncv is to small if ncv < nev+(issym(A) && isreal(A) ? 1 : 2).

ViralBShah commented 10 years ago

Would it be possible for you to make a PR sometime?

Jutho commented 10 years ago

I am now trying to add your patched arpack with lapack 2.0 into a fresh julia clone so as to test the main issue of this thread. I am not very experienced with Makefiles etc. What is the best way to proceed?

ViralBShah commented 10 years ago

Ok - this is what you can do:

  1. Clone the repository in deps/
  2. Do the equivalent of this (change the path): ./configure --prefix=/home/viral/julia/usr --build=x86_64-linux-gnu --libdir=/home/viral/julia/usr/lib F77="gfortran -m64 -ffixed-line-length-none" CC="gcc -m64" CXX="g++ -m64" --with-blas="-L/home/viral/julia/usr/lib -lopenblas" --with-lapack="-L/home/viral/julia/usr/lib -lopenblas" --disable-mpi --enable-shared FFLAGS=" -fdefault-integer-8 -O2 -fPIC" LDFLAGS="-Wl,-rpath,'/home/viral/julia/usr/lib'"
  3. make
  4. rm -f ../usr/lib/libarpack* && cp .libs/libarpack* ../usr/lib to delete the old arpack and install the new arpack.
ViralBShah commented 10 years ago

Should we switch to doing shift-invert for which="SM" in any case? Also, should we use :SM etc. instead as well?

Jutho commented 10 years ago

There are two issues that need to be solved:

Firstly, indeed, for "SM", one should always use the shift-invert (without shift). Krylov methods based on just multiplying with A cannot find interior eigenvalues. For now, I just investigated this issue by using eigs(A;which="LM"), but the better strategy is indeed as in eigs, using factorize.

The second issue is in the eupd_wrapper. Note that Arpack's eupd routines potentially return nev+1 Ritz values, because it could happen that you just cut through a complex conjugate pair (as you are trying to do here, you request 2 eigenvalues, whereas the 3rd is actually the complex conjugate of the 2nd). If you replace line 161 of arpack.jl by just d = complex(dr,di), you will see all 3 values. For this particular example, you always get the smallest one (largest magnitude of inv(A)) and then the complex conjugate pair with second smallest (second largest) magnitude. However, the order in which they appear changes for different repetitions, and so by just selecting d[1:nev], you sometimes just get the complex conjugate pair and miss out on the one which actually has the smallest (largest) magnitude.

Jutho commented 10 years ago

Hence, I strongly suggest reopening this issue and fixing it before the official 0.3 release. I might be able to do this tonight, but cannot make any promises, so whoever wants to fix this first, be my guest.

ViralBShah commented 10 years ago

@Jutho This issue is open and assigned to 0.3 milestone. Hence the release won't happen without fixing this.

ViralBShah commented 10 years ago

With the example in this thread, that does not seem to help. If I return all the nev+1 eigenvalues from eupd_wrapper, I don't get what I would expect.

julia> A = [
               1.0   1.0   1.0   1.0   1.0   1.0   1.0  1.0
              -1.0   2.0   0.0   0.0   0.0   0.0   0.0  1.0
              -1.0   0.0   3.0   0.0   0.0   0.0   0.0  1.0
              -1.0   0.0   0.0   4.0   0.0   0.0   0.0  1.0
              -1.0   0.0   0.0   0.0   5.0   0.0   0.0  1.0
              -1.0   0.0   0.0   0.0   0.0   6.0   0.0  1.0
              -1.0   0.0   0.0   0.0   0.0   0.0   7.0  1.0
              -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  -1.0  8.0
             ];

julia> eigs(A, nev=2, which="SM", sigma=0) # sigma=0 for shift-invert
(Complex{Float64}[6.384-1.29179im,6.384+1.29179im,4.94066e-324+0.0im],
8x3 Array{Complex{Float64},2}:
  -0.127548+0.291925im    -0.127548-0.291925im   0.0+0.0im
  0.0385778+0.102107im    0.0385778-0.102107im   0.0+0.0im
  0.0384747+0.132241im    0.0384747-0.132241im   0.0+0.0im
  0.0277156+0.181881im    0.0277156-0.181881im   0.0+0.0im
 -0.0271345+0.262102im   -0.0271345-0.262102im   0.0+0.0im
  -0.219297+0.298219im    -0.219297-0.298219im   0.0+0.0im
   -0.34143+0.0702167im    -0.34143-0.0702167im  0.0+0.0im
   0.173476+0.689726im     0.173476-0.689726im   0.0+0.0im,

2,1,8,[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])
Jutho commented 10 years ago

I think you have now computed the 2 smallest eigenvalues of inv(A), which are the largest magnitude eigenvalues of A and come as a complex conjugate pair. So, firstly, there is an issue if you simultaneously select "SM" and sigma=0.

Secondly, the nev+1 eigenvalue will not always be there. In this case, what was actually requested, because of my previous sentence, is the 2 smallest magnitude eigenvalues of inv(A), which come as a complex conjugate pair already, so there is no need to add a conjugate partner to the list. It just reports the two requested eigenvalues, and seems to add some arbitrary value 4.94066e-324+0.0im as eigenvalue nev+1.

ViralBShah commented 10 years ago

Duh of course. This is it then:

julia> eigs(A, nev=2, which="LM", sigma=0)
(Complex{Float64}[2.616-1.29179im,2.616+1.29179im,2.53469+0.0im],
8x3 Array{Complex{Float64},2}:
  -0.173476+0.689726im    -0.173476-0.689726im   0.0+0.0im
    0.34143+0.0702167im     0.34143-0.0702167im  0.0+0.0im
   0.219297+0.298219im     0.219297-0.298219im   0.0+0.0im
  0.0271345+0.262102im    0.0271345-0.262102im   0.0+0.0im
 -0.0277156+0.181881im   -0.0277156-0.181881im   0.0+0.0im
 -0.0384747+0.132241im   -0.0384747-0.132241im   0.0+0.0im
 -0.0385778+0.102107im   -0.0385778-0.102107im   0.0+0.0im
   0.127548+0.291925im     0.127548-0.291925im   0.0+0.0im,

3,1,8,[0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0])
Jutho commented 10 years ago

... which looks perfect :-). Now one just needs the postprocessing to select the actual nev eigenvalues requested and the corresponding eigenvectors (if requested).

ViralBShah commented 10 years ago

I guess we probably should be doing the sorting in all the cases among the nev+1 eigenvalues.

Jutho commented 10 years ago

What worries me is that in the first case, where you combined "SM" with sigma=0, you got something very small as extra nev+1th value. This means that, if you would select the 2 smallest magnitude values of those three values reported, you would actually select that near-zero value. But maybe that only happened because you combined "SM" with sigma=0. In fact, I don't see a use case for "SM" ever. It is good as external value of the parameter, but it should always be converted to "LM" for the inverse of A in the internal process.

Jutho commented 10 years ago

The eigs problems continue ... https://groups.google.com/forum/m/#!msg/julia-users/ZTSfIDLpYpE/oFajPSzU8jQJ

ViralBShah commented 10 years ago

Do you know where the non determinism comes from?

gajomi commented 10 years ago

No idea about source of non-determinism, but I thought I would make a comment about the class of problems wherein I found this bug. The eigenvector/eigenvalue pair I am interested in is the one guaranteed by Perron–Frobenius. The particular example above is furthermore a stochastic matrix, and random test cases for the problem can be constructed with:

N = 1024
T = mapslices(r->r/sum(r), rand(N,N), 2)

I thought the problem might have something to do with test matrices being too small, so I ran a few test cases using the above to confirm that both eig / eigs inconsistency and eigs non-determinism persists for large N.

But, to return to the comment about these test cases being stochastic matrices, I would like to point out that in this case one may have a number of guarantees on the sign/size of the dominant value and the sign of the eigenvector, with sufficient conditions that can be tested in N^2 operations. If it is not already the case (I couldn't grok what what was happening in the ARPACK wrapper call) adding some logic to check for these things could help in setting a good initial condition for the iterative method.

ViralBShah commented 10 years ago

This is the matrix:

T = [ 0.9  0.05  0.05
 0.8  0.1   0.1 
 0.7  0.1   0.2 ]
v1 = collect(eigs(T,nev=1,which = :LM)[2])
v2 = collect(eigs(T,nev=1,which = :LR)[2])

Running this repeatedly produces different results. I am suspecting that this has to do with the small size and corruption somewhere.

jiahao commented 10 years ago

If nothing else, the calculation is nondeterministic since ARPACK by default chooses a random starting vector (essentially a normalized randn(n)) unless you specifically pass it a starting vector. Usually you don't see the dependence on the starting vector because most matrices are well-behaved in the sense that a random point on the N-sphere overlaps sufficiently with the desired subspace spanned by the eigenvectors of interest. In this case I suspect this matrix has some structure that exposes greater sensitivity to the starting vector.

jiahao commented 10 years ago

On average, however, the default random initial condition does correctly pick out the largest eigenvector.

T = [ 0.9  0.05  0.05
 0.8  0.1   0.1 
 0.7  0.1   0.2 ]
vals, vecs = eig(T)
n = 10^6
whichevec = zeros(Int,n)
for i=1:n
    v1 = collect(eigs(T,nev=1,which = :LM)[2])
    whichevec[i] = findmax(abs(vecs'v1))[2]
end

println("Eigenvector Frequency")
for i=1:3
    println(i, "           ", sum(whichevec.==i))
end

The result of this simple code for me produces

Eigenvector Frequency
1           702288
2           180641
3           117071

Changing the eigs call to eigs(T,nev=1,which = :LM,v0=ones(3)) (specifying the correct eigenvector as the initial guess) does produce the expected statistics:

Eigenvector Frequency
1           1000000
2           0
3           0

I'm tempted to say that this is correct and expected behavior of the Arnoldi method.

Jutho commented 10 years ago

I am not really convinced. For this small matrix, the krylov subspace is just the full vector space and the arnoldi result should be exact. I am on holidays for a couple of days but will take a look when I return, if not solved by then.

andreasnoack commented 10 years ago

@jiahao I think that @Jutho's argument makes sense. Also, in the other language for technical computing I get

T = [ 0.9  0.05  0.05
      0.8  0.1   0.1 
      0.7  0.1   0.2 ];

[vecs, vals] = eig(T);

n = 10^3;
whichevec = zeros(n, 1);
for i = 1:n
    opts.v0 = randn(3,1);

    [v, ~] = eigs(T, 1, 'lr', opts);

    [~, ii] = max(abs(v'*vecs));

    whichevec(i) = ii;
end

mean([whichevec == 1 whichevec == 2 whichevec == 3])

ans =

     1     0     0
alanedelman commented 10 years ago

I would expect independent of starting vector, one gets convergence if not always nearly always.

On Sun, Sep 7, 2014 at 9:06 PM, Andreas Noack notifications@github.com wrote:

@jiahao https://github.com/jiahao I think that @Jutho https://github.com/Jutho's argument makes sense. Also, in the other language for technical computing I get

T = [ 0.9 0.05 0.05 0.8 0.1 0.1 0.7 0.1 0.2 ]; [vecs, vals] = eig(T); n = 10^3;whichevec = zeros(n, 1);for i = 1:n opts.v0 = randn(3,1);

[v, ~] = eigs(T, 1, 'lr', opts);

[~, ii] = max(abs(v'*vecs));

whichevec(i) = ii;end

mean([whichevec == 1 whichevec == 2 whichevec == 3]) ans =

 1     0     0

— Reply to this email directly or view it on GitHub https://github.com/JuliaLang/julia/issues/6965#issuecomment-54767648.