JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

eigs takes too long to converge #4474

Closed: stevengj closed this issue 10 years ago

stevengj commented 11 years ago

The code below, which constructs a sparse discrete Laplacian (or -Laplacian, actually) inside a cylinder and computes its smallest eigenvalue using eigs, dies with an ARPACKException for me:

Changing to Nx=Ny=100 works fine. However, Nx=Ny=400 gives "only" a 100000x100000 (positive-definite, real-symmetric) sparse matrix from a 2d grid (so a sparse-direct solver should be efficient), and similar code works fine in Matlab.

What does this error mean? cc: @ViralBShah

# construct the (M+1)xM matrix D, not including the 1/dx factor
diff1(M) = [ [1.0 zeros(1,M-1)]; diagm(ones(M-1),1) - eye(M) ]

# sparse version (the lazy way):
sdiff1(M) = sparse(diff1(M))

# make the discrete -Laplacian in 2d, with Dirichlet boundaries
function Laplacian(Nx, Ny, Lx, Ly)
   dx = Lx / (Nx+1)
   dy = Ly / (Ny+1)
   Dx = sdiff1(Nx) / dx
   Dy = sdiff1(Ny) / dy
   Ax = Dx' * Dx
   Ay = Dy' * Dy
   return kron(speye(Ny), Ax) + kron(Ay, speye(Nx))
end

# Define mesh for the cylinder
# Code adapted from lecture notes
Lx = 1
Ly = 1
Nx = 400
Ny = 400
x = linspace(-Lx,Lx,Nx+2)[2:end-1]   # a column vector
y = linspace(-Ly,Ly,Ny+2)[2:end-1]'  # a row vector
r = sqrt(x.^2 .+ y.^2)   # use broadcasting (.+) to make Nx x Ny matrix of radii
i = find(r .< 1)         # and get indices of points inside the cylinder
A = Laplacian(Nx,Ny,2Lx,2Ly)
Ai = A[i,i]              # to make a submatrix for the Laplacian inside the cylinder
λi, Ui= eigs(Ai, nev=1, which="SM")

[ViralBShah: The original title was "ARPACKException needs better error message", which is fixed.]

stevengj commented 11 years ago

Setting maxiter=10000 in eigs fixes the problem, so I guess it is just converging slowly. But it would be good to have a clearer error message, at least.
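
For reference, the workaround call just adds the extra keyword (value as described above):

# same call as before, but raising ARPACK's iteration cap
λi, Ui = eigs(Ai, nev=1, which="SM", maxiter=10000)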

stevengj commented 11 years ago

(It would also probably converge faster if there were a way to exploit the fact that my matrix is symmetric positive definite. Related to #4476.)

ViralBShah commented 11 years ago

We should certainly give a better message. That is easy to fix.

I don't think there is a way to exploit spd property in arpack - only the fact that it is symmetric.

stevengj commented 11 years ago

If you are doing inverse iteration then you can use Cholesky factorization for SPD matrices.

A basic problem here is that eigs is over-typed. We should do duck-typing: support passing any object (not just an AbstractArray) that supports the necessary operations (size, \, * ?). That way we could pass factorization objects or other abstract linear operations.
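
As an illustration of what such duck-typing could enable (a hypothetical sketch, not the current API: the wrapper type, its name, and the final eigs call are all made up), one could wrap a Cholesky factorization in an object that only supports size and *, so that each "matvec" is really a solve with the SPD matrix:

import Base: size, *

# hypothetical wrapper: applies Ai^-1 via a Cholesky solve
immutable CholeskyInverse
    F            # e.g. cholfact(Ai)
    n::Int
end
CholeskyInverse(A) = CholeskyInverse(cholfact(A), size(A, 1))

size(op::CholeskyInverse) = (op.n, op.n)
size(op::CholeskyInverse, i::Integer) = op.n
*(op::CholeskyInverse, x::AbstractVector) = op.F \ x

# hypothetically: eigs(CholeskyInverse(Ai), nev=1, which="LM")
# (the largest eigenvalues of Ai^-1 are the reciprocals of the smallest eigenvalues of Ai)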

andreasnoack commented 11 years ago

It is tempting to think of AbstractMatrix as the set of objects that support \ and *, cf. Jutho's post on the mailing list, at least as long as it is not clearly defined what an AbstractMatrix is.

By the way, this is not directly related to this issue, but it follows from its example: eigs seems pretty slow. With the Laplacian example I get

@time eigs(Ai,nev=1,which="SM")
elapsed time: 457.095567119 seconds (18300352424 bytes allocated)

and with a pure Julia implementation

@time Arnoldi.eigs(Ai, 1, 2000, eps())
elapsed time: 10.058114537 seconds (6387928 bytes allocated)
1-element Array{Float64,1}:
 5.76436

ViralBShah commented 11 years ago

I wonder what is happening. Can you post your implementation? What happens in Matlab or Octave? @vtjnash fixed some Fortran calling issues, which could potentially explain this.

ViralBShah commented 11 years ago

@stevengj we should be able to exploit the SPD property for free due to the use of \. The polyalgorithm may need some tweaking.

stevengj commented 11 years ago

The memory usage is pretty insane too... 17GiB!

andreasnoack commented 11 years ago

The implementation is here: https://gist.github.com/andreasnoackjensen/6963157. I only wrote it to learn how the Lanczos method works, so it is very simple. I have defined A_mul_B! for sparse matrices to be allocation-free like gemv, which explains the big difference in allocated memory but not the whole time difference. I tweaked the Arnoldi code to use my A_mul_B! instead of * and now I get

julia> @time eigs(Ai,nev=1,which="SM")[1]
elapsed time: 382.070997098 seconds (85137664 bytes allocated)
1-element Array{Float64,1}:
 5.76436

so much of the reallocation is avoided, but the timing is still not good. I believe I have constructed Ai identically in MATLAB, and there I get

>> tic;eigs(Ai,1,'sm');toc
Elapsed time is 3.371547 seconds.

so the difference is huge.
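
For context, an allocation-free sparse matvec over the CSC storage looks roughly like the sketch below (the name spmatvec! is made up; the gist's A_mul_B! may differ in details):

# y = A*x for a SparseMatrixCSC, writing into a preallocated y
function spmatvec!(y::Vector{Float64}, A::SparseMatrixCSC{Float64,Int}, x::Vector{Float64})
    fill!(y, 0.0)
    colptr = A.colptr; rowval = A.rowval; nzval = A.nzval
    for j = 1:A.n                            # loop over the columns of A
        xj = x[j]
        for k = colptr[j]:(colptr[j+1]-1)    # nonzeros stored in column j
            y[rowval[k]] += nzval[k]*xj      # accumulate A[:,j]*x[j] without temporaries
        end
    end
    return y
end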

ViralBShah commented 11 years ago

579a005c995df57cd92f0dc90e6bda2b25158de6 gives us a better error message.

ViralBShah commented 11 years ago

@JeffBezanson I have a suspicion that type inference is not working as well as it ought to here, which might be slowing down eigs by making it spend a lot of time in the sparse matvec.

JeffBezanson commented 11 years ago

Can you give me an example call that you think runs too slowly?

andreasnoack commented 11 years ago

@stevengj's example above is one. There eigs takes 457 seconds, while a pure Julia implementation of the same calculation takes about 10 seconds.

JeffBezanson commented 11 years ago

To narrow it down a bit, should I focus on the product of Ai and a random dense vector?

JeffBezanson commented 11 years ago

Ok, there aren't any type inference problems in sparse matvec. I tried hoisting the accesses to A.colptr etc. but that didn't do very much. (I'm looking at linalg/sparse.jl:14) We do need some codegen improvements around 1-d arrays, but I don't expect any more than a factor of 2 from that (probably not even that much).

stevengj commented 11 years ago

It would be good to check the number of iterations that eigs takes to converge versus the Matlab version or @andreasnoackjensen's Julia version. If the increase in time is proportional to that, then the problem is probably not codegen; it is some screwup in the Arnoldi algorithm.
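
One way to read off these counts directly (a sketch, assuming the full eigs return tuple with niter and nmult, as in the timings later in this thread):

d, v, nconv, niter, nmult, resid = eigs(Ai, nev=1, which="SM")
println("niter = ", niter, ", nmult = ", nmult)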

ViralBShah commented 11 years ago

I wrote up this example in MATLAB, and it seems that MATLAB's eigs finishes in a handful of iterations (<10). There doesn't seem to be a way to get the number of iterations performed by eigs in MATLAB.

% constructs a sparse discrete Laplacian (or -Laplacian, actually) inside a cylinder
% Typically, call genmat with n = 400

function Ai = genmat(n)
% Define mesh for the cylinder
% Code adapted from lecture notes
Lx = 1;
Ly = 1;
Nx = n;
Ny = n;
x = linspace(-Lx,Lx,Nx+2);
x = x(2:end-1)';     % a column vector
y = linspace(-Ly,Ly,Ny+2);
y = y([2:end-1]');  % a row vector
r = sqrt(bsxfun(@plus, x.^2, y.^2));   % use bsxfun to broadcast and make an Nx-by-Ny matrix of radii
i = find(r < 1);         % and get indices of points inside the cylinder
A = Laplacian(Nx,Ny,2*Lx,2*Ly);
Ai = A(i,i);              % to make a submatrix for the Laplacian inside the cylinder

end

% make the discrete -Laplacian in 2d, with Dirichlet boundaries
function x = Laplacian(Nx, Ny, Lx, Ly)
   dx = Lx / (Nx+1);
   dy = Ly / (Ny+1);
   Dx = sdiff1(Nx) / dx;
   Dy = sdiff1(Ny) / dy;
   Ax = Dx' * Dx;
   Ay = Dy' * Dy;
   x = kron(speye(Ny), Ax) + kron(Ay, speye(Nx));
end

% construct the (M+1)xM matrix D, not including the 1/dx factor
function x = diff1(M)
  x = [ [1.0 zeros(1,M-1)]; diag(ones(M-1,1),1) - eye(M) ];
end

% sparse version (the lazy way)
function x = sdiff1(M) 
  x = sparse(diff1(M));
end

ViralBShah commented 11 years ago

@andreasnoackjensen Your code does not always seem to work on @stevengj's problem, but when it does work, it comes back in 4 iterations for Nx=Ny=100 and 11 iterations for Nx=Ny=400. So clearly something is up with our use of ARPACK.

jiahao commented 11 years ago

Can you tell MATLAB to run just one iteration at a time?

stevengj commented 11 years ago

In MATLAB, you can pass a function (one that takes a vector and returns A*vector) instead of a matrix; then just include a print statement in your function to count the matvecs.

ViralBShah commented 11 years ago

While @alanedelman and I were exploring this, we took a look at the Golub-Kahan-Lanczos algorithm for the SVD, which is about as simple as @andreasnoackjensen's Arnoldi implementation. It may be best for us to have our own implementations of eigs and svds, even if we do the implicitly restarted versions later.

This is @alanedelman's implementation of svds:

# Bare-bones Golub-Kahan-Lanczos bidiagonalization (assumes a matrix A is already defined)
(m, n) = size(A)
αs = zeros(0)                 # will hold the diagonal of the bidiagonal matrix
βs = zeros(0)                 # will hold the off-diagonal
v = randn(n); v /= norm(v)    # random unit starting vector
u = A*v

for k = 1:100                 # fixed iteration count (a magic number, see the notes below)
  α = norm(u); αs = [αs; α]; u /= α
  v = A'*u - α*v

  β = norm(v); βs = [βs; β]; v /= β
  u = A*v - β*u
end
# singular values of the small bidiagonal matrix approximate the largest singular values of A
println(round(svdvals(Bidiagonal(αs, βs[1:end-1], true))[1:5]', 3))

alanedelman commented 11 years ago

Some notes on the above:

This is the basic, no-bells-and-whistles Golub-Kahan-Lanczos. We certainly should remove the magic numbers 100 and 5, and incorporate implicit restarting and perhaps shift-and-invert strategies.

The algorithm is mathematically the same as Householder reduction when the starting vector is eye(n,1) or something similar, but it is suitable for matrices that are not dense.

I think the key point here is that ARPACK has served well for many years, but if we have a Julia implementation, we can probably look at these methods less as a black box not to be touched, and more as an organic algorithm to be improved upon as we go.

The same would apply for Lanczos tridiagonalization of course.
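
For comparison, a similarly bare-bones Lanczos tridiagonalization could look like the following sketch (no reorthogonalization, restarting, or convergence tests; k is a magic number just like the 100 above, and the function name is made up):

function lanczos_tridiag(A, k)
    n = size(A, 1)
    αs = zeros(k)               # diagonal of the tridiagonal matrix
    βs = zeros(k - 1)           # off-diagonal
    q = randn(n); q /= norm(q)  # random unit starting vector
    qold = zeros(n)
    β = 0.0
    for j = 1:k
        w = A*q - β*qold        # three-term recurrence for symmetric A
        α = dot(w, q)
        w -= α*q
        αs[j] = α
        if j < k
            β = norm(w)
            βs[j] = β
            qold = q
            q = w/β
        end
    end
    return SymTridiagonal(αs, βs)
end

# e.g. eigvals(lanczos_tridiag(Ai, 50)) approximates the extreme eigenvalues of Ai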

ViralBShah commented 11 years ago

Perhaps we should start an Arnoldi.jl package, which can move into base when it becomes ready. Seems like we have enough momentum.

alanedelman commented 11 years ago

Maybe should be called Lanczos.jl :-) or ArnoldiLanczos.jl

andreasnoack commented 11 years ago

I would like to contribute to such a package.

jiahao commented 11 years ago

+1

ViralBShah commented 11 years ago

I just created this and you guys should all have commit access to it.

https://github.com/JuliaLang/ArnoldiLanczos.jl

jiahao commented 11 years ago

I think IterativeSolvers.jl would be a much more descriptive and inclusive term. Arnoldi and Lanczos are but two common examples of a much broader family.

Unless anyone has better references, I would recommend van der Vorst's Iterative Krylov Methods for Large Linear Systems, and the last couple of chapters in Trefethen and Bau.

StefanKarpinski commented 11 years ago

That does seem like a much better generic name.

ViralBShah commented 11 years ago

Done. https://github.com/JuliaLang/IterativeSolvers.jl

We could start off with @andreasnoackjensen's gist above, @alanedelman's code for GKL, and the other iterative solvers that people have started putting together. Perhaps future discussion can happen over in the IterativeSolvers issues.

stevengj commented 11 years ago

Well, the Templates books (Linear Systems and Eigenproblems) are also decent references, although they don't include Sleijpen's BiCGSTAB(L) algorithm, which is one of the better ones for large sparse nonsymmetric systems (don't waste your time with any of the other BiCG variants; this one subsumes them).

acroy commented 10 years ago

Apparently MATLAB's eigs uses shift-and-invert whenever 'sm' is specified, whereas we additionally require sigma != 0 to use it. Thus the comparison is not entirely fair (to us). With 35690fd and shift-and-invert I get

julia> @time (d, v, nconv, niter, nmult, resid) = eigs(Ai,nev=1,which="LM",sigma=0.1)
elapsed time: 4.506958406 seconds (192699384 bytes allocated)
<snip output of eigs>
julia> d
1-element Array{Float64,1}:
 5.76436
julia> niter
1
julia> nmult
20

which is much better. Without the patch the matrix is always converted to a dense matrix when calling lufact (in shift-and-invert mode), which is expensive in this example. Without shift-and-invert, I get

@time (d, v, nconv, niter, nmult, resid) = eigs(Ai,nev=1,which="SM",sigma=0)
elapsed time: 330.53247192 seconds (17535572788 bytes allocated)
<snip output of eigs>
julia> d
1-element Array{Float64,1}:
 5.76436
julia> niter
863
julia> nmult
8640

stevengj commented 10 years ago

It seems like we should use shift-and-invert for sigma != 0 and inverse iteration (i.e. shift-and-invert without the shift) for which="sm".
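
Roughly, the proposed selection might look like the following hypothetical helper (illustrative only, not the actual eigs internals):

function use_shift_invert(which, sigma)
    if sigma != 0
        return true     # explicit shift: factorize A - sigma*I (shift-and-invert)
    elseif which == "SM"
        return true     # smallest magnitude: inverse iteration, i.e. shift-and-invert with sigma = 0
    else
        return false    # otherwise run plain Arnoldi/Lanczos with A itself
    end
end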

ViralBShah commented 10 years ago

@acroy Thanks for tracking this down. I kept thinking that we have some bug in our ARPACK interface, but this makes sense now.

ViralBShah commented 10 years ago

Can someone submit a PR? Unfortunately I am quite taken up for a few weeks and don't want to do this in haste.

JeffBezanson commented 10 years ago

Do we imagine merging the PR for this for 0.3?

ViralBShah commented 10 years ago

Yes, there is already an open PR, and it should be possible to merge it for 0.3.

jiahao commented 10 years ago

Closed by #6053, with continuing work in IterativeSolvers.jl.