Implementation roadmap - Githubissues

jiahao commented 10 years ago

A comprehensive listing of methods in the references. Please remove any methods are impractical/obsolete and add methods worth implementing to the list.

Linear systems

Stationary methods
- [x] Jacobi (LT)
- [x] Gauss-Seidel (LT)
- [x] Successive overrelaxation, SOR (LT)
- [x] Symmetric SOR (LT)
Nonstationary methods
- [x] GMRES (LT, Saa)
- [x] MINRES (LT)
- [x] Conjugate Gradients, CG (LT)
- [x] LSQR
- [x] LSMR
- [x] BiCGStab(l) (Sle)
- [x] Chebyshev iteration (LT)
- [ ] Quasi-Minimal Residual, QMR (LT)
- [ ] DQGMRES

(Generalized) eigenproblems

Without subspace
- [x] (Shift-and-inverted) power iteration (ET)
- [ ] Rayleigh Quotient Iteration [1]
Krylov subspace methods
- [ ] Implicitly restarted Lanczos (ET)
- [ ] Implicitly restarted Arnoldi (ET)
Other subspace methods
- [ ] Jacobi-Davidson method (ET)
- [x] LOPCG
- [ ] Accelerated Rayleigh Quotient Iteration [1]
- [ ] (Shift-and-inverted) subspace iteration (ET)

Singular value decomposition

[x] Golub-Kahan-Lanczos (ET)

References

LT: Templates for the Solution of Linear Systems
ET: Templates for the Solution of Algebraic Eigenvalue Problems
Saa: GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems (pdf)
Sle: BiCGstab(l) and other hybrid Bi-CG methods (pdf)
TB: Numerical linear algebra
HV: Iterative Krylov methods for large linear systems

[1] Low-priority methods: they are quite expensive per iteration.

timholy commented 10 years ago

An awesome list. Somewhere on my harddrive I have an lsqr.jl file I could contribute, if interested.

stevengj commented 10 years ago

I would say that all the stationary methods are obsolete (they are only useful now as a component of multigrid). CGN (conjugate-gradient on the normal equations) seems hardly used these days; the fact that it squares the condition number makes it useless for large systems. I don't see the point of the simple power method and friends; it seems subsumed by restarted Arnoldi (it is just the special case of restarting every iteration). I don't see the point of implementing non-restarted Arnoldi or Lanczos (these are just the special case of a large restart period).

ViralBShah commented 10 years ago

Great list. I guess the non-restarted methods are more starting points for the restarted methods.

stevengj commented 10 years ago

For the same reason, we only want restarted GMRES (of which there are several variants, if I recall correctly).

stevengj commented 10 years ago

Note that I added LOPCG to the list; this is one of the best iterative eigensolvers for Hermitian systems in my experience (and has the added benefit of supporting preconditioning).

stevengj commented 10 years ago

Also, the solvers for ordinary eigenproblems are merely special cases of the solvers for generalized eigenproblems, so we shouldn't have to implement both variants in most cases. With a little care in implementation, it should be possible to have negliglble performance penalty in the case where the right-hand side operator is the identity, since I(x) = x is essentially free and doesn't make a copy.

StefanKarpinski commented 10 years ago

This is an amazing issue :heart:

JeffBezanson commented 10 years ago

A bcgstab was posted here: https://github.com/JuliaLang/julia/issues/4723

ViralBShah commented 10 years ago

We do not know about its license. For starters, the one from templates should be good.

jiahao commented 10 years ago

As @stevengj pointed out in the original discussion (JuliaLang/julia#4474), the state of the art for biconjugate gradients is the BiCGSTAB(l) (or ell?) variant. I purposely left the original BiCGSTAB algorithm off the list for that reason.

jiahao commented 10 years ago

I spent a few minutes transcribing symbol-for-symbol the BiCGSTAB(ell) algorithm in this gist. Unicode combining diacritics make it laughably easy to do this transcription. You can see clearly where when l=1 this reduces to BiCGSTAB.

Obviously this is subject to testing, etc.

ghost commented 10 years ago

A number of comments

Stationary methods are important and should be there. They can be used as smoothers in multigrid as well as first line preconditioners for other iterative methods.
Need to add least square solvers, in particular CGLS and LSQR
The solvers I added (bicgstab and gmres) are a direct translation from the book Templates so I do not see any license issue.
I plan to add cgls soon

ViralBShah commented 10 years ago

@timholy 's https://github.com/timholy/Grid.jl provides restriction and interpolation operators on grids.

JasonPries commented 10 years ago

With regard to the usefulness of CGN and CGS I would point to this paper:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.210.62

tkelman commented 10 years ago

I'd be interested to see some of the symmetric indefinite methods MINRES, SYMMLQ, MINRES-QLP (http://www.stanford.edu/group/SOL/software.html)

thraen commented 9 years ago

I tried the gauss-seidel function of the package as smoother for multigrid. The implementation given doesn't account for sparse matrices and is thus not usable in such contexts (large, sparse). Probably all the methods in stationary.jl have this issue.

lruthotto commented 9 years ago

@thraen: You might want to look into the function sor and other iterative solvers that are optimized for sparse matrices in https://github.com/lruthotto/KrylovMethods.jl. I use the sor a lot as a preconditioner and it is reasonably fast. It makes use of the sparse matrix format.

thraen commented 9 years ago

Thanks a lot for pointing that out!

On Wed, Jun 10, 2015 at 8:39 AM, Lars Ruthotto notifications@github.com wrote:

@thraen https://github.com/thraen: You might want to look into the function sor and other iterative solvers that are optimized for sparse matrices in https://github.com/lruthotto/KrylovMethods.jl. I use the sor a lot as a preconditioner and it is reasonably fast. It makes use of the sparse matrix format.

— Reply to this email directly or view it on GitHub https://github.com/JuliaLang/IterativeSolvers.jl/issues/1#issuecomment-110614177 .

Sisyphuss commented 8 years ago

Is it worthwhile to add Aitken's delta-squared process into the list?

klkeys commented 8 years ago

I noticed that this package was listed for Julia GSOC 2016. Mike Innes suggested that I pipe up here if I wanted to help. Is anybody here able and willing to serve as a mentor?

jiahao commented 8 years ago

@klkeys Thanks for your interest. I'd be thrilled to have someone help out with this package!

One of the biggest challenges is to unify the various methods available in similar packages like https://github.com/lruthotto/KrylovMethods.jl and https://github.com/JuliaOptimizers/Krylov.jl. In some sense, this package has been more experimental in nature whereas the other two have been more focused on efficient implementations that work well for practical applications. Now that we (or at least, I) have some experience in writing these algorithms, I'd be very interested to figure out a common API across all the various solvers available.

cc: @lruthotto @dpo

lruthotto commented 8 years ago

I'd be glad to help work out a common API and a convergence of the packages in terms of efficiency. I'm not sure how much time I'll have to work on it myself during this summer, but I'd be happy to provide input as much as I can. Please keep me posted if you start a discussion somewhere.

dpo commented 8 years ago

I'm very interested in working out a good common API as well. I'm currently quite happy with the API in Krylov.jl but I'm of course open to discussing it. It's very unlikely that I'll have free time this summer to mentor though, but I'm happy to participate in discussions when I can.

zimoun commented 8 years ago

Will anyone work on Julia GSOC 2016 ?

It should be nice to have a common API to easily switch between the different implementations or to merge them. For example,

the preconditioning scheme is different (e.g., M(x) in KrylovMethods.jl and M\x in IterativeSolvers.jl.
an abstract layer to manage preconditioners and then implement classic ones (see there e.g., sec. 4.4 Preconditioners and Tab. 4 )
write a "bridge" to be able to transparently use PETSc.jl, other GSOC 2016.

cc: @cortner related issue and #71

ChrisRackauckas commented 8 years ago

I think MINRES should be added to this list.

lruthotto commented 8 years ago

@zimoun: We chose to provide the preconditioner as a function handle that computes the inverse, for example, M(x) = M\x to allow as much flexibility to the user as possible. This way you should be able to use routines from PETSc without an issue (you may have to write a wrapper). I'm not sure you need another abstract layer for preconditioners.

Something we could to in KrylovMethods is to allow M being a matrix and then building the function. We do something similar for A already. Do you think that's a good way to go?

@ChrisRackauckas : There is a basic version of minres in KrylovMethods (https://github.com/lruthotto/KrylovMethods.jl/blob/master/src/minres.jl) which might be a good starting point.

cortner commented 8 years ago

I'm not sure you need another abstract layer for preconditioners.

My own perspective: in optimisation you need more than just M(x). You need to provide M * x, M \ x, <x, My>, <x, M\y>.

But I would argue even in linear algebra you may want to create some abstraction. E.g., is the preconditioner right- or left-preconditioned? Maybe I am overcomplicating it. I do think for Optim.jl it will be useful to use dispatch instead of function handles. I won't get too stressed if this is incompatible with IterativeSolvers.jl, though it might be a shame.

lruthotto commented 8 years ago

I agree that we should distinguish between left and right preconditioners. How about having two variables, for example, Mr and Ml doing this?

Regarding the abstraction issue: Can you give me an example when you need to multiply with the preconditioner?

cortner commented 8 years ago

Regarding the abstraction issue: Can you give me an example when you need to multiply with the preconditioner?

Basically to compute distance, but I only need the inner products for that. So now that you pressure me to think about it, I am not sure whether it was just sloppiness on my part. I will think about it some more, but it is possible that only M \ x, <x, My>, <x, M \ y> is needed.

But don't you need M * x for right-preconditioning?

lruthotto commented 8 years ago

No, you need the inverse for left- and right-preconditioning. See, e.g., Chapter 9 in Saad's book: www-users.cs.umn.edu/~saad/IterMethBook_2ndEd.pdf

Regarding the inner product: Think, also about when you need that inside the linear solver. So far, I haven't seen the necessity and still think that inside the methods you only need M\x.

cortner commented 8 years ago

I went through a couple of algorithms that I've used in the past. I now think most can be written (or re-written) without <x, Mx> , but I am not sure all of them. Three examples:

E.g., the Hager-Zhang nCG needs to compute the length of a step; see Optim.jl/src/cg.jl, line 236: (here, P = M^{-1})

        dPd = precondinvdot(s, cg.P, s)

Maybe this can be removed, but I don't immediately see how; I can turn it into a recursion, but then I still need it at least once.

Another example comes up in saddle-search, but very similar context: to project out some "bad directions" y_j we add \sum_{j} <x, y_j>_P^2 to the objective, but this is part of the algorithm, not part of the user input. In the same context (saddle search) you need to solve a generalised eigenvalue problem H v = lam P v and I don't see how to do that without providing <v, Pv>.
Finally, trust-region algorithms need to work with step-lengths, and once you hit the boundary of the trust region you also switch to a generalised eigenvalue solver which I think needs P * v again, or at least <v, Pv>. (on this last point, I'd need to go back and double-check the codes)

dpo commented 8 years ago

There's an implementation of MINRES in Krylov.jl: https://github.com/JuliaOptimizers/Krylov.jl/blob/develop/src/minres.jl. It supports preconditioning. Only SPD preconditioning makes sense for MINRES, except in very specific cases.

zimoun commented 8 years ago

@lruthotto I agree and I think that provide only the matrix-vector product as function is the right direction; as you are explaining and as it is implemented in KrylovMethods.jl. My point is: it is not consistent with the current behavior of IterativeSolvers.jl see e.g., gmres l.81, which uses M\x hard coded. It was the starting point of the discussion #71. From my point of view, all the operations should be a type in the flavor of LinearOperator.jl

Since we are talking about projection methods, this means relative to an inner product, isn't it ? (note: I am not sure, but I think <v, Mv> and/or <v, M\v> are used in some deflated methods ?)

cortner commented 8 years ago

@zimoun : why does M \ x, which is the same as `(M, x) not count as a function for you?

cortner commented 8 years ago

@lruthotto Just to add to my examples above: it is maybe becoming clear now that for linear algebra you only need M \ x, and that any other functionality would be an extension used in other contexts. So I may just leave you in peace here and start a new issue in Optim.jl, but try and keep an eye on this.

zimoun commented 8 years ago

@cortner I am not sure to understand the question. Maybe I miss a point. From what I know, M(x) eats a vector, does whatever, and then returns a vector. If now M is a matrix, with nothing more defined, then M\x returns the vector solution of M*y=x (by applying LU or QR factorization).

Let consider that I am able to build the matrix P, A and I would like to solve P*A*u = P*b.

using KrylovMethods.jl, I define M(x) = P*x and I feed gmres.
using IterativeSolvers.jl, I have to play around the type system to define somehow an "object" M with the operation M\x providing by P*x, and then I feed gmres.

From my point of view, it is not consistent (and the behavior of IterativeSolvers.jl appears to me confusing, because \ generally means solve but in this context, it means apply P)

cortner commented 8 years ago

@zimoun If M is an instance of a type MyPreconditioner, storing whatever information it needs, then \\(M, x) = M \ x can be defined any way you want. So the mapping x \mapsto M \ x does just the same: eat a vector and spit out a vector.

If we think of M as approximating H, then even if you are actually storing M^{-1} you want to call the application of this M^{-1} M \ x which is widely understood to mean M^{-1} * x.

lruthotto commented 8 years ago

@cortner: Thanks for starting the discussion and for your examples.

Take a look at https://github.com/lruthotto/KrylovMethods.jl/tree/devPrecond . With just adding a few lines, I was able to accommodate M as a matrix or a vector (useful for diagonal preconditioners). What do you think about these changes?

cortner commented 8 years ago

@lruthotto That is essentially what I have in mind as well. Two (three) comments:

I would not write in the documentation that M should be either a function that computes M\x or a matrix or vector such that M\x is computed; rather M should be of an arbitrary type as long as preconditioner(M, x) can be called, which returns M^{-1} x. (by whatever means). This doesn't prevent you from keeping the default behaviour you've already implemented for functions, arrays or vectors. To give you two concrete examples:
- if you look at Optim.jl/cg.jl you will see that Tim Holy implemented M^{-1} x for a diagonal preconditioner by storing M_ii^{-1} rather than M_ii. By wrapping this into a type, it can then be specified which of these the vector is understood to provide. E.g., one could provide the types DiagonalPrecon and InverseDiagonalPrecon to take care of this.
- As @zimoun pointed out, in some cases (BEM?) one actually has the inverse explicitly available, so the matrix being passed might already be the inverse. Again, by wrapping into types one can distinguish. (that said, I agree with your default behaviour!)
I would not choose preconditioner for the name of the method that applies it, because it sounds to me like you might be constructing the preconditioner. Personally I would actually stick with M \ x, but I appreciate that this seems to cause confusion. Maybe precondition, applyprecond, just precond, or something like that? Tim Holy called it precondfwd!. To be honest, I haven't thought of a really good name. In the end I am not too fussed about this.
I would probably do the default implementation in-place. A wrapper can then be provided to call whatever the user provides, e.g., precondfwd!(px, M, x) = copy!(px, precondfwd(M, x))

stevengj commented 8 years ago

For preconditioner (KSP) objects in PETSc.jl, we provide A_ldiv_B! (in-place M \ x), which corresponds to how preconditioners are defined in PETSc. Defining A_ldiv_B! for an object is sufficient to define \ as well, I believe, and it allows the solver to take advantage of in-place operations if it wants. So I would recommend this for IterativeSolvers too.

We also provide a function to do At_ldiv_B! (in-place M' \ x), but currently this is ~~missing from Base.~~ Actually, I just noticed that Base.LinAlg.At_ldiv_B! is defined.

lruthotto commented 8 years ago

@cortner: Thanks for your encouragement and suggestions. I just pushed some changes to the branch and switched more methods over to the new call to applyPrecond. I quite like it: It made the code not much longer and more flexible. Precondioners can now be provided as functions, arrays, and vectors.

What still remains is the option of left vs. right preconditioning in CG and other methods as well as preconditioning in minres (which @dpo has already implemented in his package)

cortner commented 8 years ago

Personally, I like applyPrecond, but I fear that it clashes with Julia naming conventions. Just applyprecond or apply_precond ok?

EDIT: Unless I hear otherwise I now think that the first bullet-point is crucial?! @stevengj: would you be willing to comment?

Although A_ldiv_B! hurts my eyes, this has some advantages:

it is in Base, which means that if two modules implement it then they can import from Base and don't need to know of each other. Btw, this is an issue I never quite wrapped my head around - is this discussed somewhere in depth? Can anybody comment?
it is widely understood what it means
you don't need to implement boiler-plate code for the standard operators (even though this is minimal effort)

ChrisRackauckas commented 8 years ago

Would an Algebraic Multigrid be considered for this package?

cortner commented 8 years ago

I've written a simple wrapper for pyamg which I was hoping to integrate but if you have something better, I'd be very interest d.

P.S.: I now made it public at: https://github.com/cortner/PyAMG.jl (this is really just wrapping some documentation and tests around PyAMG + PyCall)

ViralBShah commented 8 years ago

PyAMG is a pretty solid solver.

ChrisRackauckas commented 8 years ago

I think we should get a native Julia version, but PyAMG will work really well in the meantime. Good job!

cortner commented 8 years ago

Thank you - I'll be happy to register the package then.

lruthotto commented 8 years ago

Talking about native julia implementation of mulitgrid methods. @erantreister has started putting out a package here: https://github.com/JuliaInv/Multigrid.jl

We're still working on some details, but it'll be in good shape soon. Always open to feedback/suggestions.

cortner commented 8 years ago

I am now almost certain that we should use A_ldiv_B! or simply \ for application of the Preconditioner. If we do not, then we would have to create a Preconditioning.jl module that just defines the interface without anything else. This is a real short-coming of the module system, which I believe traits are supposed to fix? But by using A_ldiv_B! or \ we can simply overload the operators that are in Base, so there is no problem anymore.

I would really appreciate to get this confirmed (or contradicted) by somebody who has a deeper understanding of Julia than I do.

andreasnoack commented 8 years ago

I really don't see the point in introducing a new a special linear solve function just for preconditioners. Applying the preconditioner is a linear solve and \,A_ldiv_B! are the functions for linear solves and they can be overloaded. Regarding diagonal preconditioning then we have Diagonal in Base.LinAlg for that. If some of the abstractions in Base.LinAlg are insufficient, then please raise issues.

JuliaLinearAlgebra / IterativeSolvers.jl

Implementation roadmap #1