ginkgo-project / ginkgo

Numerical linear algebra software package
https://ginkgo-project.github.io/
BSD 3-Clause "New" or "Revised" License

Krylov methods expecting factorized preconditioners #17

Closed: pratikvn closed this issue 6 years ago

pratikvn commented 6 years ago

Some Krylov methods, such as the Quasi-Minimal Residual method (templates, page 22), are designed for preconditioners of the form M = U * V. The implementation of the method assumes that the preconditioner can be decomposed, and that the U and V factors can be applied separately.

How should we implement this in Ginkgo? Currently, the only assumption is that preconditioners can be applied to a vector (there's nothing about decomposing them).

gflegar commented 6 years ago

As I see it, it's always possible to decompose every preconditioner as M = I * M (U = I, V = M). I'm not an expert in this (any experts here who can clarify?), but it seems that if it were that simple, no one would bother splitting the preconditioner into two parts. My guess is that the split somehow increases the numerical stability of the method? If so, is there a rule of thumb for how the split should be made? Something like k(U) ~= k(V) (balancing the condition numbers of the factors) seems reasonable to me?

One observation I made is that if we have a good left preconditioner M for A (M^-1 A ~= I), then by splitting it as M = U * V, U and V are a good left and right preconditioner pair for A:

U^-1 A V^-1 = V V^-1 U^-1 A V^-1 = V M^-1 A V^-1 ~= V I V^-1 = I    (1)

An important question, which I feel dictates further library design, is which system the Krylov method is actually solving. Is it just a normal left-preconditioned system

(V^-1 U^-1 A) x = (V^-1 U^-1 b)    (2)

or is the method using equation (1) to solve

(U^-1 A V^-1) (V x) = (U^-1 b)    (3)

instead?

If it's solving (2), then we need a way to mark that some preconditioners are decomposable into two factors (we could implement a Decomposable interface with methods for obtaining the first and second factor of the preconditioner, sketched below).
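For concreteness, here is a minimal sketch of what such an interface could look like (all names here are hypothetical, not Ginkgo's actual API):

```cpp
#include <memory>

// Hypothetical minimal linear operator: the only thing currently assumed
// about a preconditioner is that it can be applied to a vector.
struct LinOp {
    virtual ~LinOp() = default;
    virtual void apply(const double* in, double* out) const = 0;
};

// Hypothetical mix-in for option (2): preconditioners that know a
// factorization M = U * V additionally expose the two factors.
struct Decomposable {
    virtual ~Decomposable() = default;
    virtual std::shared_ptr<const LinOp> get_first_factor() const = 0;   // U
    virtual std::shared_ptr<const LinOp> get_second_factor() const = 0;  // V
};
```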

If it's solving (3), then the decomposition is not a problem of the preconditioner at all, and the preconditioner doesn't have to know anything about it. It is just a case of doing both left and right preconditioning at the same time, and the Krylov method should just accept two linear operators as input (both can default to I).
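And a sketch of option (3), where the solver itself takes two operators and stays oblivious to any factorization (again with hypothetical names, and plain std::vector standing in for the library's vector types):

```cpp
#include <memory>
#include <vector>

// Hypothetical operator interface, as in the sketch above.
struct LinOp {
    virtual ~LinOp() = default;
    virtual void apply(const std::vector<double>& in,
                       std::vector<double>& out) const = 0;
};

// Identity operator: the default when no left/right preconditioner is given.
struct Identity : LinOp {
    void apply(const std::vector<double>& in,
               std::vector<double>& out) const override {
        out = in;  // this copy is the overhead discussed further below
    }
};

// Option (3): the Krylov solver accepts two operators, both defaulting to I.
// It never needs to know whether they came from splitting a single M = U * V.
struct KrylovSolver {
    std::shared_ptr<const LinOp> left = std::make_shared<Identity>();   // applies U^-1
    std::shared_ptr<const LinOp> right = std::make_shared<Identity>();  // applies V^-1
    // ... the iteration applies `left` and `right` wherever the method
    // prescribes, e.g. left->apply(r, z) inside the Krylov recurrence.
};
```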

drittelhacker commented 6 years ago

This may be a stupid uninformed comment, but: are you guys planning to support matrix-valued preconditioners only? In principle, a preconditioner just needs to be a linear map (usually denoted by a capital letter suggesting a matrix), so if the preconditioner is actually some sort of algebraic/geometric multigrid cycle, or another Krylov method, a decomposition M = U * V may be rather non-trivial.

gflegar commented 6 years ago

@drittelhacker thanks for joining the discussion :smile:

We already support any linear operator as a preconditioner (note that set_preconditioner() takes any LinOp as input). You can also see an example in the unit tests where we construct a CG solver and set another CG solver, fixed to 3 iterations, as its preconditioner (putting the usefulness of such a construct aside, this just let us use the smallest number of different linear operators in the test).

This is exactly why we started this discussion: we cannot just assume all linear operators can be decomposed (of course, one can always decompose any operator as M = I * M), as this puts extra strain on everyone implementing a new LinOp to also define this decomposition method, even though it might not be meaningful for their LinOp.

I personally think (and hope) that (3) is the semantics we are looking for, as it complicates neither the design of the LinOp nor that of the Krylov solver. If (2) better describes the semantics of the solver, then a possible implementation would be to first check whether the passed-in LinOp implements the Decomposable interface, decompose it if it does, and otherwise fall back to the default decomposition M = I * M. This way we only complicate the implementation of the solver, but not the design of the LinOp.
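The fallback could then live entirely in the solver, e.g. as a dynamic cast (reusing the hypothetical LinOp and Decomposable from the sketches above):

```cpp
#include <memory>
#include <utility>

// Hypothetical types from the earlier sketches, repeated for completeness.
struct LinOp {
    virtual ~LinOp() = default;
};
struct Identity : LinOp {};
struct Decomposable {
    virtual ~Decomposable() = default;
    virtual std::shared_ptr<const LinOp> get_first_factor() const = 0;
    virtual std::shared_ptr<const LinOp> get_second_factor() const = 0;
};

// Solver-side splitting: if the preconditioner advertises a factorization,
// use it; otherwise fall back to M = I * M. Only the solver implementation
// grows more complex, the LinOp hierarchy stays untouched.
std::pair<std::shared_ptr<const LinOp>, std::shared_ptr<const LinOp>> split(
    std::shared_ptr<const LinOp> m)
{
    if (auto d = std::dynamic_pointer_cast<const Decomposable>(m)) {
        return {d->get_first_factor(), d->get_second_factor()};  // U, V
    }
    return {std::make_shared<Identity>(), m};  // I, M
}
```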

drittelhacker commented 6 years ago

+1 for (3), but you may want to have a compatibility check/flag for the U and V, i.e. check that the outermost equality in (1) actually holds for the choice made.

hartwiganzt commented 6 years ago

I feel we are mixing two discussions here: one is about the type of preconditioning we support, and one is about whether a preconditioner is applied as a single operator or as two.

For the first, we have to decide between left preconditioning ( MAx = Mb ), right preconditioning ( AMy = b, x = My ), and mixed (two-sided) preconditioning. MATLAB only supports left preconditioning (afaik), and the same holds for MAGMA-sparse (that I should know). I am not sure what Trilinos/SuperLU/PETSc do. I would appreciate input from the community.

The second question is whether we want the possibility to split a preconditioner M into two operators. Obviously, this is possible for all preconditioners if one factor is the identity, but this generates overhead. MATLAB always has two application stages, which simplifies the code. We could do the same and accept the overhead of an additional vector copy for non-split preconditioners. This would imply that every solver takes two linear operators as preconditioners (and if only one is specified, the other defaults to the identity). Here, too, I would appreciate input from the community.

mhoemmen commented 6 years ago

@hartwiganzt is right -- we should not conflate what operators an iterative solver accepts, with what a particular preconditioner can do.


egboman commented 6 years ago

I know some people who use left preconditioning and others who use right. Mathematically they give different results, so I think both options should be supported. However, split/both is much less common, so it's lower priority, though it is the most general approach.

gflegar commented 6 years ago

Simple question I would like to know the answer to before we proceed: Why does the QMR algorithm (templates, page 22) have 2 preconditioners (M1 and M2)? Is it because it's written to support mixed preconditioning (i.e. M1 is applied from the left and M2 from the right)?

hartwiganzt commented 6 years ago

No, it is left preconditioning. It's just that it also needs the transposed preconditioner. This can be realized without an explicit transposition by swapping the upper and lower triangular solves and performing them with the transposed triangular factors.
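To spell that out for an ILU-type factorization (assuming M = LU with L lower and U upper triangular, which the comment above implies but doesn't state):

```latex
% Applying the transposed preconditioner M^{-T} without forming M^T:
%   M = LU  =>  M^T = U^T L^T  =>  M^{-T} = L^{-T} U^{-T},
% so M^{-T} z is computed as
\[
    M^{-T} z = L^{-T}\left(U^{-T} z\right),
\]
% i.e. first a forward substitution with U^T (which is lower triangular),
% then a backward substitution with L^T (which is upper triangular):
% the upper and lower solves swap roles, reusing the same stored factors.
```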

gflegar commented 6 years ago

I'm not sure I follow... I know it also needs the transpose, but I don't see why they would need to split the preconditioner into two parts for that (why not just write M^T?).

Also, the following line:

If one uses right (or post) preconditioning, that is M1 = I, then a cheap upper bound for ‖r(i)‖ can be computed in each iteration, avoiding the recursions for r(i)

leads me to assume that M1 is in fact the left preconditioner, and M2 the right one.

egboman commented 6 years ago

I don't think QMR needs a split preconditioner, but the text does say M = M1 * M2. Perhaps the author of that section wanted to allow it; it's not clear to me why it's different from other Krylov methods...

egboman commented 6 years ago

The MATLAB version allows either a single or a split preconditioner: https://www.mathworks.com/help/matlab/ref/qmr.html

gflegar commented 6 years ago

I found the original QMR paper here. Section 7 (page 22) says they are applying a preconditioner M = M1 * M2 to the equivalent system with system matrix A' = M1^-1 * A * M2^-1.
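Spelled out in the paper's notation, the solver iterates on the transformed system and recovers x afterwards:

```latex
\[
    A' y = b', \qquad
    A' = M_1^{-1} A M_2^{-1}, \quad
    b' = M_1^{-1} b, \quad
    x = M_2^{-1} y .
\]
```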

Looks like split preconditioning to me...

I guess that templates just kept the original notation, and didn't explain the need for M1 and M2 very well.

hartwiganzt commented 6 years ago

I can only say what MATLAB and MAGMA-sparse do, and that is left preconditioning.

mhoemmen commented 6 years ago

We talked a little bit about LSQR over the phone today. Matlab actually right-preconditions LSQR:

https://www.mathworks.com/help/matlab/ref/lsqr.html?s_tid=gn_loc_drop

That makes sense because LSQR solves a (constrained) least-squares problem. A "left preconditioner" would actually change the function to minimize. (That's true for GMRES as well, but GMRES aims to solve the linear system and a good solution to the linear system will minimize that function regardless. LSQR works even when there is no x such that Ax=b.)

Our users tend to prefer right preconditioners for GMRES etc. for this reason. However, it's not hard to make GMRES etc. achieve the tolerance for the actual ("explicit") residual norm(b-A*x,2): first iterate until the "implicit residual" (the quantity GMRES computes from the little upper Hessenberg system) achieves the desired tolerance, then iterate some more until norm(b-A*x,2) does as well. Thus, it's not much different to use a left preconditioner.
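A sketch of that two-phase stopping strategy (all callbacks are hypothetical stand-ins for whatever the solver actually exposes):

```cpp
#include <functional>

// Two-phase stopping test for a left-preconditioned GMRES-like solver.
// `step` advances the iteration by one step, `implicit_resnorm` returns the
// cheap estimate from the small upper Hessenberg system, and
// `explicit_resnorm` computes the true norm(b - A*x, 2).
void iterate_to_explicit_tolerance(
    const std::function<void()>& step,
    const std::function<double()>& implicit_resnorm,
    const std::function<double()>& explicit_resnorm, double tol, int max_iters)
{
    int it = 0;
    // Phase 1: iterate on the cheap implicit estimate only.
    while (it < max_iters && implicit_resnorm() > tol) {
        step();
        ++it;
    }
    // Phase 2: iterate some more until the actual residual also meets tol,
    // so a left preconditioner does not distort the stopping criterion.
    while (it < max_iters && explicit_resnorm() > tol) {
        step();
        ++it;
    }
}
```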

gflegar commented 6 years ago

After comparing the algorithm in templates and the original QMR paper with the algorithm implemented in MAGMA-sparse (this one should follow the implementation from MATLAB), @hartwiganzt and I figured out that they are using the preconditioner splitting in a different way:

Templates and QMR paper use two-sided preconditioning and solve a system with matrix A' = U^-1 A V^-1.

MAGMA-sparse uses preconditioning with system matrix A' = V^-1 U^-1 A, i.e. it uses M = UV as a left preconditioner. The apply_left and apply_right routines in MAGMA-sparse exist only due to clumsy design and are not needed in Ginkgo.

Thus, to answer @pratikvn's question: the apply_left and apply_right pair you see in MAGMA-sparse should just be replaced with a single call to apply in Ginkgo.
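In other words, the left preconditioner handed to the solver is simply the composed operator z = V^-1 (U^-1 r). A sketch (hypothetical names again; the composition could equally well be a LinOp provided by the library):

```cpp
#include <memory>
#include <vector>

// Hypothetical operator interface, as in the earlier sketches.
struct LinOp {
    virtual ~LinOp() = default;
    virtual void apply(const std::vector<double>& in,
                       std::vector<double>& out) const = 0;
};

// Composition z = V^-1 (U^-1 r): what MAGMA-sparse spells as apply_left
// followed by apply_right becomes a single apply of one composed operator.
struct ComposedPreconditioner : LinOp {
    std::shared_ptr<const LinOp> u_inv;  // applies U^-1
    std::shared_ptr<const LinOp> v_inv;  // applies V^-1
    void apply(const std::vector<double>& in,
               std::vector<double>& out) const override {
        std::vector<double> tmp(in.size());
        u_inv->apply(in, tmp);   // MAGMA-sparse: apply_left
        v_inv->apply(tmp, out);  // MAGMA-sparse: apply_right
    }
};
```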

There remains another open question: do we want to support both left and right preconditioning? Not sure I'm the person who should answer that, but if we support both left and right, we can also support two-sided preconditioning without significant effort.

hartwiganzt commented 6 years ago

If we have only one "apply" and do not distinguish between left and right, we need to compute two factorizations (for A and A^T). Otherwise, we can transpose the factors.

Also, I guess that other libraries interfacing with Ginkgo may want two preconditioner application routines.

gflegar commented 6 years ago

We can do the same with one apply. We can discuss it offline.


gflegar commented 6 years ago

I feel that the original question, how to handle the factorized preconditioner problem that MAGMA-sparse solved by having apply_left and apply_right, has been answered:

We replace the apply_left + apply_right sequence with just a single apply in Ginkgo.

Related issues #26 and #27 deal with implementing factorization-based preconditioners so that they behave correctly with a single call to apply, and without additional overhead. Thus, I'm closing this issue.

Mostly thanks to the confusion caused by the different notations used by MATLAB (and MAGMA-sparse) and the templates book, we also started a discussion on whether or not we should support different preconditioning options: left, right, and two-sided. Since that is not the topic of this thread, I've extracted all related comments and started issue #28. Please use that thread to continue the discussion.