This PR introduces LookAheadLanczosDecomp, an iterable which performs the look-ahead Lanczos process. There is a companion PR #322 for the quasi-minimal residual (QMR) method for solving linear systems, which uses the Krylov subspace generated by this technique.
The look-ahead process is aimed at non-Hermitian linear systems, particularly ill-conditioned ones, where the standard Lanczos process encounters exact or numerical breakdown. The look-ahead technique circumvents the breakdown by grouping vectors into blocks, so that the generated sequences remain well defined.
Implementation Details
We follow the coupled two-term recurrence implementation of the Lanczos process, as in Freund & Nachtigal (1994). This generates a V-W and a P-Q sequence. Things are kept generic, and in principle this should be a GPU-capable implementation, which as far as I can tell would be the first GPU implementation of the look-ahead Lanczos process.
Additionally, we follow the paper's implementation by using the matrix transpose (not the Hermitian conjugate) to generate the companion Krylov sequence.
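For orientation, here is a sketch of the underlying two-sided Lanczos process in its classical three-term form (no look-ahead), following Saad's biorthogonalization algorithm. The PR itself implements the coupled two-term recurrences of Freund & Nachtigal (1994); as there, the companion sequence is driven by `transpose(A)`, not the adjoint (shown for real A, where the two coincide). All names are illustrative, not the PR's API.

```julia
using LinearAlgebra, Random

# Classical (three-term) two-sided Lanczos sketch: builds a V sequence from A
# and a companion W sequence from transpose(A), biorthogonal to each other.
function two_sided_lanczos(A, v, w, m)
    n = size(A, 1)
    V = [v / norm(v)]
    W = [w / dot(w, V[1])]               # scale so that dot(W[1], V[1]) == 1
    vold = zeros(n); wold = zeros(n)
    beta = 0.0; delta = 0.0
    for j in 1:m-1
        alpha = dot(W[j], A * V[j])
        vh = A * V[j]            - alpha * V[j] - beta  * vold
        wh = transpose(A) * W[j] - alpha * W[j] - delta * wold
        s = dot(wh, vh)
        abs(s) < eps()^(1/3) && error("(near-)breakdown: this is where look-ahead steps in")
        delta = sqrt(abs(s)); beta = s / delta
        vold, wold = V[j], W[j]
        push!(V, vh / delta); push!(W, wh / beta)
    end
    return V, W                          # dot(W[i], V[j]) ≈ (i == j ? 1 : 0)
end
```

In exact arithmetic the two sequences are biorthogonal; the look-ahead variant relaxes this to block-wise biorthogonality whenever one of these inner products (nearly) vanishes.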
Limited memory access and allocations
Compared to the regular Lanczos process, we generate longer (perhaps arbitrarily long) vectors which describe the recurrence relations the process uses. Therefore, the values must be stored in flexible containers. We introduce limited-memory matrices which respect the allocation of memory the user provides, and these come in three flavors:
LimitedMemoryMatrix, which stores columns (for instance the V and W sequence vectors), up to the memory limit
LimitedMemoryUpperTriangular, which stores a recurrence relation in upper triangular form
LimitedMemoryUpperHessenberg, which stores a recurrence relation in upper Hessenberg form
For the latter two, access to values outside the memory range returns 0. By construction, at a given step in the Lanczos sequence we only access values within the memory window of either of these matrices. If, however, the user requests too little memory, the process will error; this should be rare.
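The windowed-access idea can be sketched as follows (a minimal illustration, not the PR's actual LimitedMemory types; all names here are hypothetical):

```julia
# A toy upper-triangular store that retains only the `mem` most recent columns;
# entries whose column fell out of the window (or were never set) read as zero.
mutable struct WindowedUpperTriangular{T}
    data::Dict{NTuple{2,Int},T}
    mem::Int       # number of most-recent columns retained
    lastcol::Int
end
WindowedUpperTriangular{T}(mem::Int) where {T} =
    WindowedUpperTriangular{T}(Dict{NTuple{2,Int},T}(), mem, 0)

function Base.setindex!(U::WindowedUpperTriangular, x, i::Int, j::Int)
    i <= j || throw(ArgumentError("upper triangular: need i <= j"))
    U.data[(i, j)] = x
    if j > U.lastcol
        U.lastcol = j
        # evict entries whose column has fallen out of the memory window
        filter!(p -> p.first[2] > j - U.mem, U.data)
    end
    return x
end

# Reads outside the retained window return zero instead of erroring.
Base.getindex(U::WindowedUpperTriangular{T}, i::Int, j::Int) where {T} =
    get(U.data, (i, j), zero(T))
```

The same pattern extends to the column store (whole vectors instead of scalar entries) and to the upper-Hessenberg variant.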
The optimizations here could be pushed further - the matrices describing the recurrences are more structurally sparse than what is implied by the LimitedMemory matrices - but that would introduce more code complexity so I decided to keep it simpler. Likewise, in places we store larger vectors than we strictly need to, but this is not significant compared to the matrix-vector products.
Differences from paper
When encountering a block that is too large (above the user-supplied max_block_size), the process closes the block and continues. This is in contrast to Freund & Nachtigal, who recommend resetting the estimate of the operator norm and re-computing the block.
We also test breakdown of the D and E operators against a tolerance of eps()^(1/3). This is suggested in Freund et al. (1993), though Freund & Nachtigal (1994) use plain eps(). In my experience eps() is too conservative, which is why I went with the former.
Tests
This implementation is tested against some contrived matrices and starting vectors which guarantee the generation of blocks in the look-ahead Lanczos sequence. These are then tested against identities derived in the Lanczos process. I welcome additional suggestions.
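A contrived exact-breakdown case in this spirit (an illustrative example, not one of the PR's actual test matrices): for the cyclic shift matrix with starting vectors v1 = w1 = e1, the first candidate vectors of the two sequences are orthogonal, so the classical Lanczos process breaks down at step 2, while the look-ahead process would open a block and continue.

```julia
using LinearAlgebra

A  = [0.0 0 1; 1 0 0; 0 1 0]        # cyclic shift: A*e1 = e2, transpose(A)*e1 = e3
v1 = [1.0, 0, 0]
w1 = [1.0, 0, 0]
alpha = dot(w1, A * v1)             # = 0
vh = A * v1 - alpha * v1            # = e2
wh = transpose(A) * w1 - alpha * w1 # = e3
dot(wh, vh)                         # = 0: exact breakdown of the classical process
```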
Future directions
Further micro-optimization, validation on GPU
A specialized implementation for complex-symmetric (or symmetric) matrices, which would reduce the memory usage and the number of matrix-vector products by roughly 2x.
References
Freund, R. W., & Nachtigal, N. M. (1994). An Implementation of the QMR Method Based on Coupled Two-Term Recurrences. SIAM Journal on Scientific Computing, 15(2), 313–337. https://doi.org/10.1137/0915022
Freund, R. W., Gutknecht, M. H., & Nachtigal, N. M. (1993). An Implementation of the Look-Ahead Lanczos Algorithm for Non-Hermitian Matrices. SIAM Journal on Scientific Computing, 14(1), 137–158. https://doi.org/10.1137/0914009