JuliaApproximation / MultivariateOrthogonalPolynomials.jl

Supports approximating functions and solving differential equations on various higher-dimensional domains such as disks and triangles.

How to make rotationally invariant solves fast? #126

Open dlfivefifty opened 2 years ago

dlfivefifty commented 2 years ago

Rotationally invariant operators are typically represented by ModalInterlace, which combines banded matrices, each acting on a single Fourier mode, into a diagonal-block-banded matrix (that is, a block-banded matrix whose blocks are all diagonal, i.e. the subblockbandwidths are (0,0)).
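To illustrate the structure (a NumPy sketch, not the package's Julia types — block sizes and bandwidths here are arbitrary choices): a block-banded matrix whose blocks are all diagonal can be permuted into a direct sum of small banded matrices, one per mode, which is exactly what makes a fast solve possible.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 3        # n blocks, each m x m; all blocks diagonal ("subblockbandwidths (0,0)")
l, u = 1, 1        # block bandwidths

# Assemble a banded-block-banded matrix with diagonal blocks.
A = np.zeros((n*m, n*m))
for K in range(n):
    for J in range(max(0, K - l), min(n, K + u + 1)):
        A[K*m:(K+1)*m, J*m:(J+1)*m] = np.diag(rng.standard_normal(m))

# Permuting to "mode-major" order decouples it into m independent banded matrices.
p = np.arange(n*m).reshape(n, m).T.ravel()   # group entry i of every block together
B = A[np.ix_(p, p)]
for i in range(m):
    Bi = B[i*n:(i+1)*n, i*n:(i+1)*n]         # the banded matrix for mode i
    assert np.count_nonzero(np.triu(Bi, u + 1)) == 0
    assert np.count_nonzero(np.tril(Bi, -l - 1)) == 0

# Everything outside the m diagonal sub-blocks is zero:
mask = np.kron(np.eye(m), np.ones((n, n)))
assert np.allclose(B * (1 - mask), 0)
```

The permutation is the whole story: each mode's banded matrix can then be factorized on its own.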

At the moment we aren't taking advantage of this structure. There are three approaches:

  1. Do it at the level of factorizing a BandedBlockBandedMatrix by checking whether the subblockbandwidths are (0,0). The easiest way would be to copy to a banded matrix, do a QR, and then copy the data back.
  2. Introduce a DiagonalBlockBandedMatrix, though this doesn't seem to have any obvious benefits over (1).
  3. Do it directly on a ModalInterlace. This has the benefit of being able to call qr on each of the banded operators separately (and potentially taking advantage of @threads). Though probably (1) could also be parallelised.
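A sketch of what option (1) amounts to, in Python/NumPy rather than Julia (SciPy has no banded QR, so `scipy.linalg.solve_banded` — a banded LU solve — stands in for the banded QR that BandedMatrices.jl would provide; the matrix construction is a hypothetical stand-in for a real operator):

```python
import numpy as np
from scipy.linalg import solve_banded

rng = np.random.default_rng(1)
n, m, l, u = 8, 4, 1, 1   # n blocks of size m; block bandwidths (l, u); diagonal blocks

# A banded-block-banded matrix whose subblockbandwidths are (0,0):
A = np.zeros((n*m, n*m))
for K in range(n):
    for J in range(max(0, K - l), min(n, K + u + 1)):
        d = rng.standard_normal(m) + (5.0 if K == J else 0.0)  # keep it well-conditioned
        A[K*m:(K+1)*m, J*m:(J+1)*m] = np.diag(d)
b = rng.standard_normal(n*m)

# Option (1): having detected the (0,0) subblockbandwidths, copy each mode's
# entries into banded storage, solve the small banded system, and scatter back.
x = np.empty(n*m)
for i in range(m):
    idx = i + m*np.arange(n)        # global indices belonging to mode i
    ab = np.zeros((l + u + 1, n))   # LAPACK-style banded storage: ab[u+K-J, J] = A_i[K, J]
    for K in range(n):
        for J in range(max(0, K - l), min(n, K + u + 1)):
            ab[u + K - J, J] = A[idx[K], idx[J]]
    x[idx] = solve_banded((l, u), ab, b[idx])

assert np.allclose(A @ x, b)
```

Each mode costs O(n) for fixed bandwidths, so the whole solve is O(N) rather than the O(N^2) block-banded QR.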

@TSGut @ioannisPApapadopoulos any thoughts? Do you think this is important right now? It won't help with the variable coefficients problem but could be important for fractional DEs and time-stepping.

ioannisPApapadopoulos commented 2 years ago

Just to make sure I understand: is this discussion entirely about how to do the solve with a diagonal-block-banded matrix? What's the current way things are solved? QR on the whole matrix?

dlfivefifty commented 2 years ago

For a general banded-block-banded matrix it just reduces to block-banded QR, that is, it treats the blocks as dense. This is because there is invariably fill-in. This is an O(N^2) = O(n^4) algorithm, where N is the total number of degrees of freedom and n is the number of blocks (so N = O(n^2)). (In theory it can be reduced to O(N^(3/2)) using hierarchical methods, but this may be unstable.)

At the moment diagonal-block-banded matrices are handled the same way, though using any of the proposals above the solve reduces to O(N). Also it should be easy to multithread (though we'll see whether my banded QR is actually multithread-safe...)
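Since the per-mode factorizations are fully independent, multithreading is embarrassingly parallel. A Python sketch of the shape of it (a thread pool as an analogue of Julia's `@threads`; `np.linalg.qr` is dense and stands in for the banded QR, and the tridiagonal systems here are made up for illustration):

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(2)
n, m = 64, 8   # one n x n system per mode, m modes

# One small tridiagonal system per mode (stored dense for brevity);
# the modes never couple, so the factorizations can run concurrently.
mats = [np.diag(rng.standard_normal(n) + 4.0)
        + np.diag(rng.standard_normal(n - 1), 1)
        + np.diag(rng.standard_normal(n - 1), -1) for _ in range(m)]
rhss = [rng.standard_normal(n) for _ in range(m)]

def qr_solve(A, b):
    Q, R = np.linalg.qr(A)          # dense QR as a stand-in for a banded QR
    return np.linalg.solve(R, Q.T @ b)

with ThreadPoolExecutor() as pool:  # analogue of an @threads loop over modes
    xs = list(pool.map(qr_solve, mats, rhss))

assert all(np.allclose(A @ x, b) for A, x, b in zip(mats, xs, rhss))
```

Whether this actually scales depends on the factorization kernels being thread-safe, which is exactly the caveat above.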

ioannisPApapadopoulos commented 2 years ago

ok! And in option (1), each individual block would be copied to a banded matrix and then a QR factorization is applied? So there would be n QR factorizations in both (1) and (3), where n is the number of blocks?

I guess the advantage of (1) is that it can be applied to any problem that induces a diagonal-block-banded matrix, rather than constraining ourselves to the ModalInterlace case.

TSGut commented 2 years ago

I suppose to answer this I'd need to have a better understanding of how ModalInterlace is meant to be used; the tests are a bit sparse. Any more natural examples?

Lacking this, I'd also lean towards (1).

dlfivefifty commented 2 years ago

> I guess the advantage of (1) is that it can be applied to any problem that induces a diagonal-block-banded matrix, rather than constraining ourselves to the ModalInterlace case.

Exactly. The downside is that you take banded matrices with simple rational formulae and nice sequential-memory properties, rearrange them into a BandedBlockBandedMatrix, and then break them apart again by copying. Whether this extra rearranging has a significant impact on performance is not clear to me.