JuliaApproximation / SemiclassicalOrthogonalPolynomials.jl

A Julia repository for semiclassical orthogonal polynomials
MIT License

Overhaul conversion operator computation + fix Weighted ldiv #79

Closed TSGut closed 1 year ago

TSGut commented 1 year ago

@ioannisPApapadopoulos @dlfivefifty This PR fixes the issue https://github.com/JuliaApproximation/SemiclassicalOrthogonalPolynomials.jl/issues/78 with some caveats.

Roughly, the issue was caused by Lanczos still being the backend for function expansion. Where applicable I have now replaced this with QR/Cholesky-based expansions, so the conversion operators are also built up step by step, i.e. if we do `P\Q` and P and Q are an integer parameter apart, then the conversion is computed as `Rn*...*R3*R2*R1` instead of in one direct step. This is a lot more stable but suffers a severe time loss due to type accumulation in the lazy products; I think this is related to https://github.com/JuliaArrays/LazyArrays.jl/issues/255
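The step-by-step accumulation can be pictured with finite stand-ins (plain triangular matrices rather than the package's lazy infinite operators; the factor values here are hypothetical, chosen only to be well-conditioned):

```julia
using LinearAlgebra

n = 6
# hypothetical single-step conversion factors: banded upper-triangular matrices
Rs = [UpperTriangular(I + triu(ones(n, n), 1) / (n * j)) for j in 1:4]

# accumulate the full conversion one integer step at a time;
# after the fold, R == R_4 * R_3 * R_2 * R_1
R = foldl((acc, Rj) -> Rj * acc, Rs; init = Matrix{Float64}(I, n, n))
```

In the package the factors are lazy and infinite, which is where the type-accumulation cost comes from; the finite version above has no such overhead.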

I will keep checking whether there is a way to speed this up, but for low c it isn't any slower than Lanczos, and for high c Lanczos fails completely as in your issue, so this may simply end up being part of the cost.

Note that this requires ClassicalOPs.jl v0.10.1 which has not been registered yet but is on the way.

TSGut commented 1 year ago

And of course there may still be bugs - still need to thoroughly test consistency.

dlfivefifty commented 1 year ago

The plan behind `ApplyArray` is that you can avoid these compile issues by using vectors instead of tuples, something like `ApplyMatrix{T}(*, AbstractMatrix{T}[R1, …, Rm])`

I haven’t actually used this functionality yet.

Note a lot of the ClassicalOPs framework should probably be doing this to mitigate the compile time headache but that’s a big project
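The typing argument can be shown in base-Julia miniature (without LazyArrays.jl loaded): a `Vector{Matrix{Float64}}` keeps one concrete type no matter how many factors it holds, so appending another factor never changes any container type, whereas a tuple of operators acquires a new type with every extra factor. The factor values below are hypothetical:

```julia
using LinearAlgebra

factors = Matrix{Float64}[]                # one concrete eltype at any length
for j in 1:5
    push!(factors, triu(rand(4, 4) + 5I))  # hypothetical conversion factors
end

# the full product is still available on demand: R5 * R4 * ... * R1
Rtotal = reduce(*, reverse(factors))
```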

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 93.22% and project coverage change: -1.59 :warning:

Comparison is base (f86e0d0) 93.33% compared to head (9f1135b) 91.74%.

:exclamation: Current head 9f1135b differs from pull request most recent head 2606616. Consider uploading reports for the commit 2606616 to get more accurate results

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master      #79      +/-   ##
==========================================
- Coverage   93.33%   91.74%   -1.59%
==========================================
  Files           4        4
  Lines         585      606      +21
==========================================
+ Hits          546      556      +10
- Misses         39       50      +11
```

| [Impacted Files](https://app.codecov.io/gh/JuliaApproximation/SemiclassicalOrthogonalPolynomials.jl/pull/79?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaApproximation) | Coverage Δ | |
|---|---|---|
| [src/SemiclassicalOrthogonalPolynomials.jl](https://app.codecov.io/gh/JuliaApproximation/SemiclassicalOrthogonalPolynomials.jl/pull/79?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaApproximation#diff-c3JjL1NlbWljbGFzc2ljYWxPcnRob2dvbmFsUG9seW5vbWlhbHMuamw=) | `87.71% <92.30%> (-3.17%)` | :arrow_down: |
| [src/neg1c.jl](https://app.codecov.io/gh/JuliaApproximation/SemiclassicalOrthogonalPolynomials.jl/pull/79?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaApproximation#diff-c3JjL25lZzFjLmps) | `100.00% <100.00%> (ø)` | |
| [src/twoband.jl](https://app.codecov.io/gh/JuliaApproximation/SemiclassicalOrthogonalPolynomials.jl/pull/79?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=JuliaApproximation#diff-c3JjL3R3b2JhbmQuamw=) | `95.34% <100.00%> (ø)` | |


TSGut commented 1 year ago

@ioannisPApapadopoulos @dlfivefifty OK, this works. Unfortunately it takes about 10 seconds on my laptop to do

```julia
P = SemiclassicalJacobi(1.1, 0, 0, 100)
Q = SemiclassicalJacobi(1.1, 1, 1, 100)

L = Weighted(P) \ Weighted(Q)
```

As said above, this is because L needs to find the expansion of the weight ratio in one of those bases, and to get it, it has to compute the expansion in classical Jacobi and then ladder up with c = 100 applications of R (or 50, I guess, in the QR case, since it takes two steps at a time). But at least it won't break down.

TSGut commented 1 year ago

I guess an alternative would be to have SemiclassicalOPs always remember their "conversion from classical" operator R. This could be done in a relatively straightforward way as we build up the hierarchy. It would make building the OPs slower, but expansions would become fast. It just depends on our priorities.
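The caching idea might be sketched like this (finite dense stand-ins; `raise` and `conversion_from_classical` are hypothetical names, and the real operators are lazy and infinite):

```julia
using LinearAlgebra

n = 6
# hypothetical one-step raising operator from parameter c-1 to c
raise(c) = UpperTriangular(I + triu(ones(n, n), 1) / (n * c))

# cache: c => accumulated conversion-from-classical operator
cache = Dict{Int, Matrix{Float64}}(0 => Matrix{Float64}(I, n, n))

function conversion_from_classical(c::Int)
    get!(cache, c) do
        raise(c) * conversion_from_classical(c - 1)  # one new step; the rest is cached
    end
end

conversion_from_classical(3)   # fills the cache for c = 1, 2, 3
```

Building each family would then pay for one extra factor, and any later expansion in that family would reuse the stored product.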

MikaelSlevinsky commented 1 year ago

Regarding compute time, plan_ann2cxf computes all the truncated Jacobi matrices up to degree n by the QR's R method (with dimensions that reduce as the order increases) before extracting the Givens rotations from Q and throwing the Jacobi matrices away. So it can be used as a benchmark for raw flops vs flops + lazy OOP stuff.

This should be comparable to one of the P = or Q = lines above, I think. I have a feeling that it's much less than a second for n = 100. (Laptop is at work.)

MikaelSlevinsky commented 1 year ago

Conversion between the P & Q bases involves Choleskys of Jacobi matrices of the P family, trivially parallelizable.

What are threads doing?

MikaelSlevinsky commented 1 year ago

Oh I see, I think that the last line is a conversion for two families, not between two whole hierarchies. No threads needed for that, but it's cheaper than I thought.

TSGut commented 1 year ago

The `P =` and `Q =` lines are already fast; the cost is negligible.

The entire cost is in the computation of the weighted lowering operator (the last line), since to my knowledge it requires an expansion. Up to a constant, the computation is trivial and super fast (it's just the transpose of `Q \ P`, which takes fractions of a second since it's just a Cholesky), but to get the missing constant I believe I need to expand a weight function in Q (or P, I forget which).

Expanding functions is done by expanding in the closest classical basis and converting up. Thus the whole cost comes from computing the conversion from classical to P, which is c multiplications of R_j matrices. 100 lazy infinite matrix multiplications are just that slow, I think, even though all the individual steps are just fast Choleskys.

I agree, of course, that it parallelizes if it's done for a hierarchy, but that already works via threaded broadcasting.

Open to any suggestions on how to either expand in P or Q faster (we can't just convert in one step; it's too unstable) or perhaps a different way altogether to get the constant. Maybe the constant can be given explicitly depending on the bases involved.
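One generic mitigation, sketched here with finite stand-ins and not something this PR claims to do: when the laddered conversion is only ever applied to a single coefficient vector, associating the product as successive matrix-vector products avoids forming any operator-operator product at all:

```julia
using LinearAlgebra

n = 100
# hypothetical well-conditioned single-step factors and a coefficient vector
Rs = [UpperTriangular(I + triu(ones(n, n), 1) / (n * j)) for j in 1:20]
f = rand(n)

coeffs = foldl((v, Rj) -> Rj * v, Rs; init = f)  # 20 mat-vecs, no matrix-matrix product
direct = reduce(*, reverse(Rs)) * f              # same result via matrix-matrix products
```

Whether this translates to the lazy infinite setting depends on how the products are currently materialized, so treat it as a direction rather than a fix.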

MikaelSlevinsky commented 1 year ago

Why do we need to compute the connection coefficients from classical to P? Can't we build the lowering operator from the Jacobi matrix of P or Q? (Don't we get the lazy Jacobi matrix from P = SemiclassicalJacobi(...)?)

TSGut commented 1 year ago

It's very possible that I am missing something obvious, but the way we have been doing expansions in SemiclassicalOPs (not new code; it's been this way since day 1) is to do

```julia
(Q \ SemiclassicalJacobi(Q.t, Q.a, Q.b, 0)) * _p0(R̃) * (R̃ \ f)
```

where R̃ is a normalized Jacobi basis with parameters Q.a and Q.b, and _p0(R̃) is the value of its 0th coefficient. Note that because we use the convention that SemiclassicalJacobi(Q.t, Q.a, Q.b, 0) has 0th coefficient equal to 1, we have to account for that constant. Otherwise it is just "expand f in Jacobi, then apply the conversion operator to get f_Q".

Now, this is fast if we compute `(Q \ SemiclassicalJacobi(Q.t, Q.a, Q.b, 0))` with a Cholesky, but if c is high, which it is in John's case, this is unstable. Instead we ladder up step by step, which is where the product of 100 Rs comes from. Hope that makes sense.

TSGut commented 1 year ago

Ah, you are asking why we need to expand in the first place? Well, the weighted conversion operator is

```julia
k = (A \ SemiclassicalJacobiWeight(A.t, Δa, Δb, Δc))[1]
return (ApplyArray(*, Diagonal(Fill(k, ∞)), (B \ A)))'
```

It's the computation of k which is expensive; everything else is trivial and fast. The part you are asking about, I think, is `(B \ A)'`, which is indeed cheap and fast, but like I said, this conversion is off by a constant k (frankly, I think this is our normalization choice coming back to bite us). A way to make this fast immediately would be to express k explicitly; this is probably possible (?), I just haven't given it any thought yet.
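In miniature (finite stand-ins with hypothetical values for `B \ A` and `k`), the structure of the returned operator is just a scalar rescaling of a cheap triangular conversion:

```julia
using LinearAlgebra

n = 5
R = UpperTriangular(I + triu(rand(n, n), 1))  # finite stand-in for B \ A
k = 0.37                                      # stand-in for the expensive constant

# mirrors (ApplyArray(*, Diagonal(Fill(k, ∞)), B \ A))' in miniature
L = (Diagonal(fill(k, n)) * R)'
```

This makes the cost split visible: the triangular factor is cheap, and all of the 10 seconds goes into determining the scalar k.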

TSGut commented 1 year ago

Added tests to check https://github.com/JuliaApproximation/SemiclassicalOrthogonalPolynomials.jl/issues/81, which no longer reproduces with the changes made here.