Closed TSGut closed 11 months ago
Now also featuring the weighted conversion speedup promised to @ioannisPApapadopoulos. Hopefully these successive speedups eventually reach a point where it is fast enough (I suspect from here on it is a matter of code optimization rather than better formulas, though).
Here is the example from https://github.com/JuliaApproximation/SemiclassicalOrthogonalPolynomials.jl/issues/78 rerun with the current implementation:
julia> using SemiclassicalOrthogonalPolynomials, ClassicalOrthogonalPolynomials
julia> t = 1.1
1.1
julia> Q₀₀ = SemiclassicalJacobi.(t, 0, 0, 0:∞);
julia> Q₁₁ = SemiclassicalJacobi.(t, 1, 1, 0:∞);
julia> L = (Weighted.(Q₀₀) .\ Weighted.(Q₁₁));
julia> @time L[20]; # before: 0.183541 seconds
0.060309 seconds (191.13 k allocations: 8.075 MiB)
julia> @time L[30]; # before: 0.396986 seconds
0.100929 seconds (402.88 k allocations: 15.625 MiB)
julia> @time L[40]; # before: 0.861748 seconds
0.212241 seconds (758.47 k allocations: 27.565 MiB)
julia> @time L[50]; # before: 1.565395 seconds
0.291458 seconds (1.26 M allocations: 43.963 MiB)
julia> @time L[60]; # before: 6.235148 seconds (with compile), 2.743544 seconds (without compile)
0.441863 seconds (1.95 M allocations: 64.999 MiB)
and we also get
julia> @time L[250]
37.124558 seconds (142.60 M allocations: 3.696 GiB, 2.71% gc time)
(((ℵ₀×ℵ₀ LinearAlgebra.LowerTriangular{Float64, LinearAlgebra.Adjoint{Float64, InfiniteLinearAlgebra.AdaptiveCholeskyFactors{Float64, BandedMatrices.BandedMatrix{Float64, Matrix{Float64}, Base.OneTo{Int64}}, LazyBandedMatrices.SymTridiagonal{Float64, LazyArrays.BroadcastVector{Float64, typeof(-), Tuple{Bool, SubArray{Float64, 1, ClassicalOrthogonalPolynomials.QRJacobiData{:Q, Float64}, Tuple{Base.Slice{InfiniteArrays.OneToInf{Int64}}, Int64}, false}}}, LazyArrays.BroadcastVector{Float64, typeof(-), Tuple{SubArray{Float64, 1, ClassicalOrthogonalPolynomials.QRJacobiData{:Q, Float64}, Tuple{Base.Slice{InfiniteArrays.OneToInf{Int64}}, Int64}, false}}}}}}} with indices OneToInf()×OneToInf()) * (ℵ₀×ℵ₀ LinearAlgebra.Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}} with indices OneToInf()×OneToInf()) with indices OneToInf()×OneToInf()) * ((ℵ₀×ℵ₀ LinearAlgebra.LowerTriangular{Float64, LinearAlgebra.Adjoint{Float64, InfiniteLinearAlgebra.AdaptiveCholeskyFactors{Float64, BandedMatrices.BandedMatrix{Float64, Matrix{Float64}, Base.OneTo{Int64}}, LazyBandedMatrices.SymTridiagonal{Float64, SubArray{Float64, 1, ClassicalOrthogonalPolynomials.QRJacobiData{:Q, Float64}, Tuple{Base.Slice{InfiniteArrays.OneToInf{Int64}}, Int64}, false}, SubArray{Float64, 1, ClassicalOrthogonalPolynomials.QRJacobiData{:Q, Float64}, Tuple{Base.Slice{InfiniteArrays.OneToInf{Int64}}, Int64}, false}}}}} with indices OneToInf()×OneToInf()) * (ℵ₀×ℵ₀ LinearAlgebra.Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}} with indices OneToInf()×OneToInf()) with indices OneToInf()×OneToInf()) with indices OneToInf()×OneToInf()) * (ℵ₀×ℵ₀ LinearAlgebra.Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}} with indices OneToInf()×OneToInf()) with indices OneToInf()×OneToInf():
0.00434421 ⋅ ⋅ ⋅ …
0.00428913 0.00608072 ⋅ ⋅
-3.76572e-5 0.0059776 0.00737181 ⋅
⋅ -7.88228e-5 0.00721544 0.00842674
⋅ ⋅ -0.000127024 0.00821237
⋅ ⋅ ⋅ -0.000180951 …
⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅
⋅ ⋅ ⋅ ⋅
⋮ ⋱
Granted, that is not exactly fast, but there are a LOT of Cholesky decompositions that need to be done to construct this. I am sure there are plenty of code-level optimizations left here, but let's see if this is good enough first.
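For context, here is my reading of the laddering step that gets repeated so many times (a sketch of the Cholesky-for-orthogonal-polynomials idea, not the package's exact formulas): raising the c parameter by one multiplies the weight by (t − x), which corresponds to a Cholesky factorization of the shifted Jacobi matrix of the current family.

```latex
% Sketch of one rung of the ladder (my paraphrase, up to normalization):
% X_c is the Jacobi matrix of the current family Q_c on [0,1].
w_{c+1}(x) = (t - x)\, w_c(x), \qquad
t I - X_c = R_c^\top R_c
\;\;\Rightarrow\;\;
Q_{c+1} \propto Q_c\, R_c^{-1}.
```

Climbing from c to c + k then chains k such factorizations, which is why constructing L[250] touches so many Cholesky decompositions.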
All modified and coverable lines are covered by tests :white_check_mark:
Comparison is base (82bac69) 91.66% compared to head (a26d581) 92.40%.
Ok I think this does what I wanted it to now, can be merged.
@ioannisPApapadopoulos give it a try and see if this fixes the issue you saw with the normalization. You should precompute the normalization constants if possible so it can all be done in one go of the recurrence.
Brilliant, thanks @TSGut. I'll give it a spin and report back.
I don't really understand what L[250] is computing. Why is it quadratic in cost? I would think computing L[m][1:n,1:n] would be O(m*n) operations.
A fair question. In theory, this is computed by chaining lots of QR or Cholesky decompositions to ladder up the hierarchy. But the cost is not straightforward, because this laddering involves some cached arrays, meaning that
julia> @time L[250];
32.742888 seconds (140.99 M allocations: 3.648 GiB, 2.50% gc time)
julia> @time L[250];
16.266950 seconds (72.01 M allocations: 1.871 GiB, 2.54% gc time)
So at least half of the computing time here is spent filling in some cached Jacobi matrices to then perform Cholesky decompositions on.
I think I know where most of the computation time is lost, and possibly what causes the N^2 cost. I will see if I can fix it.
Hm, ok, it is faster now, especially after caching, which is nice, but the complexity has not changed.
julia> @time LL[250];
25.778077 seconds (106.66 M allocations: 2.762 GiB, 2.96% gc time, 0.09% compilation time)
julia> @time LL[250];
8.797171 seconds (36.06 M allocations: 959.496 MiB, 2.45% gc time)
Ok this is more like it...
julia> @time L[250];
16.329566 seconds (70.79 M allocations: 1.829 GiB, 2.53% gc time)
julia> @time L[250];
0.000901 seconds (835 allocations: 39.078 KiB)
(No, I didn't just cache this object; it now works with objects that are cached instead of erroneously regenerating them over and over.) Importantly, it precomputes everything needed for the other conversions along the way too, so all the complexity left is filling the cached Jacobi matrices up to the required order; it is otherwise blazingly fast.
julia> @time L[250];
0.000896 seconds (835 allocations: 39.078 KiB)
julia> @time L[249];
0.074969 seconds (309.57 k allocations: 7.869 MiB)
julia> @time L[249];
0.000951 seconds (835 allocations: 39.078 KiB)
@ioannisPApapadopoulos you should keep this in mind when using this. Probably don't call L[1:250] straight away; call L[250] on its own first, then call L[1:250], otherwise Julia's evaluation order may not make use of this and you will waste time.
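To illustrate why the order matters, here is a toy model (purely hypothetical, not the package's internals): a hierarchy backed by a shared cache where every cache extension pays a fixed setup cost. Calling the highest order first triggers a single extension; ascending calls trigger one extension each.

```julia
# Toy model of a cached hierarchy (hypothetical, for illustration only).
mutable struct ToyHierarchy
    cache::Vector{Int}
    setups::Int          # counts how often the cache had to be extended
end
ToyHierarchy() = ToyHierarchy(Int[], 0)

function member!(h::ToyHierarchy, m::Int)
    if m > length(h.cache)
        h.setups += 1                              # fixed cost per extension
        append!(h.cache, (length(h.cache)+1):m)    # fill up to order m at once
    end
    return h.cache[m]
end

h = ToyHierarchy()
member!(h, 250)              # one extension fills everything up to order 250
member!.(Ref(h), 1:250)      # pure cache hits afterwards
h.setups                     # 1

h2 = ToyHierarchy()
member!.(Ref(h2), 1:250)     # ascending order: 250 separate extensions
h2.setups                    # 250
```

The real computation per order is of course far heavier than this stand-in, but the fixed-overhead-per-extension pattern is the reason to request the top of the hierarchy first.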
If it's an AbstractCachedVector then L[1:250] should be fine.
Retagged as WIP for now because of a potential forward recurrence instability. See the relevant issue in the draft repo.
Now includes sumquotient(), used in weighted conversions to avoid overflow of the t^c term for high c in computations of sum(wP)/sum(wQ), by cancelling these terms before using the recurrence.
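The overflow being cancelled is easy to see in isolation (a toy illustration of the idea, not the package's actual sumquotient code): for t = 1.1, the factor t^c overflows Float64 long before c reaches large orders, so a quotient of two such sums has to cancel the exponents before evaluating anything.

```julia
t = 1.1
c = 20_000

t^c                 # overflows Float64 to Inf
t^(c + 1) / t^c     # Inf / Inf: the naive quotient is lost to NaN
t^((c + 1) - c)     # cancelling exponents first keeps everything finite: 1.1
```

The same principle applies to the ratio of weighted sums: cancel the common t^c factors symbolically, then run the recurrence on what remains.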
While there are some drawbacks here that we discussed, I think we should probably still merge this in its current form (after resolving conflicts)? The previous implementation is pretty much strictly worse, so we can use the recurrence until we revamp the decompositions machinery to keep track of the normalization directly. We would also still want the recurrence anyway for cases where we don't want to generate the hierarchy of polynomials.
I think there is a bug when c=0, e.g.
using SemiclassicalOrthogonalPolynomials
import HypergeometricFunctions: _₂F₁general2
using SpecialFunctions
t, a, b, c = 1.1, 1, 1, 0
sum.(SemiclassicalJacobiWeight.(t,a,b,c:c)) # returns 1.0
t^c * beta(1+a,1+b) * _₂F₁general2(1+a,-c,2+a+b,1/t) # returns 0.16666666666666669
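For reference, the identity the second line evaluates (assuming the weight is w(x) = x^a (1-x)^b (t-x)^c on [0,1], which is my reading of SemiclassicalJacobiWeight) is

```latex
\int_0^1 x^a (1-x)^b (t-x)^c \,\mathrm{d}x
  = t^c \, B(1+a,\,1+b)\, {}_2F_1\!\left(1+a,\,-c;\;2+a+b;\;\tfrac{1}{t}\right).
```

For a = b = 1 and c = 0 this reduces to the Beta function B(2,2) = 1/6 ≈ 0.1667, so the direct formula's 0.1667 is the correct value and the 1.0 returned by the package is the bug.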
Yep, that's definitely a bug. Will fix.
@TSGut have you made a start? or should I have a go fixing it next week?
Ah, sorry, I just hadn't pushed. It was just a bug in the special-casing for c=0.
@dlfivefifty ready for review or merge
Thank you!
Generally you would resolve the conflicts and have tests passing before asking someone to review or merge
Of course. I just missed that there was a conflict, sorry about that. Should be fixed.
@dlfivefifty Ok, all passing and conflicts and review resolved.
Ok, other than the recurrence array change, I have addressed the review comments. I will check how much work it would be to convert all of this to use RecurrenceArrays in the coming few days.
@ioannisPApapadopoulos @dlfivefifty
Basically just porting over @MikaelSlevinsky's contiguous recurrence implementation in a way that should be reasonably natural for this package. This should be both much faster and hopefully much more accurate at high c.
Previously we had: [benchmark output omitted]
Now we get: [benchmark output omitted]
At no cost to accuracy (the old method using BigFloat, the new one just Float64): [accuracy comparison omitted]
It's implemented to work whenever sum is broadcast over a c-range vector of weights, as seen above. As it stands, it won't be used if you just call sum on a single weight (mainly because doing that in a clean way would involve a lot of clauses about a, b and c and other contiguous recurrences); maybe we implement that at some point in the future. If you must use it on a single weight, just do e.g.
sum.(SemiclassicalJacobiWeight.(1.1,2,3,3:3))
Still need to add and run tests.