JuliaApproximation / SemiclassicalOrthogonalPolynomials.jl

A Julia repository for semiclassical orthogonal polynomials
MIT License

Lowering in weighted vs. unweighted #78

Closed by ioannisPApapadopoulos 2 months ago

ioannisPApapadopoulos commented 1 year ago

It appears that weighted raising uses a different backend (one that fails for larger c) than unweighted lowering, even though one is the transpose of the other modulo a constant.

E.g.

using ClassicalOrthogonalPolynomials, SemiclassicalOrthogonalPolynomials
P = SemiclassicalJacobi(1.1,0,0,10)
Q = SemiclassicalJacobi(1.1,1,1,10)

R = Q \ P
L = Weighted(P) \ Weighted(Q)

R./L' # same constant in all entries

# Try higher c parameter
P = SemiclassicalJacobi(1.1,0,0,100)
Q = SemiclassicalJacobi(1.1,1,1,100)

R = Q \ P # works no problem
L = Weighted(P) \ Weighted(Q) # fails with DomainError
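As a sanity check, the transpose-modulo-constant relation can be verified on finite sections. This is a hedged sketch using the c = 10 case above (where both solves succeed); the constant is read off the (1,1) entries:

```julia
# Assumes R and L from the c = 10 example above.
n = 10
Rn  = R[1:n, 1:n]
Ltn = L'[1:n, 1:n]
c = Rn[1, 1] / Ltn[1, 1]   # candidate constant
Rn ≈ c .* Ltn              # should hold if R equals a constant times L'
```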
TSGut commented 1 year ago

I never touched Weighted(P) \ Weighted(Q), so that will be why; it probably still uses Lanczos. I will check where it goes wrong.

The constant is probably 1/L[1,1].

ioannisPApapadopoulos commented 1 year ago

Thank you @TSGut 😄

TSGut commented 1 year ago

Just to sanity check our syntax, isn't this weighted lowering and unweighted raising?

ioannisPApapadopoulos commented 1 year ago

Yeah I was actually going to ask. You are probably right!

TSGut commented 1 year ago

Thinking about how the \ syntax should work, I think what I said is right. But as a separate check: unweighted raising is banded in classical OPs and

julia> P = Legendre()
Legendre()

julia> Q = Jacobi(1,1)
Jacobi(1.0, 1.0)

julia> R = Q \ P
(ℵ₀×ℵ₀ LazyBandedMatrices.Bidiagonal{Float64, BroadcastVector{Float64, typeof(/), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(/), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}} with indices OneToInf()×OneToInf()) * (ℵ₀×ℵ₀ LazyBandedMatrices.Bidiagonal{Float64, BroadcastVector{Float64, typeof(/), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}, BroadcastVector{Float64, typeof(/), Tuple{InfiniteArrays.InfStepRange{Float64, Float64}, InfiniteArrays.InfStepRange{Float64, Float64}}}} with indices OneToInf()×OneToInf()) with indices OneToInf()×OneToInf():
 1.0  0.0  -0.2            ⋅          ⋅          ⋅          ⋅             ⋅        …
  ⋅   0.5  -5.55112e-17  -0.214286    ⋅          ⋅          ⋅             ⋅
  ⋅    ⋅    0.4           0.0       -0.222222    ⋅          ⋅             ⋅
  ⋅    ⋅     ⋅            0.357143   0.0       -0.227273    ⋅             ⋅
  ⋅    ⋅     ⋅             ⋅         0.333333   0.0       -0.230769       ⋅
  ⋅    ⋅     ⋅             ⋅          ⋅         0.318182  -5.55112e-17  -0.233333  …
  ⋅    ⋅     ⋅             ⋅          ⋅          ⋅         0.307692      0.0
  ⋅    ⋅     ⋅             ⋅          ⋅          ⋅          ⋅            0.3
  ⋅    ⋅     ⋅             ⋅          ⋅          ⋅          ⋅             ⋅
 ⋮                                              ⋮                                  ⋱

I found the reason it fails, by the way: it uses some iterative Christoffel-Darboux machinery to compute \ for weighted bases, and that is really unstable for some reason. For the straightforward case where the weights match the bases, I will replace it with an OP decomposition approach.

TSGut commented 1 year ago

The original issue was resolved, but I think we should keep this open for now because the chosen solution is slow and almost certainly substantially improvable.

ioannisPApapadopoulos commented 1 year ago

@TSGut Unfortunately I think the fix has introduced a problem with the lazy representation. Trying to do multiplication or extracting a column now hangs.

ioannisPApapadopoulos commented 1 year ago

e.g.

using SemiclassicalOrthogonalPolynomials, ClassicalOrthogonalPolynomials
t = 1.1
m = 0
Q₀₀ = SemiclassicalJacobi(t, 0, 0, m)
Q₁₁ = SemiclassicalJacobi(t, 1, 1, m)

L = (Weighted(Q₀₀) \ Weighted(Q₁₁))
L' * L # hangs
L[:,1] # hangs
TSGut commented 1 year ago

Hm, that sounds like an issue with LazyArrays or ApplyArray more generally, rather than with this implementation, since L is just computed as a constant times an ApplyArray; there isn't anything special going on. So unfortunately I don't think this is something to be fixed in this package.

However as always with these lazy things, explicitly calling

ApplyArray(*,L',L)

instead of L' * L will work and

view(L,:,1)

will give you the entries L[:,1].
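Putting the two workarounds together, a hedged sketch (assuming the L from John's snippet above):

```julia
using LazyArrays  # for ApplyArray

G = ApplyArray(*, L', L)   # build the product lazily rather than calling L' * L
G[1:5, 1:5]                # finite sections are computed on demand

col = view(L, :, 1)        # lazy view instead of L[:, 1]
col[1:10]                  # materialize only the entries you need
```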

@dlfivefifty do you have a better idea for fixing this than just stepping through the debugger and replacing whatever call goes wrong with an ApplyArray(, ...) call?

ioannisPApapadopoulos commented 1 year ago

amazing, thanks Timon!

dlfivefifty commented 1 year ago

What are the MemoryLayouts of the arguments?

TSGut commented 1 year ago
julia> MemoryLayout(L)
LazyArrays.ApplyLayout{typeof(*)}()

julia> MemoryLayout(L')
LazyArrays.ApplyLayout{typeof(*)}()
dlfivefifty commented 1 year ago

Ok and the arguments?

TSGut commented 1 year ago

The arguments inside of the ApplyArray you mean?

julia> MemoryLayout.(L.args)
(LazyArrays.ApplyLayout{typeof(*)}(), DiagonalLayout{FillLayout}())

and then we enter a chain of multiplications, i.e.

julia> MemoryLayout.(L.args[1].args)
(LazyArrays.ApplyLayout{typeof(*)}(), LazyArrays.ApplyLayout{typeof(*)}())

Basically this thing is (D*Rn*...*R3*R2*R1)', where the R's are UpperTriangular and D is a diagonal matrix.

TSGut commented 1 year ago

Let me also add the final bit of the chain for John's example:


julia> MemoryLayout.(L.args[1].args[1].args)
(TriangularLayout{'L', 'N', InfiniteLinearAlgebra.AdaptiveLayout{BandedMatrices.BandedRows{DenseColumnMajor}}}(), DiagonalLayout{FillLayout}())

julia> MemoryLayout.(L.args[1].args[2].args)
(TriangularLayout{'L', 'N', InfiniteLinearAlgebra.AdaptiveLayout{BandedMatrices.BandedRows{DenseColumnMajor}}}(), DiagonalLayout{FillLayout}())
TSGut commented 1 year ago

So I guess, more accurately, the structure is D1*(D2*R2)*(D3*R3) in this instance. There is also a transpose around the whole thing, which is why it says the factors are lower triangular and why the multiplication order is reversed.
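As a finite-dimensional toy illustration (a sketch, not the actual infinite-dimensional objects), the adjoint of a lazy product reverses the factor order and turns upper-triangular factors into lower-triangular ones:

```julia
using LazyArrays, LinearAlgebra

D = Diagonal([1.0, 2.0, 3.0])
R = UpperTriangular([1.0 2.0 3.0; 0.0 4.0 5.0; 0.0 0.0 6.0])
A = ApplyArray(*, D, R)    # lazy product D * R

# Mathematically (D*R)' == R' * D': the adjoint distributes over the
# product in reversed order, and R' is lower triangular.
A' == R' * D'              # true
```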

dlfivefifty commented 1 year ago

The problem is it doesn't know to do Mul{TriangularLayout{'L', 'N', InfiniteLinearAlgebra.AdaptiveLayout{BandedMatrices.BandedRows{DenseColumnMajor}}}, DiagonalLayout{FillLayout}} lazily. Note

julia> InfiniteLinearAlgebra.AdaptiveLayout |> supertype
MemoryLayout

So it might just be that we need to make AdaptiveLayout <: AbstractLazyLayout

dlfivefifty commented 1 year ago

It's a nasty one

dlfivefifty commented 1 year ago

I've fixed it (wrapped layouts like TriangularLayout{uplo, unit, LazyLayout}, and SymmetricLayout{LazyLayout} weren't being treated as lazy), just need to run tests and tag.

TSGut commented 1 year ago

@ioannisPApapadopoulos I think we can close this thanks to the last update I made + recent major improvements in other packages by Sheehan and Tianyi?

ioannisPApapadopoulos commented 1 year ago

Apologies for the slow reply. I've noticed a big drop in performance when accessing the lowering matrices. For instance:

using SemiclassicalOrthogonalPolynomials, ClassicalOrthogonalPolynomials
t = 1.1
Q₀₀ = SemiclassicalJacobi.(t, 0, 0, 0:∞);
Q₁₁ = SemiclassicalJacobi.(t, 1, 1, 0:∞);
L  = (Weighted.(Q₀₀) .\ Weighted.(Q₁₁));
@time L[20]; # 0.183541 seconds
@time L[30]; # 0.396986 seconds
@time L[40]; # 0.861748 seconds
@time L[50]; # 1.565395 seconds
@time L[60]; # 6.235148 seconds (with compile), 2.743544 seconds (without compile)

which is making everything very slow because I need to grab the matrices very often. Jacobi matrices are much faster in comparison:

X = jacobimatrix.(Q₁₁)
R = (X .- X .* X)/t^2;
@time R[20]; # 0.000011 seconds
@time R[30]; # 0.000012 seconds
@time R[40]; # 0.000009 seconds

Does anyone see a way around this?

TSGut commented 1 year ago

This is basically what was discussed above. There are two components slowing this down. One is the repeated lazy multiplication, which I think is not worth worrying about for now. The other big slowdown is caused by this line:

https://github.com/JuliaApproximation/SemiclassicalOrthogonalPolynomials.jl/blob/60fe8288c6786ce35822fc2d0e83213317508dbf/src/SemiclassicalOrthogonalPolynomials.jl#L352C10-L352C10

Basically, as you said in the opening post of this issue, the operators are easy to obtain except for a constant. Obtaining that constant is what the line above does, and it requires an expensive expansion. If we had a way to get this constant without an expansion, the whole thing would be much faster. The constant arises because we choose the normalization where $p_0 = 1$. There is almost certainly a way around computing it via expansion...

EDITS:

Actually, never mind, your issue is not related to the constant, since it is also slow without compilation (I missed that part). Let me have another look... Double never mind: the slowdown is due to the two factors I mentioned above.

I don't know how useful this is for you, but since this is an object made up of cached pieces, have you tried calling the highest element you need first, just to fill the hierarchy? Then all future calls should be faster and won't require the whole extension machinery. (If it is not properly cached, that may be worth looking into.)

TSGut commented 1 year ago

Is this maybe an issue on your end? Perhaps you are not up to date on all packages? I get this:

julia> @time L[20]; # 0.183541 seconds
  0.001399 seconds (1.62 k allocations: 76.219 KiB)

julia> @time L[30]; # 0.396986 seconds
  0.013516 seconds (152.80 k allocations: 5.564 MiB)

julia> @time L[40]; # 0.861748 seconds
  0.022915 seconds (255.52 k allocations: 8.656 MiB)

julia> @time L[50]; # 1.565395 seconds
  0.033797 seconds (382.33 k allocations: 12.405 MiB)

which seems fine?

Though to be fair I get an error higher up

 L[60]
ERROR: DomainError with -7.079144436318238:
sqrt will only return a complex result if called with a complex argument. Try sqrt(Complex(x)).
Stacktrace:
  [1] throw_complex_domainerror(f::Symbol, x::Float64)
    @ Base.Math ./math.jl:33
  [2] sqrt
    @ ./math.jl:677 [inlined]
  [3] OrthogonalPolynomialRatio{Float64, SemiclassicalJacobi{Float64}}(P::SemiclassicalJacobi{Float64}, x::Float64)
    @ ClassicalOrthogonalPolynomials ~/.julia/packages/ClassicalOrthogonalPolynomials/KuMrz/src/ratios.jl:14
  [4] OrthogonalPolynomialRatio
    @ ~/.julia/packages/ClassicalOrthogonalPolynomials/KuMrz/src/ratios.jl:19 [inlined]
  [5] RaisedOP
    @ ~/Documents/Projects/SemiclassicalOrthogonalPolynomials.jl/src/SemiclassicalOrthogonalPolynomials.jl:91 [inlined]
  [6] op_lowering(Q::SemiclassicalJacobi{Float64}, y::Int64)
    @ SemiclassicalOrthogonalPolynomials ~/Documents/Projects/SemiclassicalOrthogonalPolynomials.jl/src/SemiclassicalOrthogonalPolynomials.jl:240
  [7] \(w_A::QuasiArrays.BroadcastQuasiMatrix{Float64, typeof(*), Tuple{SemiclassicalJacobiWeight{Float64}, SemiclassicalJacobi{Float64}}}, w_B::QuasiArrays.BroadcastQuasiMatrix{Float64, typeof(*), Tuple{SemiclassicalJacobiWeight{Float64}, SemiclassicalJacobi{Float64}}})
    @ SemiclassicalOrthogonalPolynomials ~/Documents/Projects/SemiclassicalOrthogonalPolynomials.jl/src/SemiclassicalOrthogonalPolynomials.jl:354
  [8] \(w_A::QuasiArrays.BroadcastQuasiMatrix{Float64, typeof(*), Tuple{SemiclassicalJacobiWeight{Float64}, SemiclassicalJacobi{Float64}}}, w_B::QuasiArrays.BroadcastQuasiMatrix{Float64, typeof(*), Tuple{SemiclassicalJacobiWeight{Float64}, SemiclassicalJacobi{Float64}}})
    @ SemiclassicalOrthogonalPolynomials ~/Documents/Projects/SemiclassicalOrthogonalPolynomials.jl/src/SemiclassicalOrthogonalPolynomials.jl:361
  [9] \
    @ ~/Documents/Projects/SemiclassicalOrthogonalPolynomials.jl/src/SemiclassicalOrthogonalPolynomials.jl:379 [inlined]
 [10] _broadcast_getindex_evalf
    @ ./broadcast.jl:683 [inlined]
 [11] _broadcast_getindex
    @ ./broadcast.jl:656 [inlined]
 [12] getindex
    @ ./broadcast.jl:610 [inlined]
 [13] getindex(A::LazyArrays.BroadcastVector{Any, typeof(\), Tuple{LazyArrays.BroadcastVector{Weighted{Float64, SemiclassicalJacobi{Float64}}, Type{Weighted}, Tuple{SemiclassicalOrthogonalPolynomials.SemiclassicalJacobiFamily{Float64, Int64, Int64, InfiniteArrays.InfUnitRange{Int64}}}}, LazyArrays.BroadcastVector{Weighted{Float64, SemiclassicalJacobi{Float64}}, Type{Weighted}, Tuple{SemiclassicalOrthogonalPolynomials.SemiclassicalJacobiFamily{Float64, Int64, Int64, InfiniteArrays.InfUnitRange{Int64}}}}}}, kj::Int64)
    @ LazyArrays ~/.julia/packages/LazyArrays/oNvoa/src/lazybroadcasting.jl:88

EDIT: Something weird has happened here since I last worked on this issue. Some things are falling back to Lanczos when they didn't in my last tests. I will try to track this down.

TSGut commented 1 year ago

Okay, I have managed to reproduce the behavior you are seeing. I will check whether we can do something about it.

Aside from the two paths mentioned above, I think there is nothing obvious we can do to speed up the initial construction of the operators without new theoretical insights. That said, there is definitely wasted time here, and a lot of it if you call these things repeatedly. First and foremost, the operator is regenerated every time you access the element instead of just handing you what was already computed.

TSGut commented 1 year ago

Consider this slightly silly workaround if you know the max degree you will need:

julia> LL = L[1:50]
julia> @time LL[50]
  0.000004 seconds
(((ℵ₀×ℵ₀ LinearAlgebra.LowerTriangular{Float64, LinearAlgebra.Adjoint{Float64, InfiniteLinearAlgebra.AdaptiveCholeskyFactors{Float64, BandedMatrices.BandedMatrix{Float64, Matrix{Float64}, Base.OneTo{Int64}}, LazyBandedMatrices.SymTridiagonal{Float64, BroadcastVector{Float64, typeof(-), Tuple{Bool, SubArray{Float64, 1, ClassicalOrthogonalPolynomials.QRJacobiData{:Q, Float64}, Tuple{Base.Slice{InfiniteArrays.OneToInf{Int64}}, Int64}, false}}}, BroadcastVector{Float64, typeof(-), Tuple{SubArray{Float64, 1, ClassicalOrthogonalPolynomials.QRJacobiData{:Q, Float64}, Tuple{Base.Slice{InfiniteArrays.OneToInf{Int64}}, Int64}, false}}}}}}} with indices OneToInf()×OneToInf()) * (ℵ₀×ℵ₀ LinearAlgebra.Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}} with indices OneToInf()×OneToInf()) with indices OneToInf()×OneToInf()) * ((ℵ₀×ℵ₀ LinearAlgebra.LowerTriangular{Float64, LinearAlgebra.Adjoint{Float64, InfiniteLinearAlgebra.AdaptiveCholeskyFactors{Float64, BandedMatrices.BandedMatrix{Float64, Matrix{Float64}, Base.OneTo{Int64}}, LazyBandedMatrices.SymTridiagonal{Float64, SubArray{Float64, 1, ClassicalOrthogonalPolynomials.QRJacobiData{:Q, Float64}, Tuple{Base.Slice{InfiniteArrays.OneToInf{Int64}}, Int64}, false}, SubArray{Float64, 1, ClassicalOrthogonalPolynomials.QRJacobiData{:Q, Float64}, Tuple{Base.Slice{InfiniteArrays.OneToInf{Int64}}, Int64}, false}}}}} with indices OneToInf()×OneToInf()) * (ℵ₀×ℵ₀ LinearAlgebra.Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}} with indices OneToInf()×OneToInf()) with indices OneToInf()×OneToInf()) with indices OneToInf()×OneToInf()) * (ℵ₀×ℵ₀ LinearAlgebra.Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{InfiniteArrays.OneToInf{Int64}}}} with indices OneToInf()×OneToInf()) with indices OneToInf()×OneToInf():
  0.0206561      ⋅            ⋅           ⋅            ⋅            ⋅            ⋅          …
  0.0193939     0.0278027     ⋅           ⋅            ⋅            ⋅            ⋅
 -0.000844937   0.0255681    0.0324819    ⋅            ⋅            ⋅            ⋅
   ⋅           -0.00168401   0.0292607   0.035853      ⋅            ⋅            ⋅
   ⋅             ⋅          -0.0025901   0.0316399    0.0383911     ⋅            ⋅
   ⋅             ⋅            ⋅         -0.00352926   0.0331922    0.04035       ⋅          …
   ⋅             ⋅            ⋅           ⋅          -0.00447958   0.03418      0.0418845
   ⋅             ⋅            ⋅           ⋅            ⋅          -0.00542704   0.0347637
   ⋅             ⋅            ⋅           ⋅            ⋅            ⋅          -0.00636253
  ⋮                                                                ⋮                        ⋱

That first line will take a lot of time to run, since it just computes all of them non-lazily, but after that, calling them over and over is basically instant. So if you expect to call the operators repeatedly, you should store them.

One could argue that the L object you create should work that way out of the box, but depending on what one is doing that may slow down other applications... it's not obvious to me that it should be the default.

Let me know if this helps, we can work something fancier out if needed

ioannisPApapadopoulos commented 1 year ago

So if you expect to call the operators over and over, you should store them.

Yes, sorry, I was not clear. I only call the operators once in the code and store them, but that initial call takes a really long time. I essentially have three different L's, and even for degree n=60 I end up waiting ~10 minutes or more to compute every matrix in the list up to c=60. So I don't think the workaround LL = L[1:50] helps for my purposes.

https://github.com/JuliaApproximation/SemiclassicalOrthogonalPolynomials.jl/blob/60fe8288c6786ce35822fc2d0e83213317508dbf/src/SemiclassicalOrthogonalPolynomials.jl#L352C10-L352C10

Ah I see, I recently just made a change (back) in SemiclassicalJacobi where I call _₂F₁general2 in sum(P::SemiclassicalJacobiWeight{T}) with big parameters. If it's not big then you get the ERROR: DomainError with -7.079144436318238: errors. But that explains why all of a sudden my code became slower.

This seems like something RecurrenceArray could overcome...

Cheers for replying so fast @TSGut !

TSGut commented 1 year ago

Ok, if the initial call itself is too slow, then I can look and see how much speed I can make up elsewhere. Though in my case it doesn't even take 1 minute to do all of LL = L[1:70]; surely my laptop isn't that much more powerful than yours, it's just a basic ThinkPad? I suppose it is the other two L's you are computing that ramp the cost up. Like I said, I will see if I can find some way to speed up the code.

Can you post the ]st package versions and Julia version so I can play around in exactly your setup?

TSGut commented 1 year ago

I can confirm, though, that the line above with the constant computation, i.e.

k = (A \ SemiclassicalJacobiWeight(A.t,Δa,Δb,Δc))[1]

is where all the time is lost. If we can compute this constant quickly, then everything should be fine for you. Maybe there is an explicit form for this constant that I am not seeing right now? It is the constant term in the expansion of the weight modification.
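For what it's worth, since the normalization is $p_0 \equiv 1$, that coefficient should (if I'm not mistaken, and modulo the package's normalization of SemiclassicalJacobiWeight) just be a ratio of moments of the two weights,

$$k = \frac{\int_0^1 x^{a+\Delta a}(1-x)^{b+\Delta b}(t-x)^{c+\Delta c}\,dx}{\int_0^1 x^{a}(1-x)^{b}(t-x)^{c}\,dx},$$

and each moment has a closed form via Euler's integral representation of the Gauss hypergeometric function:

$$\int_0^1 x^{a}(1-x)^{b}(t-x)^{c}\,dx = t^{c}\,B(a+1,\,b+1)\,{}_2F_1\!\left(-c,\;a+1;\;a+b+2;\;\tfrac{1}{t}\right).$$

If that checks out, it would avoid the expansion entirely.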

ioannisPApapadopoulos commented 1 year ago

Here are my timings:

julia> @time LL = L[1:70];
145.780797 seconds (472.39 M allocations: 18.285 GiB, 2.52% gc time, 45.37% compilation time)

julia> @time LL = L[1:70];
 78.099827 seconds (421.22 M allocations: 15.010 GiB, 3.21% gc time)

I've put below my package versions. [Edit: Thanks for checking that's where it is slow.] If there is an explicit expression then great! Otherwise I think I have a different workaround for my specific problem. It technically has "higher" complexity but preliminary tests say it should be hundreds of times faster.

So please do not prioritise this for my sake. If this ends up being fixed in the future I can easily switch back.

  [de1797fd] AnnuliOrthogonalPolynomials v0.0.1 `https://github.com/JuliaApproximation/AnnuliOrthogonalPolynomials.jl.git#main`
  [aae01518] BandedMatrices v0.17.35
  [8e7c35d0] BlockArrays v0.16.36
  [ffab5731] BlockBandedMatrices v0.12.4
  [b30e2e7b] ClassicalOrthogonalPolynomials v0.11.1
  [7ae1f121] ContinuumArrays v0.14.1
  [5b8099bc] DomainSets v0.6.7
  [057dd010] FastTransforms v0.15.7
  [1a297f60] FillArrays v1.5.0
  [34004b35] HypergeometricFunctions v0.3.23
  [5078a376] LazyArrays v1.5.2
  [6fafb56a] Memoization v0.2.1
  [4f6956fd] MultivariateOrthogonalPolynomials v0.5.1
  [c4ea9172] QuasiArrays v0.11.0
  [291c01f3] SemiclassicalOrthogonalPolynomials v0.5.0 `C:\Users\john.papad\.julia\dev\SemiclassicalOrthogonalPolynomials`
  [276daf66] SpecialFunctions v2.3.0
  [90137ffa] StaticArrays v1.6.2
  [37e2e46d] LinearAlgebra

The dev'd SemiclassicalOrthogonalPolynomials is the same as the current main branch.

TSGut commented 1 year ago

Just created a PR that hopefully fixes this for you; let me know how it goes.