Open MikaelSlevinsky opened 1 day ago
I assume you meant to write
julia> Jxt = jacobimatrix(Val(1), T²);
julia> Jyt = jacobimatrix(Val(2), T²);
?
Yes, you're right. Timings and script updated
The timings seem insensitive to the types of the vectors (blockedvectors or vanilla vectors).
With special routines (jacobi_mul! and jacobi_t_mul!, added above) that break the multiplication down into blocks, the mul! with an adjoint view can be made as fast as the current mul!, but that is still far from the small multiple (3x or 9x) of the mul! with a Diagonal that the structure suggests.
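For reference, the blockwise idea can be sketched with plain nested arrays. The helper below is hypothetical and not the actual jacobi_mul! from the script above; it only illustrates that, for a block-tridiagonal operator, block k of the output touches at most blocks k-1, k, and k+1 of the input.

```julia
using LinearAlgebra

# Hypothetical sketch of a blockwise multiply for a block-tridiagonal J
# (not the actual jacobi_mul! from the script above): each output block
# only involves three input blocks, using 5-arg mul! to accumulate.
function blocktri_mul!(y, dl, d, du, c)
    K = length(d)
    for k in 1:K
        mul!(y[k], d[k], c[k])                          # diagonal block
        k > 1 && mul!(y[k], dl[k-1], c[k-1], 1.0, 1.0)  # subdiagonal block
        k < K && mul!(y[k], du[k], c[k+1], 1.0, 1.0)    # superdiagonal block
    end
    return y
end

# Triangular block sizes 1, 2, ..., K, as in a DiagTrav-style layout
K  = 4
d  = [randn(k, k) for k in 1:K]
dl = [randn(k + 1, k) for k in 1:K-1]
du = [randn(k, k + 1) for k in 1:K-1]
c  = [randn(k) for k in 1:K]
y  = [zeros(k) for k in 1:K]
blocktri_mul!(y, dl, d, du, c)

@assert y[1] ≈ d[1] * c[1] + du[1] * c[2]
@assert y[K] ≈ dl[K-1] * c[K-1] + d[K] * c[K]
```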
Aren't they implemented as KronTrav? This issue should really be in LazyBandedMatrices.jl; the slow code should be reproducible there.
Note ::KronTrav * ::DiagTrav can be reduced to a matrix-matrix multiplication, which will probably be the fastest. Is this what you actually want? That is, do you want to store your coefficients as matrices and not block-vectors?
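To spell out that reduction with dense stand-ins (a minimal sketch, not the actual KronTrav code): applying B ⊗ A to vec(C) is the same as computing A * C * transpose(B), so coefficients stored as a matrix never require materializing the Kronecker product.

```julia
using LinearAlgebra

# Minimal sketch with dense stand-ins (not the actual KronTrav code):
# (B ⊗ A) * vec(C) == vec(A * C * transpose(B)), so the Kronecker operator
# applied to matrix-stored coefficients is just two small multiplications.
A = [2.0 1.0 0.0; 1.0 2.0 1.0; 0.0 1.0 2.0]  # stand-in for a banded factor
B = [0.0 0.5 0.0; 0.5 0.0 0.5; 0.0 0.5 0.0]  # stand-in for the other factor
C = reshape(collect(1.0:9.0), 3, 3)          # coefficients stored as a matrix

v_kron = kron(B, A) * vec(C)        # naive: materialize the Kronecker product
v_mat  = vec(A * C * transpose(B))  # matrix-matrix form

@assert v_kron ≈ v_mat
```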
Here's an example, but I just realised it's not smart enough to reduce to banded * matrix * banded... so I think the first step should be to speed up the following.
Note that for the finite-dimensional case we'll have to zero out the bottom right of the matrix, since DiagTrav ignores this...
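That clean-up step might look like the sketch below; the exact truncation convention (entries with k + j > N + 1 falling outside the stored anti-diagonal blocks) is an assumption here, not taken from the library.

```julia
# Sketch of the finite-dimensional clean-up (the k + j > N + 1 convention
# is an assumption): DiagTrav reads an N×N coefficient matrix along
# anti-diagonal blocks, so fill-in below the last anti-diagonal is
# ignored by it and should be zeroed after the multiply.
N = 5
C = rand(N, N)
for j in 1:N, k in 1:N
    if k + j > N + 1
        C[k, j] = 0.0  # bottom-right triangle that DiagTrav ignores
    end
end

@assert all(iszero(C[k, j]) for j in 1:N, k in 1:N if k + j > N + 1)
```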
julia> f = (x,y) -> exp(x*cos(y));
julia> @time c = transform(T², splat(f)) # actually stores a matrix
0.001255 seconds (1.29 k allocations: 154.328 KiB)
ℵ₀-blocked ℵ₀-element LazyBandedMatrices.DiagTrav{Float64, 2, LazyArrays.ApplyArray{Float64, 2, typeof(Base.setindex), Tuple{FillArrays.Zeros{Float64, 2, Tuple{InfiniteArrays.OneToInf{Int64}, InfiniteArrays.OneToInf{Int64}}}, Matrix{Float64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}}:
1.1599721136150525
───────────────────────
0.8308530775663573
0.0
───────────────────────
0.16232639352729974
0.0
-0.0953431447201382
───────────────────────
0.022229436911517752
0.0
-0.28414170532898164
0.0
───────────────────────
0.0023697762434078066
0.0
-0.0977548626325749
0.0
0.010198907422008477
───────────────────────
0.00020799840787857313
0.0
-0.01851811361776135
0.0
0.01439021748117314
0.0
───────────────────────
1.5554820721467286e-5
0.0
-0.00243194182561198
0.0
0.010777474168944185
0.0
-0.0005266935134853384
───────────────────────
1.0142343964071342e-6
0.0
-0.00024590442620664693
0.0
0.0032677180350806525
⋮
julia> @time Jxt * c # actually does banded * matrix * banded
0.005148 seconds (31.94 k allocations: 12.766 MiB)
setindex(ℵ₀-element FillArrays.Zeros{Float64, 1, Tuple{BlockArrays.BlockedOneTo{Int64, ArrayLayouts.RangeCumsum{Int64, InfiniteArrays.OneToInf{Int64}}}}} with indices BlockArrays.BlockedOneTo(ArrayLayouts.RangeCumsum(OneToInf())), 60-blocked 1830-element BlockVector{Float64}, 60-blocked 1830-element BlockArrays.BlockedOneTo{Int64, LazyArrays.ApplyArray{Int64, 1, typeof(vcat), Tuple{Vector{Int64}, StepRangeLen{Int64, Int64, Int64, Int64}}}}) with indices BlockArrays.BlockedOneTo(ArrayLayouts.RangeCumsum(OneToInf())):
0.41542653878317864
───────────────────────
1.2411353103787024
0.0
───────────────────────
0.42654125723893754
0.0
-0.14207085266449082
───────────────────────
0.08234808488535378
0.0
-0.14422057603642566
0.0
───────────────────────
0.011218717659698162
0.0
-0.1513299094733715
0.0
0.00719510874058657
───────────────────────
0.001192665532064637
0.0
-0.050093402229093434
0.0
0.01558764450648057
0.0
───────────────────────
0.00010450632113749013
0.0
-0.009382009021984
0.0
0.008828967758126896
0.0
-0.0004428188754744439
───────────────────────
7.806726705590395e-6
0.0
-0.0012261261148056486
0.0
0.00568161993130593
⋮
I probably won't be able to look at this for a while. But I want you to look into your heart and ask yourself, do you actually want the code to be fast? If so, why is it that you live your life in such a rush?
Slow code is a valuable opportunity to take time for self-reflection, and I would hate to rob you of this valuable opportunity.
Guys, AI is getting out of hand 😅
AI??
The following code compares the mul! speed of banded-block-banded matrices to the diagonal. For ChebyshevT, the Diagonal mul! should mathematically be only 3x faster for X and 9x faster for Y, given the tridiagonal-block-tridiagonal structure, but there appears to be a large overhead to it all. Moreover, mul! with views may spend a lot of time in the garbage collector and can be slower even though they require less data. I also don't see why mul! with X is not 3x faster than that with Y. Finally, mul! with an adjoint view is basically unusable (O(n^4) complexity?).
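The 3x expectation can be sanity-checked in isolation with Base types (a sketch using Tridiagonal/Diagonal stand-ins, not the banded-block-banded matrices themselves):

```julia
using LinearAlgebra

# Sanity check of the flop counts with Base stand-ins (not the actual
# banded-block-banded matrices): a tridiagonal X has ~3n nonzeros versus
# n for a Diagonal, so its mul! does roughly 3x the multiply-adds.
n = 1_000
X = Tridiagonal(fill(0.5, n - 1), zeros(n), fill(0.5, n - 1))  # Chebyshev-like
D = Diagonal(randn(n))
c = randn(n)
yX = similar(c)
yD = similar(c)

mul!(yX, X, c)  # ~3n multiply-adds
mul!(yD, D, c)  # ~n multiplies

@assert yX ≈ X * c
@assert yD ≈ D.diag .* c
```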