maximilian-gelbrecht opened 1 month ago
This is awesome, thanks Maximilian. I'll let you continue work on this, but feel free to let me know if I can contribute.
Just tested to benchmark the broadcast, and I am a bit frustrated and surprised that it still isn't fast.
using SpeedyWeather, BenchmarkTools
NF = Float32
mmax = 32
idims = (5,)
lmax = mmax
L = randn(LowerTriangularArray{NF}, 50, 50, idims...)
L2 = randn(LowerTriangularArray{NF}, 50, 50, idims...)
@benchmark L.data .+ L2.data
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.147 μs … 799.501 μs ┊ GC (min … max): 0.00% … 98.70%
Time (median): 4.103 μs ┊ GC (median): 0.00%
Time (mean ± σ): 6.806 μs ± 37.810 μs ┊ GC (mean ± σ): 38.56% ± 6.94%
▁ ▁▅█▇▁
▅█▆▅▅▃▅▇▇█████▆▄▄▄▄▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂ ▃
2.15 μs Histogram: frequency by time 12.5 μs <
Memory estimate: 25.11 KiB, allocs estimate: 6.
@benchmark L .+ L2
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 13.328 μs … 4.845 ms ┊ GC (min … max): 0.00% … 98.54%
Time (median): 14.787 μs ┊ GC (median): 0.00%
Time (mean ± σ): 18.113 μs ± 97.273 μs ┊ GC (mean ± σ): 11.87% ± 2.21%
▇▆█▇▃▃▃▂▂▃▁▁▄ ▂
██████████████▇▇▅▆▆▅▅▅▄▄▄▄▂▄▅▃▅▅▇▇▆▅▆▇▇▆▅▄▄▃▄▄▄▄▂▂▂▄▂▅▅▃▄▄▄ █
13.3 μs Histogram: log(frequency) by time 44.1 μs <
Memory estimate: 25.14 KiB, allocs estimate: 5.
I don't really understand why. The `find_L` function doesn't seem to be the culprit. The `undef` initialiser is also not that slow:
@benchmark LowerTriangularArray{T, N, ArrayType{T,N}}(undef, SpeedyWeather.LowerTriangularMatrices.matrix_size(L))
BenchmarkTools.Trial: 8898 samples with 188 evaluations.
Range (min … max): 529.633 ns … 43.168 μs ┊ GC (min … max): 0.00% … 97.67%
Time (median): 773.019 ns ┊ GC (median): 0.00%
Time (mean ± σ): 2.972 μs ± 6.479 μs ┊ GC (mean ± σ): 73.31% ± 29.36%
█▃ ▁
██▇▆▆▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▃▁▄▄▅▅▆▆▇▆▇▇▇▇██████████████▇▇ █
530 ns Histogram: log(frequency) by time 25.5 μs <
Memory estimate: 25.05 KiB, allocs estimate: 4.
julia> @benchmark Array{T,N}(undef, size(L))
BenchmarkTools.Trial: 8490 samples with 200 evaluations.
Range (min … max): 375.935 ns … 41.204 μs ┊ GC (min … max): 0.00% … 97.10%
Time (median): 633.835 ns ┊ GC (median): 0.00%
Time (mean ± σ): 2.926 μs ± 6.537 μs ┊ GC (mean ± σ): 77.56% ± 30.35%
█▄ ▁ ▁
██▇▆▆▅▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▄▄▃▆▆▄▅▆▆▆▆▇▇▇███████████▇█▇█▇▇ █
376 ns Histogram: log(frequency) by time 25.2 μs <
Memory estimate: 25.02 KiB, allocs estimate: 3.
I'll look at it again later this week and see if I can find any possible solutions.
Maybe some inlining is missing? I can have a check too.
Had a look again, and I solved it. The problem was slow indexing with `CartesianIndex`. Before, we only had
Base.@propagate_inbounds Base.getindex(L::LowerTriangularArray{T,N}, I::CartesianIndex{M}) where {T,N,M} = getindex(L, Tuple(I)...)
Now, we also have:
Base.@propagate_inbounds Base.getindex(L::LowerTriangularArray{T,N}, I::CartesianIndex{N}) where {T,N} = getindex(L.data, I)
This apparently made all the difference: internally, broadcasting uses Cartesian indexing, and the `Tuple(I)...` splatting is slow.
@benchmark L .+ L2
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.318 μs … 1.983 ms ┊ GC (min … max): 0.00% … 98.92%
Time (median): 4.224 μs ┊ GC (median): 0.00%
Time (mean ± σ): 8.019 μs ± 47.512 μs ┊ GC (mean ± σ): 36.22% ± 6.99%
▅▆▅▅▄▅▇█▇▅▃▂▂▂▂▁▁▁ ▁▂▂▂▂▂▁ ▂
██████████████████████▇█▇▆▇▆▇▆▆▆▅▆▇████████████▇▅▆▇▆▆▅▅▅▅▅ █
2.32 μs Histogram: log(frequency) by time 17.5 μs <
Memory estimate: 25.14 KiB, allocs estimate: 5.
julia> @benchmark L.data .+ L2.data
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.236 μs … 1.869 ms ┊ GC (min … max): 0.00% … 99.13%
Time (median): 4.154 μs ┊ GC (median): 0.00%
Time (mean ± σ): 7.459 μs ± 47.129 μs ┊ GC (mean ± σ): 38.76% ± 6.92%
▅▆▆▆▅▆▇██▆▄▃▃▂▂▂▂▁▁▁▁ ▂
████████████████████████▇▇█▇▆▇▇▆▆▅▆▆▇▇█▇▇█▇▇▇▆▇▆▆▆▆▄▄▄▅▃▃▅ █
2.24 μs Histogram: log(frequency) by time 17.4 μs <
Memory estimate: 25.11 KiB, allocs estimate: 6.
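To illustrate the dispatch pattern behind the fix, here is a minimal sketch on a toy wrapper type (`FlatWrapper` is a hypothetical name, not SpeedyWeather's actual implementation): the method taking `CartesianIndex{N}` is more specific than the generic `CartesianIndex` fallback, so broadcasting's Cartesian iteration hits the method that forwards directly to the underlying array instead of splatting.

```julia
# Toy wrapper illustrating the two getindex methods discussed above.
struct FlatWrapper{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
Base.size(w::FlatWrapper) = size(w.data)

# generic fallback: splat the CartesianIndex into integers (the slow path)
Base.getindex(w::FlatWrapper, I::CartesianIndex) = getindex(w.data, Tuple(I)...)

# more specific method: forward the CartesianIndex directly (the fast path)
Base.getindex(w::FlatWrapper{T,N}, I::CartesianIndex{N}) where {T,N} =
    getindex(w.data, I)

w = FlatWrapper(reshape(collect(1.0:9.0), 3, 3))
w[CartesianIndex(2, 2)]  # 5.0, dispatches to the CartesianIndex{N} method
```

In the real code the fallback recurses into the integer `getindex` and its triangular index conversion, which is what made the splatting path expensive; the sketch only demonstrates the dispatch specificity.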
What's left to do basically is to adjust the usage of AssociatedLegendrePolynomials
I'm still getting this?
julia> L = rand(LowerTriangularMatrix,5,5)
15-element LowerTriangularMatrix{Float64}:
Error showing value of type LowerTriangularMatrix{Float64}:
ERROR: BoundsError: attempt to access 15-element LowerTriangularMatrix{Float64} at index [6, 1]
The last commit shows a `LowerTriangularMatrix` as
julia> L = rand(LowerTriangularMatrix,5,5)
15-element, 5x5 LowerTriangularMatrix{Float64}
0.787422 0.0 0.0 0.0 0.0
0.260061 0.819204 0.0 0.0 0.0
0.973352 0.620726 0.161257 0.0 0.0
0.0461552 0.685936 0.0117785 0.382289 0.0
0.13645 0.501547 0.419401 0.309778 0.688196
julia> Float32.(L)
15-element, 5x5 LowerTriangularMatrix{Float32}
0.787422 0.0 0.0 0.0 0.0
0.260061 0.819204 0.0 0.0 0.0
0.973352 0.620726 0.161257 0.0 0.0
0.0461552 0.685936 0.0117785 0.382289 0.0
0.13645 0.501547 0.419401 0.309778 0.688196
julia> L = rand(LowerTriangularMatrix,15,15)
120-element, 15x15 LowerTriangularMatrix{Float64}
0.261788 0.0 0.0 0.0 … 0.0 0.0 0.0 0.0
0.495482 0.117453 0.0 0.0 0.0 0.0 0.0 0.0
0.0925019 0.796409 0.937247 0.0 0.0 0.0 0.0 0.0
0.816407 0.409503 0.779616 0.182904 0.0 0.0 0.0 0.0
0.915145 0.823384 0.270083 0.565954 0.0 0.0 0.0 0.0
0.737812 0.795098 0.194549 0.698349 … 0.0 0.0 0.0 0.0
0.275376 0.939357 0.535364 0.515381 0.0 0.0 0.0 0.0
0.910015 0.22325 0.972193 0.0490044 0.0 0.0 0.0 0.0
0.823201 0.128216 0.0735159 0.122388 0.0 0.0 0.0 0.0
0.169003 0.862217 0.621427 0.695971 0.0 0.0 0.0 0.0
0.659415 0.0279775 0.245747 0.247737 … 0.0 0.0 0.0 0.0
0.596095 0.988664 0.441109 0.990378 0.177168 0.0 0.0 0.0
0.28178 0.189885 0.609562 0.415566 0.155296 0.989642 0.0 0.0
0.710654 0.138662 0.849563 0.444248 0.467292 0.0712397 0.722201 0.0
0.104208 0.775013 0.634025 0.199434 0.955224 0.219269 0.102539 0.0512979
But then for higher dimensions the matrices are just flattened out into columns:
julia> L = rand(LowerTriangularArray,5,5,5)
15×5 LowerTriangularArray{Float64, 2, Matrix{Float64}}:
0.90549 0.179909 0.722785 0.656362 0.215722
0.0663613 0.209663 0.393662 0.29946 0.142641
0.31763 0.121395 0.941059 0.14699 0.738631
0.132192 0.160128 0.157869 0.751886 0.0505202
0.61786 0.7052 0.841462 0.915375 0.0175474
0.802477 0.113889 0.344423 0.261357 0.484409
0.0662565 0.933641 0.271906 0.0445899 0.085901
0.0866564 0.0510311 0.115502 0.477471 0.195907
0.995015 0.367898 0.14462 0.5566 0.626406
0.613344 0.327654 0.292307 0.598366 0.416606
0.863694 0.273074 0.0687791 0.901928 0.422603
0.77978 0.471441 0.133331 0.0685466 0.529635
0.617265 0.944824 0.989218 0.985348 0.371635
0.40759 0.570559 0.0131059 0.229718 0.532602
0.998965 0.150414 0.0708596 0.853031 0.308048
In #525 I started implementing the N-dimensional spectral transform just looping over the additional dimensions, because I thought it would be good to check whether we have any constraints from that regarding implementations here. Good news: the 3D case works perfectly fine with only little tweaks (`grids::AbstractGridArray` and `specs::LowerTriangularArray`):
for k in eachgrid(grids) # produces a CartesianIndex looping over dimension 3 to N
...
specs[lm, k]
...
view(grids.data, 1:20, k)
...
So it's just a matter of adding the index `k`. However, for the 2D case, where `k = CartesianIndex()`, the `[lm, k]` indexing doesn't work for `LowerTriangularMatrix`:
julia> L = rand(LowerTriangularMatrix, 5, 5)
15-element, 5x5 LowerTriangularMatrix{Float64}
0.301503 0.0 0.0 0.0 0.0
0.985783 0.0208044 0.0 0.0 0.0
0.628694 0.0216734 0.462313 0.0 0.0
0.221242 0.817672 0.778306 0.301415 0.0
0.241364 0.18049 0.188116 0.456697 0.596747
julia> L[1, CartesianIndex()]
ERROR: MethodError: no method matching isless(::Int64, ::CartesianIndex{0})
I'm not quite sure which `getindex` method is called for `::Int, ::CartesianIndex`, but that's what's needed -- I can have a look next week too. It does work for `AbstractGridArray`:
julia> G = rand(OctaHEALPixGrid,2)
16-element, 3-ring OctaHEALPixGrid{Float64}:
0.6672885408610827
0.3029214314408146
0.25557101111147806
0.023467728044619385
0.13131270800462713
0.6096673403313817
0.19343526034598502
0.6247839303906794
0.8554319237365166
0.12169715210643173
0.9504931175461784
0.9193855210964392
0.49717053897764873
0.8925290658067014
0.232981331232753
0.03177260647949132
julia> G[1, CartesianIndex()]
0.6672885408610827
I'll continue experimenting with the dynamical core just because I feel that gives us some more insights to decisions we're making here.
The `CartesianIndex{0}` is a bit tricky to handle in an elegant way for us because we have different indexing styles that are dispatched by checking the number of input arguments, e.g. `getindex(L::LTA{T,N}, I::Vararg{Any,N})`. An empty index kind of messes with this logic.
I'll add this in a slightly hacky way by explicitly defining a separate `getindex(L::LTA{T,1}, i::Integer, C::CartesianIndex{0})`. That should be enough.
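A minimal sketch of that workaround on a toy type (`FlatVec` is an illustrative stand-in for the vector-backed case, not the real SpeedyWeather definition):

```julia
# Toy flat-storage type to show the trailing-CartesianIndex{0} method.
struct FlatVec{T}
    data::Vector{T}
end
Base.getindex(v::FlatVec, i::Integer) = v.data[i]

# the trailing empty CartesianIndex from the 2D case (k = CartesianIndex())
# collapses to plain integer indexing instead of raising a MethodError
Base.getindex(v::FlatVec, i::Integer, ::CartesianIndex{0}) = v.data[i]

v = FlatVec([10, 20, 30])
v[2, CartesianIndex()]  # 20, same as v[2]
```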
Yes, agreed. A leading integer argument then a trailing `CartesianIndex` is all we need! Thanks Max!!
There's currently an error in the transforms here that is a bit hard for me to trace. The "Transform: Orography (exact grids)" test fails, but only for the ClenshawGrids and only in the last row of the spectral coefficients.
Made a brief check again. I don't really understand why this error is happening.
@milankl do you have any idea? I didn't really change anything in SpectralTransforms that should be able to cause this, and that it is only appearing for ClenshawGrids and not GaussianGrid is a bit weird.
using SpeedyWeather
trunc = 31
NF = Float32
Grid = FullClenshawGrid
dealiasing = Grid in (FullGaussianGrid, OctahedralGaussianGrid) ? 2 : 3
SG = SpectralGrid(NF; trunc, Grid, dealiasing)
S = SpectralTransform(SG, recompute_legendre=false)
O = EarthOrography(SG, smoothing=true)
E = Earth(SG)
initialize!(O, E, S)
oro_grid = O.orography
oro_spec = spectral(oro_grid, S)
oro_grid1 = gridded(oro_spec, S)
oro_spec1 = spectral(oro_grid1, S)
oro_grid2 = gridded(oro_spec1, S)
oro_spec2 = spectral(oro_grid2, S)
oro_spec1 - oro_spec2
560-element, 33x32 LowerTriangularMatrix{ComplexF32}
-0.00012207+0.0im 0.0+0.0im 0.0+0.0im … 0.0+0.0im 0.0+0.0im
-3.05176f-5+0.0im -1.52588f-5-1.52588f-5im 0.0+0.0im 0.0+0.0im 0.0+0.0im
6.10352f-5+0.0im -2.38419f-7+1.52588f-5im -3.05176f-5+0.0im 0.0+0.0im 0.0+0.0im
-3.05176f-5+0.0im 7.62939f-6+3.05176f-5im -1.52588f-5-7.62939f-6im 0.0+0.0im 0.0+0.0im
-3.05176f-5+0.0im 0.0-1.52588f-5im 0.0-4.76837f-6im 0.0+0.0im 0.0+0.0im
0.0+0.0im -1.90735f-6+2.28882f-5im -1.14441f-5+0.0im … 0.0+0.0im 0.0+0.0im
-3.05176f-5+0.0im 1.90735f-6-3.05176f-5im 7.62939f-6+3.8147f-6im 0.0+0.0im 0.0+0.0im
⋮ ⋱ ⋮
3.05176f-5+0.0im -2.86102f-6-1.33514f-5im 1.90735f-6+3.8147f-6im 0.0+0.0im 0.0+0.0im
1.52588f-5+0.0im -3.21865f-6+0.0im 9.53674f-6+2.38419f-6im 0.0+0.0im 0.0+0.0im
-9.53674f-6+0.0im 3.33786f-6-1.57356f-5im 3.8147f-6-1.90735f-6im 0.0+0.0im 0.0+0.0im
8.58307f-6+0.0im -1.90735f-6-1.52588f-5im -1.43051f-5-1.90735f-6im 0.0+0.0im 0.0+0.0im
3.05176f-5+0.0im -1.90735f-6-3.8147f-6im 1.33514f-5-4.76837f-7im … -6.25849f-7+5.96046f-7im 0.0+0.0im
-2.86102f-6+0.0im -6.91414f-6-1.33514f-5im -1.71661f-5-3.8147f-6im -2.38419f-6-9.53674f-7im -9.53674f-7+9.53674f-7im
0.238285+0.0im -0.194557-0.108487im 0.0348457+0.129144im 0.0+1.90735f-6im 9.53674f-7+0.0im
Back in Oxford, I can have a look today. For now: I find it suspicious that `9.53674f-7` shows up 4 times in the highest wavenumbers, maybe there's some indexing going wrong? EDIT: Ah sorry, that's the error you printed.
@maximilian-gelbrecht I'm making progress on the rewrite of the dynamical core in #525 too, to inform some of the decisions here. I think there are really only two things I've been adding/tweaking: `eachmatrix` and the size-match checker `lowertriangular_match`.
Something like `eachmatrix` should automatically check for compatible sizes, like `eachindex(a, b)` does too; it avoids a lot of manual `@boundscheck size(a) == size(b)` stuff, which needed tweaking anyway because (5,5) and (5,5,1) should match (pres and velocity, for example). I've implemented something similar with `eachgrid` for the ring grids.
I think we should rename `lowertriangular_match` to `ismatching`, because we need conceptually the same thing for the ring grids and the spectral transforms... `ismatching` would then just take many combinations of lower triangular matrices, or ring grids with transforms, and tell you whether they "match". This function then gets a `horizontal_only` keyword so that `ismatching(A, B, horizontal_only=true)` would return true for something like `(5, 5)` and `(5, 5, 2)` (e.g. the former is constant across the vertical).
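The matching rule could be sketched on bare size tuples like this (illustrative only: `sizes_match` is a hypothetical helper, and the real `ismatching` would dispatch on LowerTriangularArrays, grids, and transforms rather than tuples):

```julia
# Hypothetical sketch of the matching rule: horizontal dims must agree,
# trailing (vertical) dims either agree up to padding with 1s, or are
# ignored entirely with horizontal_only=true.
function sizes_match(a::Tuple, b::Tuple; horizontal_only::Bool=false)
    a[1:2] == b[1:2] || return false   # horizontal sizes must agree
    horizontal_only && return true     # ignore trailing dimensions
    n = max(length(a), length(b))
    pad(t) = ntuple(i -> i <= length(t) ? t[i] : 1, n)  # (5,5) -> (5,5,1)
    return pad(a) == pad(b)
end

sizes_match((5, 5), (5, 5, 1))                        # true
sizes_match((5, 5), (5, 5, 2))                        # false
sizes_match((5, 5), (5, 5, 2), horizontal_only=true)  # true
```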
`matrix_size` with `OneBased`, `ZeroBased`: the spectral gradients and other parts of the dynamical core are somewhat inconsistent on whether 1-based or 0-based indexing is used (and there are places where it makes sense to mix). To make this clearer I'm suggesting these abstract types, so that one can write the indexing convention out neatly.
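For example (a hedged sketch of what such types and a dispatching `matrix_size` could look like; the names `OneBased`/`ZeroBased` are from the discussion, but the definitions here are illustrative, not the actual ones):

```julia
# Illustrative sketch only: dispatch on an index-convention type so callers
# state explicitly whether sizes follow 1-based array indexing or the
# 0-based spectral convention (degree l = 0, ..., lmax).
abstract type IndexBasis end
struct OneBased  <: IndexBasis end
struct ZeroBased <: IndexBasis end

matrix_size(::Type{OneBased},  lmax::Integer, mmax::Integer) = (lmax, mmax)
matrix_size(::Type{ZeroBased}, lmax::Integer, mmax::Integer) = (lmax + 1, mmax + 1)

matrix_size(ZeroBased, 31, 31)  # (32, 32): T31 stores degrees 0..31
```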
Instead of `Base.size(L::LowerTriangularArray; as::T=Vector)` with if/else, I think I'd just do
Base.size(L::LowerTriangularArray; as::Type{Matrix}) = (L.m, L.n, size(L)[2:end]...)
Base.size(L::LowerTriangularArray; as::Type{Vector}) = size(L.data)
Base.size(L::LowerTriangularArray) = size(L, as=Vector)
to let dispatch handle the cases. I'll need to extend this with `OneBased`, `ZeroBased` anyway.
I did try to do it like that initially, but this doesn't work with the dispatch rules apparently: you'll immediately get warnings that the function definitions are being overwritten.
@milankl I tried to look into the Clenshaw grid error again, but couldn't fix it. I honestly can't see where I changed anything that would even affect it. Could you do a full code review and see if you can find a code change that could cause it?
> I did try to do it like that initially but this doesn't work with the dispatching rules apparently. You'll immediately get warnings that the function definitions are being overwritten
Right, because multiple dispatch doesn't apply to keyword arguments, what about
julia> struct S end
julia> Base.size(s::S, as::Type{<:Matrix}) = "matrix"
julia> Base.size(s::S, as::Type{<:Vector}) = "vector"
julia> Base.size(s::S; as=Vector) = size(s, as)
julia> size(S())
"vector"
julia> size(S(), as=Matrix)
"matrix"
julia> size(S(), as=Vector)
"vector"
then? I.e. make `as` the second positional argument and define a method that turns the keyword `as` into that second argument?