SpeedyWeather / SpeedyWeather.jl

Play atmospheric modelling like it's LEGO.
https://speedyweather.github.io/SpeedyWeather.jl/dev
MIT License
400 stars 24 forks source link

Change LowerTriangularArrays to be subtype of AbstractArray{T,N-1} #543

Open maximilian-gelbrecht opened 1 month ago

maximilian-gelbrecht commented 1 month ago
milankl commented 1 month ago

This is awesome, thanks Maximilian. I'll let you continute work on this, but feel free to let me know if I can contribute

maximilian-gelbrecht commented 1 month ago

Just tested to benchmark the broadcast, and I am bit frustrated and surprised that it still isn't fast.

using SpeedyWeather, BenchmarkTools

NF = Float32 
mmax = 32 
idims = (5,)
lmax = mmax 

L = randn(LowerTriangularArray{NF}, 50, 50, idims...)
L2 = randn(LowerTriangularArray{NF}, 50, 50, idims...)

@benchmark L.data .+ L2.data
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.147 μs … 799.501 μs  ┊ GC (min … max):  0.00% … 98.70%
 Time  (median):     4.103 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   6.806 μs ±  37.810 μs  ┊ GC (mean ± σ):  38.56% ±  6.94%

   ▁       ▁▅█▇▁                                               
  ▅█▆▅▅▃▅▇▇█████▆▄▄▄▄▄▄▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂ ▃
  2.15 μs         Histogram: frequency by time        12.5 μs <

 Memory estimate: 25.11 KiB, allocs estimate: 6.

 @benchmark L .+ L2
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  13.328 μs …  4.845 ms  ┊ GC (min … max):  0.00% … 98.54%
 Time  (median):     14.787 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   18.113 μs ± 97.273 μs  ┊ GC (mean ± σ):  11.87% ±  2.21%

  ▇▆█▇▃▃▃▂▂▃▁▁▄                                               ▂
  ██████████████▇▇▅▆▆▅▅▅▄▄▄▄▂▄▅▃▅▅▇▇▆▅▆▇▇▆▅▄▄▃▄▄▄▄▂▂▂▄▂▅▅▃▄▄▄ █
  13.3 μs      Histogram: log(frequency) by time      44.1 μs <

 Memory estimate: 25.14 KiB, allocs estimate: 5.

I don't really understand why. The find_L function doesn't seem to be the culprit. The undef initialiser is also not that slow:

@benchmark LowerTriangularArray{T, N, ArrayType{T,N}}(undef, SpeedyWeather.LowerTriangularMatrices.matrix_size(L))
BenchmarkTools.Trial: 8898 samples with 188 evaluations.
 Range (min … max):  529.633 ns … 43.168 μs  ┊ GC (min … max):  0.00% … 97.67%
 Time  (median):     773.019 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):     2.972 μs ±  6.479 μs  ┊ GC (mean ± σ):  73.31% ± 29.36%

  █▃                                                           ▁
  ██▇▆▆▅▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▃▁▄▄▅▅▆▆▇▆▇▇▇▇██████████████▇▇ █
  530 ns        Histogram: log(frequency) by time      25.5 μs <

 Memory estimate: 25.05 KiB, allocs estimate: 4.

julia> @benchmark Array{T,N}(undef, size(L))
BenchmarkTools.Trial: 8490 samples with 200 evaluations.
 Range (min … max):  375.935 ns … 41.204 μs  ┊ GC (min … max):  0.00% … 97.10%
 Time  (median):     633.835 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):     2.926 μs ±  6.537 μs  ┊ GC (mean ± σ):  77.56% ± 30.35%

  █▄                                               ▁           ▁
  ██▇▆▆▅▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▄▄▃▆▆▄▅▆▆▆▆▇▇▇███████████▇█▇█▇▇ █
  376 ns        Histogram: log(frequency) by time      25.2 μs <

 Memory estimate: 25.02 KiB, allocs estimate: 3.
maximilian-gelbrecht commented 1 month ago

I'll look at it again later this week, and will look if I find any possible solutions

milankl commented 1 month ago

Maybe some inlining missing? I can have a check too

maximilian-gelbrecht commented 1 month ago

Had a look again, and I solved it.

The problem was slow indexing with CartesianIndex

So before we only had

Base.@propagate_inbounds Base.getindex(L::LowerTriangularArray{T,N}, I::CartesianIndex{M}) where {T,N,M} = getindex(L, Tuple(I)...)

Now, we also have:

Base.@propagate_inbounds Base.getindex(L::LowerTriangularArray{T,N}, I::CartesianIndex{N}) where {T,N} = getindex(L.data, I)

This apparently made all the difference because internally broadcasting uses cartesian indexing and the Tuple(I)... is slow.

maximilian-gelbrecht commented 1 month ago
@benchmark L .+ L2
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.318 μs …  1.983 ms  ┊ GC (min … max):  0.00% … 98.92%
 Time  (median):     4.224 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   8.019 μs ± 47.512 μs  ┊ GC (mean ± σ):  36.22% ±  6.99%

  ▅▆▅▅▄▅▇█▇▅▃▂▂▂▂▁▁▁                  ▁▂▂▂▂▂▁                ▂
  ██████████████████████▇█▇▆▇▆▇▆▆▆▅▆▇████████████▇▅▆▇▆▆▅▅▅▅▅ █
  2.32 μs      Histogram: log(frequency) by time     17.5 μs <

 Memory estimate: 25.14 KiB, allocs estimate: 5.

julia> @benchmark L.data .+ L2.data
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.236 μs …  1.869 ms  ┊ GC (min … max):  0.00% … 99.13%
 Time  (median):     4.154 μs              ┊ GC (median):     0.00%
 Time  (mean ± σ):   7.459 μs ± 47.129 μs  ┊ GC (mean ± σ):  38.76% ±  6.92%

  ▅▆▆▆▅▆▇██▆▄▃▃▂▂▂▂▁▁▁▁                                      ▂
  ████████████████████████▇▇█▇▆▇▇▆▆▅▆▆▇▇█▇▇█▇▇▇▆▇▆▆▆▆▄▄▄▅▃▃▅ █
  2.24 μs      Histogram: log(frequency) by time     17.4 μs <

 Memory estimate: 25.11 KiB, allocs estimate: 6.
maximilian-gelbrecht commented 1 month ago

What's left to do basically is to adjust the usage of AssociatedLegendrePolynomials

milankl commented 1 month ago

I'm still getting this ?

julia> L = rand(LowerTriangularMatrix,5,5)
15-element LowerTriangularMatrix{Float64}:
Error showing value of type LowerTriangularMatrix{Float64}:
ERROR: BoundsError: attempt to access 15-element LowerTriangularMatrix{Float64} at index [6, 1]
milankl commented 1 month ago

Last commit shows a LowerTriangularMatrix as

julia> L = rand(LowerTriangularMatrix,5,5)
15-element, 5x5 LowerTriangularMatrix{Float64}
 0.787422   0.0       0.0        0.0       0.0
 0.260061   0.819204  0.0        0.0       0.0
 0.973352   0.620726  0.161257   0.0       0.0
 0.0461552  0.685936  0.0117785  0.382289  0.0
 0.13645    0.501547  0.419401   0.309778  0.688196

julia> Float32.(L)
15-element, 5x5 LowerTriangularMatrix{Float32}
 0.787422   0.0       0.0        0.0       0.0
 0.260061   0.819204  0.0        0.0       0.0
 0.973352   0.620726  0.161257   0.0       0.0
 0.0461552  0.685936  0.0117785  0.382289  0.0
 0.13645    0.501547  0.419401   0.309778  0.688196

julia> L = rand(LowerTriangularMatrix,15,15)
120-element, 15x15 LowerTriangularMatrix{Float64}
 0.261788   0.0        0.0        0.0        …  0.0       0.0        0.0       0.0
 0.495482   0.117453   0.0        0.0           0.0       0.0        0.0       0.0
 0.0925019  0.796409   0.937247   0.0           0.0       0.0        0.0       0.0
 0.816407   0.409503   0.779616   0.182904      0.0       0.0        0.0       0.0
 0.915145   0.823384   0.270083   0.565954      0.0       0.0        0.0       0.0
 0.737812   0.795098   0.194549   0.698349   …  0.0       0.0        0.0       0.0
 0.275376   0.939357   0.535364   0.515381      0.0       0.0        0.0       0.0
 0.910015   0.22325    0.972193   0.0490044     0.0       0.0        0.0       0.0
 0.823201   0.128216   0.0735159  0.122388      0.0       0.0        0.0       0.0
 0.169003   0.862217   0.621427   0.695971      0.0       0.0        0.0       0.0
 0.659415   0.0279775  0.245747   0.247737   …  0.0       0.0        0.0       0.0
 0.596095   0.988664   0.441109   0.990378      0.177168  0.0        0.0       0.0
 0.28178    0.189885   0.609562   0.415566      0.155296  0.989642   0.0       0.0
 0.710654   0.138662   0.849563   0.444248      0.467292  0.0712397  0.722201  0.0
 0.104208   0.775013   0.634025   0.199434      0.955224  0.219269   0.102539  0.0512979
milankl commented 1 month ago

But then for higher dimensions the matrices just flattened out into columns

julia> L = rand(LowerTriangularArray,5,5,5)
15×5 LowerTriangularArray{Float64, 2, Matrix{Float64}}:
 0.90549    0.179909   0.722785   0.656362   0.215722
 0.0663613  0.209663   0.393662   0.29946    0.142641
 0.31763    0.121395   0.941059   0.14699    0.738631
 0.132192   0.160128   0.157869   0.751886   0.0505202
 0.61786    0.7052     0.841462   0.915375   0.0175474
 0.802477   0.113889   0.344423   0.261357   0.484409
 0.0662565  0.933641   0.271906   0.0445899  0.085901
 0.0866564  0.0510311  0.115502   0.477471   0.195907
 0.995015   0.367898   0.14462    0.5566     0.626406
 0.613344   0.327654   0.292307   0.598366   0.416606
 0.863694   0.273074   0.0687791  0.901928   0.422603
 0.77978    0.471441   0.133331   0.0685466  0.529635
 0.617265   0.944824   0.989218   0.985348   0.371635
 0.40759    0.570559   0.0131059  0.229718   0.532602
 0.998965   0.150414   0.0708596  0.853031   0.308048
milankl commented 1 month ago

In #525 I started implementing the N-dimensional spectral transform just looping over the additional dimensions because I thought it would be good to check whether we have any constraints from that regarding implementations here. Good news, the 3D case works perfectly fine with only little tweaks (grids::AbstractGridArray and specs::LowerTriangularArray)

for k in eachgrid(grids)   # produces a CartesianIndex looping over dimension 3 to N
...
    specs[lm, k]
    ...
    view(grids.data, 1:20, k)
...

So just adding the index k, however for the 2D case where k = CartesianIndex() the [lm, k] doesn't work for LowerTriangularMatrix

julia> L = rand(LowerTriangularMatrix, 5, 5)
15-element, 5x5 LowerTriangularMatrix{Float64}
 0.301503  0.0        0.0       0.0       0.0
 0.985783  0.0208044  0.0       0.0       0.0
 0.628694  0.0216734  0.462313  0.0       0.0
 0.221242  0.817672   0.778306  0.301415  0.0
 0.241364  0.18049    0.188116  0.456697  0.596747

julia> L[1, CartesianIndex()]
ERROR: MethodError: no method matching isless(::Int64, ::CartesianIndex{0})

I'm not quite sure which getindex method is called for ::Int, ::CartesianIndex but that's needed -- I can have a look next week too. It does work for AbstractGridArray

julia> G = rand(OctaHEALPixGrid,2)
16-element, 3-ring OctaHEALPixGrid{Float64}:
 0.6672885408610827
 0.3029214314408146
 0.25557101111147806
 0.023467728044619385
 0.13131270800462713
 0.6096673403313817
 0.19343526034598502
 0.6247839303906794
 0.8554319237365166
 0.12169715210643173
 0.9504931175461784
 0.9193855210964392
 0.49717053897764873
 0.8925290658067014
 0.232981331232753
 0.03177260647949132

julia> G[1, CartesianIndex()]
0.6672885408610827
milankl commented 1 month ago

I'll continue experimenting with the dynamical core just because I feel that gives us some more insights to decisions we're making here.

maximilian-gelbrecht commented 4 weeks ago

The CartesienIndex{0} is a bit tricky to handle in an elegant way for us because we have the different indexing styles that are dispatched by checking the number of input arguments so getindex(L::LTA{T,N}, I::Varargs{Any,N}) etc. An empty index kind of messes with this logic.

I'll add this in a bit hacky way by explicitly defining a separate getindex(L::LTA{T,1}, i::Integer, C::CartesianIndex{0}). That should be enough.

milankl commented 4 weeks ago

Yes agree. Leading integer argument then trailing cartesianindex is all we need! Thanks Max!!

maximilian-gelbrecht commented 4 weeks ago

There's currently an error in the transforms here, that is a bit hard for me to see where it comes from.

The "Transform: Orography (exact grids)" test fails, but only for the ClenshawGrids and only in the last row of the spectral coefficients.

maximilian-gelbrecht commented 3 weeks ago

Made a brief check again. I don't really understand why this error is happening.

@milankl do you have any idea? I didn't really change anything in SpectralTransforms that should be able to cause this, and that it is only appearing for ClenshawGrids and not GaussianGrid is a bit weird.

using SpeedyWeather
trunc = 31
NF = Float32 
Grid = FullClenshawGrid

dealiasing = Grid in (FullGaussianGrid, OctahedralGaussianGrid) ? 2 : 3

SG = SpectralGrid(NF; trunc, Grid, dealiasing)
S = SpectralTransform(SG, recompute_legendre=false)
O = EarthOrography(SG, smoothing=true)
E = Earth(SG)
initialize!(O, E, S)

oro_grid = O.orography
oro_spec = spectral(oro_grid, S)

oro_grid1 = gridded(oro_spec, S)
oro_spec1 = spectral(oro_grid1, S)
oro_grid2 = gridded(oro_spec1, S)
oro_spec2 = spectral(oro_grid2, S)
 oro_spec1 - oro_spec2
560-element, 33x32 LowerTriangularMatrix{ComplexF32}
 -0.00012207+0.0im          0.0+0.0im                 0.0+0.0im         …          0.0+0.0im                 0.0+0.0im
 -3.05176f-5+0.0im  -1.52588f-5-1.52588f-5im          0.0+0.0im                    0.0+0.0im                 0.0+0.0im
  6.10352f-5+0.0im  -2.38419f-7+1.52588f-5im  -3.05176f-5+0.0im                    0.0+0.0im                 0.0+0.0im
 -3.05176f-5+0.0im   7.62939f-6+3.05176f-5im  -1.52588f-5-7.62939f-6im             0.0+0.0im                 0.0+0.0im
 -3.05176f-5+0.0im          0.0-1.52588f-5im          0.0-4.76837f-6im             0.0+0.0im                 0.0+0.0im
         0.0+0.0im  -1.90735f-6+2.28882f-5im  -1.14441f-5+0.0im         …          0.0+0.0im                 0.0+0.0im
 -3.05176f-5+0.0im   1.90735f-6-3.05176f-5im   7.62939f-6+3.8147f-6im              0.0+0.0im                 0.0+0.0im
            ⋮                                                           ⋱             ⋮              
  3.05176f-5+0.0im  -2.86102f-6-1.33514f-5im   1.90735f-6+3.8147f-6im              0.0+0.0im                 0.0+0.0im
  1.52588f-5+0.0im  -3.21865f-6+0.0im          9.53674f-6+2.38419f-6im             0.0+0.0im                 0.0+0.0im
 -9.53674f-6+0.0im   3.33786f-6-1.57356f-5im    3.8147f-6-1.90735f-6im             0.0+0.0im                 0.0+0.0im
  8.58307f-6+0.0im  -1.90735f-6-1.52588f-5im  -1.43051f-5-1.90735f-6im             0.0+0.0im                 0.0+0.0im
  3.05176f-5+0.0im  -1.90735f-6-3.8147f-6im    1.33514f-5-4.76837f-7im  …  -6.25849f-7+5.96046f-7im          0.0+0.0im
 -2.86102f-6+0.0im  -6.91414f-6-1.33514f-5im  -1.71661f-5-3.8147f-6im      -2.38419f-6-9.53674f-7im  -9.53674f-7+9.53674f-7im
    0.238285+0.0im    -0.194557-0.108487im      0.0348457+0.129144im               0.0+1.90735f-6im   9.53674f-7+0.0im
milankl commented 3 weeks ago

Back in Oxford, I can have a look today. For now: I find it suspicious that 9.53674f-7 shows up 4 times in the highest wavenumbers, maybe there's some indexing going wrong? EDIT: Ah sorry that's the error you printed

milankl commented 3 weeks ago

@maximilian-gelbrecht I'm making progress on the rewrite of the dynamical core in #525 too to inform some of the decisions here, I think there's really only two things I've been adding/tweaking

  1. Vertical iterator eachmatrix and size match checker lowertriangular_match

https://github.com/SpeedyWeather/SpeedyWeather.jl/blob/b8737f7cf95507a0fdab517570bfeea2c6abc9e9/src/LowerTriangularMatrices/lower_triangular_matrix.jl#L282-L285

Something like eachmatrix should automatically check compatible sizes, like eachindex(a,b) does too, avoids a lot of manual @boundscheck size(a) == size(b)`` stuff which needed tweaking anyway because (5,5) and (5,5,1) should match (pres and velocity for example). I've implemented something similar foreachgrid` for the ring grids.

I think we should rename lowertriangular_match in ismatching because we need conceptually the same thing for the ring grids and the spectral transforms... ismatching would then just take many combinations of lower triangular matrices, or ring grids with transforms and tell you whether they "match". This function then gets a horizontal_only keyword so that ismatching(A, B, horizontal_only=true) would return true for something like (5, 5) and (5, 5, 2) (e.g. the former is constant across the vertical)

  1. matrix_size with OneBased, ZeroBased

https://github.com/SpeedyWeather/SpeedyWeather.jl/blob/b8737f7cf95507a0fdab517570bfeea2c6abc9e9/src/LowerTriangularMatrices/lower_triangular_matrix.jl#L48-L59

The spectral gradients and other parts of the dynamical core are somewhat inconsistent on whether 1-based or 0-based indexing is used (and there are places where it makes sense to mix), to make this clearer I'm suggesting these abstract types and then

https://github.com/SpeedyWeather/SpeedyWeather.jl/blob/b8737f7cf95507a0fdab517570bfeea2c6abc9e9/src/LowerTriangularMatrices/lower_triangular_matrix.jl#L68-L70

so that one can neatly write

https://github.com/SpeedyWeather/SpeedyWeather.jl/blob/b8737f7cf95507a0fdab517570bfeea2c6abc9e9/src/SpeedyTransforms/spectral_gradients.jl#L64-L65

milankl commented 2 weeks ago

I think I'd instead of Base.size(L::LowerTriangularArray; as::T=Vector) with if/else just do

Base.size(L::LowerTriangularArray; as::Type{Matrix}) = (L.m, L.n, size(L)[2:end]...)
Base.size(L::LowerTriangularArray; as::Type{Vector}) = size(L.data)
Base.size(L::LowerTriangularArray) = size(L, as=Vector)

to let dispatch handle the cases? I'll need to extend this with OneBased, ZeroBased anyway

maximilian-gelbrecht commented 2 weeks ago

I did try to do it like that initially but this doesn't work with the dispatching rules apparently. You'll immediately get warnings that the function definitions are being overwritten

maximilian-gelbrecht commented 2 weeks ago

@milankl I tried to look into the Crenshaw grid error again, but couldn't fix it.

I can't see where I changed something that would even affect it to be honest. Could you do a full code review and look if you can find a part of code change that could affect it?

milankl commented 2 weeks ago

I did try to do it like that initially but this doesn't work with the dispatching rules apparently. You'll immediately get warnings that the function definitions are being overwritten

Right, because multiple dispatch doesn't apply to keyword arguments, what about

julia> struct S end

julia> Base.size(s::S, as::Type{<:Matrix}) = "matrix"

julia> Base.size(s::S, as::Type{<:Vector}) = "vector"

julia> Base.size(s::S; as=Vector) = size(s, as)

julia> size(S())
"vector"

julia> size(S(), as=Matrix)
"matrix"

julia> size(S(), as=Vector)
"vector"

then? make as the second argument and define a method that turns keyword as into the 2nd argument?