JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/
Other
103 stars 18 forks source link

convert(Matrix, x') is very slow #412

Open gdkrmr opened 3 months ago

gdkrmr commented 3 months ago

convert(Matrix, x') is so slow that I haven't gotten it to finish. convert(Matrix, x) works just fine.

julia> using YAXArrays, Zarr

julia> path = "~/data/DataCube/v3.0.2/esdc-8d-0.25deg-1x720x1440-3.0.2.zarr"
"~/data/DataCube/v3.0.2/esdc-8d-0.25deg-1x720x1440-3.0.2.zarr"

julia> ds = open_dataset(path)
YAXArray Dataset
Shared Axes: 
↓ lon Sampled{Float64} -179.875:0.25:179.875 ForwardOrdered Regular Points,
→ lat Sampled{Float64} -89.875:0.25:89.875 ForwardOrdered Regular Points,
↗ Ti  Sampled{DateTime} [1979-01-05T00:00:00, …, 2021-12-31T00:00:00] ForwardOrdered Irregular Points
Variables: 
....

julia> mat = ds.air_temperature_2m[Ti = 1000]
╭──────────────────────────────╮
│ 1440×720 YAXArray{Float32,2} │
├──────────────────────────────┴──────────────────────────────────────── dims ┐
  ↓ lon Sampled{Float64} -179.875:0.25:179.875 ForwardOrdered Regular Points,
  → lat Sampled{Float64} -89.875:0.25:89.875 ForwardOrdered Regular Points
├──────────────────────────────────────────────────────
....

julia> @time convert(Matrix{Float32}, mat)
  1.809297 seconds (5.15 M allocations: 362.756 MiB, 5.92% gc time, 98.13% compilation time)
...

julia> @time convert(Matrix{Float32}, mat')

# has to be interrupted
felixcremer commented 2 weeks ago

This is a DiskArrays LinearAlgebra interaction problem. mat' calls adjoint and this accesses the data with separate getindex calls for every value. You can see that by setting up an AccessCountDiskArray see below. If you don't need to have the adjoint you can use permutedims instead for switching the axes order. This should have special cases which would not access all data one by one.

julia> using DiskArrays
julia> mat = TestTypes.AccessCountDiskArray(rand(4,5));

julia> convert(Matrix, mat)
4×5 Matrix{Float64}:
 0.9342     0.00562529  0.554012  0.0577668  0.591995
 0.0644567  0.760773    0.364234  0.777978   0.103616
 0.268764   0.138414    0.296148  0.501075   0.177412
 0.993255   0.137569    0.111455  0.492843   0.216849

julia> mat.getindex_log
1-element Vector{Any}:
 (1:4, 1:5)

julia> mat = TestTypes.AccessCountDiskArray(rand(4,5));

julia> convert(Matrix, mat')
5×4 Matrix{Float64}:
 0.0455661  0.137063    0.0339512  0.146162
 0.567136   0.00509385  0.614943   0.817899
 0.569512   0.655161    0.185285   0.379507
 0.33343    0.59349     0.930645   0.999305
 0.906612   0.619647    0.171684   0.75579

julia> mat.getindex_log
20-element Vector{Any}:
 (1:1, 1:1)
 (2:2, 1:1)
 (3:3, 1:1)
 (4:4, 1:1)
 (1:1, 2:2)
 (2:2, 2:2)
 (3:3, 2:2)
 ⋮
 (2:2, 4:4)
 (3:3, 4:4)
 (4:4, 4:4)
 (1:1, 5:5)
 (2:2, 5:5)
 (3:3, 5:5)
 (4:4, 5:5)