JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

yaxarrays slower than dimensionaldata #361

Open bjarthur opened 5 months ago

bjarthur commented 5 months ago

indexing a yax is slower than indexing the dd you get by converting it:

julia> using YAXArrays, YAXArrayBase, DimensionalData, BenchmarkTools

julia> yax = YAXArray(rand(10, 20, 5));

julia> dd = yaxconvert(DimArray, yax);

julia> @benchmark yax[Dim_1=1:3]
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):  4.059 μs … 190.583 μs  ┊ GC (min … max): 0.00% … 95.98%
 Time  (median):     4.137 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.303 μs ±   4.106 μs  ┊ GC (mean ± σ):  2.28% ±  2.34%

  ▂▇██▇▅▃▁                   ▁                                ▂
  █████████▆▆▄▄▅▃▃▂▃▂▂▅▇████████▇▇▆▅▅▅▄▅▄▅▄▃▄▆▅▆▆▆▆▇▆▄▄▅▅▄▄▅▆ █
  4.06 μs      Histogram: log(frequency) by time       5.5 μs <

 Memory estimate: 4.88 KiB, allocs estimate: 87.

julia> @benchmark dd[Dim_1=1:3]
BenchmarkTools.Trial: 10000 samples with 313 evaluations.
 Range (min … max):  269.834 ns …   8.516 μs  ┊ GC (min … max):  0.00% … 96.06%
 Time  (median):     369.808 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   489.908 ns ± 878.050 ns  ┊ GC (mean ± σ):  24.04% ± 12.50%

  █▇                                                            ▁
  ███▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▆██ █
  270 ns        Histogram: log(frequency) by time        6.9 μs <

 Memory estimate: 2.59 KiB, allocs estimate: 2.

indexing is also slower after converting dd to yax:

julia> DD = DimArray(rand(50, 31), (X(), Y(10.0:40.0)), metadata = Dict{String, Any}());

julia> YAX = yaxconvert(YAXArray, DD)
50×31 YAXArray{Float64,2} with dimensions: 
  Dim{:X},
  Dim{:Y} Sampled{Float64} 10.0:1.0:40.0 ForwardOrdered Regular Points
Total size: 12.11 KB

julia> @benchmark DD[Y(1:10), X(1)]
BenchmarkTools.Trial: 10000 samples with 991 evaluations.
 Range (min … max):  41.751 ns … 407.417 ns  ┊ GC (min … max): 0.00% … 85.99%
 Time  (median):     42.592 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   44.705 ns ±  16.278 ns  ┊ GC (mean ± σ):  2.39% ±  5.63%

  ▃▆█▇▄▂        ▂▂▂▂▁                                          ▁
  ███████▆▅▄▄▃▄███████▆▆▆▆▅▆▆▆▇▇▇█▇▇█▇▇▆▅▆▆▅▆▆▅▆▆▆▅▅▅▄▅▄▄▄▄▄▅▅ █
  41.8 ns       Histogram: log(frequency) by time      59.5 ns <

 Memory estimate: 240 bytes, allocs estimate: 2.

julia> @benchmark YAX[Dim{:Y}(1:10), Dim{:X}(1)]
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.366 μs … 187.162 μs  ┊ GC (min … max): 0.00% … 97.36%
 Time  (median):     2.431 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.572 μs ±   4.234 μs  ┊ GC (mean ± σ):  3.97% ±  2.38%

  ▁▆██▇▆▄▂▂                   ▁ ▁▁                            ▂
  ██████████▆▆▄▅▁▃▁▃▃▁▁▁▄▅▇▇██████▇▇▇▆▄▆▆▄▃▃▄▁▆▅▄▅▄▃▅▅▆▆▆▆▆▄▅ █
  2.37 μs      Histogram: log(frequency) by time      3.42 μs <

 Memory estimate: 2.92 KiB, allocs estimate: 39.

by median time that's roughly an 11-fold and a 57-fold difference for the above arrays, which are small and in memory. but even for a 450MB on-disk zarr array, yax is still about 20% slower than dd:

julia> using Zarr

julia> yax = Cube("foo.zarr");

julia> dd = yaxconvert(DimArray, yax);

julia> @benchmark collect(yax[Dim{:LI}(At("bar"))])
BenchmarkTools.Trial: 73 samples with 1 evaluation.
 Range (min … max):  52.840 ms … 124.095 ms  ┊ GC (min … max):  3.18% … 58.09%
 Time  (median):     54.923 ms               ┊ GC (median):     5.83%
 Time  (mean ± σ):   68.719 ms ±  25.640 ms  ┊ GC (mean ± σ):  24.69% ± 20.59%

  ▂█                                                            
  ██▇▄▁▁▅▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▃▃▃▃▃▁▄▃▃ ▁
  52.8 ms         Histogram: frequency by time          121 ms <

 Memory estimate: 126.95 MiB, allocs estimate: 10584.

julia> @benchmark collect(dd[Dim{:LI}(At("bar"))])
BenchmarkTools.Trial: 110 samples with 1 evaluation.
 Range (min … max):  44.108 ms … 107.490 ms  ┊ GC (min … max): 0.00% … 58.55%
 Time  (median):     45.175 ms               ┊ GC (median):    1.25%
 Time  (mean ± σ):   45.998 ms ±   6.025 ms  ┊ GC (mean ± σ):  2.60% ±  5.66%

         ▄█                                                     
  ▅▄▆▃▆▄█████▃▃▁▃▃▄▄▃▃▃▁▁▁▃▁▁▁▃▃▁▃▁▁▁▁▁▁▁▁▁▁▁▃▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▃ ▃
  44.1 ms         Histogram: frequency by time         51.9 ms <

 Memory estimate: 38.41 MiB, allocs estimate: 2969.

julia> size(yax)
(20222, 1098, 145)
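
for reference, the slowdown factors implied by the median times above can be computed directly. this is a small sketch in plain Julia; the names `medians` and `ratios` are only illustrative, and the numbers are copied from the benchmark output above:

```julia
# Median times copied from the benchmark output above, in nanoseconds.
medians = (
    in_memory_yax_to_dd = (yax = 4137.0,   dd = 369.808),
    in_memory_dd_to_yax = (yax = 2431.0,   dd = 42.592),
    zarr_on_disk        = (yax = 54.923e6, dd = 45.175e6),
)

# Ratio of the YAXArray median to the DimArray median for each case.
ratios = map(m -> m.yax / m.dd, medians)

for (name, r) in pairs(ratios)
    println(rpad(string(name), 22), round(r; digits = 2), "x")
end
```

by these medians yax is roughly 11x and 57x slower for the two in-memory cases, and about 1.2x slower for the zarr array.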

is this expected?

julia> versioninfo()
Julia Version 1.10.0
Commit 3120989f39b (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 12 × Apple M2 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, apple-m1)
  Threads: 1 on 8 virtual cores
Environment:
  JULIA_PROJECT = @.
  JULIA_EDITOR = vi

DimensionalData v0.25.8 and YAXArrays v0.5.2
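
one caveat on the numbers above: `yax`, `dd`, `DD` and `YAX` are non-constant globals, and BenchmarkTools recommends interpolating such variables with `$` so that global-variable access is not part of the measurement. a minimal sketch of the first in-memory comparison with interpolation, reusing the names from above (timings depend on the machine, so none are shown):

```julia
using YAXArrays, YAXArrayBase, DimensionalData, BenchmarkTools

yax = YAXArray(rand(10, 20, 5))
dd = yaxconvert(DimArray, yax)

# `$` splices the current value into the benchmark expression, so the
# lookup of the non-constant globals `yax` and `dd` is not timed.
display(@benchmark $yax[Dim_1=1:3])
display(@benchmark $dd[Dim_1=1:3])
```

whether this narrows the gap is an open question; the allocation counts reported above (87 vs 2) suggest that most of the difference is not due to global access.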