JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/
Other
101 stars 18 forks source link

Speed comparison of mapslices for YAXArray and Raster #353

Open felixcremer opened 10 months ago

felixcremer commented 10 months ago

This is a speed comparison between Rasters and YAXArrays. This uses the COSMO REA dataset. It seems that YAXArrays has some overhead , but then it doesn't matter much whether it is the european or the restriction to the germany data. I also tried to do this with a lazy Raster object, but then the computation didn't finish in many minutes. And when I do a subset of a lazy Raster the data got read at some point. It seems that Raster is better suited for smaller arrays and then for larger Arrays the overhead of YAXArrays doesn't matter that much and it is becoming faster. I would like to explore exactly where this tradeoff lays.

url = "http://esgf1.dkrz.de/thredds/fileServer/cosmo-rea/reanalysis/EUR-6km/DWD/ECMWF-ERAINT/REA6/r1i1p1f1/COSMO/v1/mon/atmos/tas/v20230314/tas_EUR-6km_ECMWF-ERAINT_REA6_r1i1p1f1_COSMO_v1_mon_199501-199512.nc"
filename = split(url, "/")[end]
mkpath("data/")
path = download(url, "data/$filename")
ds = open_dataset(path)
olonp = 180 + ds.rotated_latitude_longitude.properties["grid_north_pole_longitude"]
olatp = ds.rotated_latitude_longitude.properties["grid_north_pole_latitude"]

proj = "+proj=ob_tran +o_proj=latlon +o_lon_p=0 +o_lat_p=$olatp +lon_0=$olonp"

using GADM
using ArchGDAL:ArchGDAL as AG

deu = GADM.get("DEU")
projdeu = AG.reproject(only(deu.geom), ProjString(AG.toPROJ4(AG.getspatialref(deu.geom[1]))), ProjString(proj))
bbox = GI.extent(projdeu)
rger = r[bbox]
r = Raster(path, key="tas")
r = set(r, :rlat=>Y, :rlon=>X)
yax = ds.tas
yaxmem = readcubedata(yax)
yaxger = set(yax, :rlat=>Y, :rlon=> X)[bbox]

julia> @time mapslices(mean, yax, dims="Time");
  1.987760 seconds (5.60 M allocations: 215.388 MiB, 71.15% gc time)

julia> @time mapslices(mean, yaxmem, dims="Time");
  1.679276 seconds (5.59 M allocations: 182.658 MiB, 79.56% gc time)

julia> @time mapslices(mean, r, dims=Ti);
  0.509211 seconds (4.04 M allocations: 64.265 MiB)

julia> @time mapslices(mean, yaxger, dims="Time");
  1.231583 seconds (120.19 k allocations: 3.956 MiB, 99.25% gc time)
julia> @time mapslices(mean, rger, dims=Ti);
  0.010213 seconds (73.89 k allocations: 1.185 MiB)