JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

Feature request: Save YAXArray or Dataset into a Zarr group #348

Open danlooo opened 1 year ago

danlooo commented 1 year ago

Multiple datasets in the Common Data Model V4 can be stored in the same file. They are organized in (nested) groups, analogous to files in directories and subdirectories.

For example, xarray.Dataset.to_zarr has the option group to specify the path inside the Zarr store in which the dataset should be stored. Similarly, zarr.hierarchy.group has the option path to specify the (group) path. The prototype xarray-datatree (which is part of the xarray roadmap) uses this to represent a tree of Datasets as its own type. I think this is already implemented in Zarr.jl, in the option name of the function Zarr.zcreate.

This is of particular importance when storing data cubes of different spatio-temporal resolutions in the same store. It'd be great to have an additional group option for the functions savedataset and savecube.
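
To make the requested layout concrete, here is a minimal sketch that uses Zarr.jl directly, not YAXArrays. It assumes the Zarr.jl functions zgroup, zcreate and zopen behave as described in the Zarr.jl documentation. The group names high_res and low_res and the array name data are made up for this example, and the store is written to a temporary directory.

```julia
using Zarr

# One directory store that will hold both resolutions.
store = Zarr.DirectoryStore(joinpath(mktempdir(), "foo.zarr"))
root = zgroup(store)                 # root group of the store

# One subgroup per resolution (hypothetical names).
high = zgroup(root, "high_res")
low = zgroup(root, "low_res")

# Arrays of different shapes can share the same name
# because they live in different groups.
a_high = zcreate(Float64, high, "data", 10, 10, 3)
a_low = zcreate(Float64, low, "data", 5, 5, 3)
a_high[:, :, :] = rand(10, 10, 3)
a_low[:, :, :] = rand(5, 5, 3)

# Reopen the store and address each array through its group.
g = zopen(store)
size_high = size(g["high_res"]["data"])
size_low = size(g["low_res"]["data"])
```

A group option on savedataset and savecube could map onto this kind of layout, with each Dataset written below its own group path.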

lazarusA commented 1 year ago

data cubes of different spatio-temporal resolutions

https://juliadatacubes.github.io/YAXArrays.jl/dev/examples/generated/UserGuide/creating/#creating-a-dataset

Isn't this already the case? You can always pass a bunch of YAXArrays of different dimensions into a Dataset that can be saved as a .zarr file, right?

danlooo commented 1 year ago

Datasets are meant to store multiple variables sampled over the same grid, defined by their shared axes. However, the axes of cubes with different resolutions (e.g. the spatial axes) are not the same. Trying this:

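using YAXArrays
using Zarr
# Two cubes with the same number of dimensions but different sizes
high_res_cube = YAXArray(rand(10, 10, 3))
low_res_cube = YAXArray(rand(5, 5, 3))
# Combine both cubes into one Dataset and write it to a Zarr store
ds = Dataset(high_res = high_res_cube, low_res = low_res_cube)
savedataset(ds; path = "foo.zarr", driver=:zarr)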

also throws an error when saving the dataset to disk:

ERROR: ArgumentError: Can not construct YAXArray, supplied data size is (10, 10, 3) while axis lenghts are (5, 5, 3)
Stacktrace:
  [1] YAXArray(axes::Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, data::ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, properties::Dict{String, Any}, chunks::DiskArrays.GridChunks{3}, cleaner::Vector{YAXArrays.Cubes.CleanMe})
    @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/R6KY3/src/Cubes/Cubes.jl:110
  [2] #YAXArray#5
    @ ~/.julia/packages/YAXArrays/R6KY3/src/Cubes/Cubes.jl:129 [inlined]
  [3] collectfromhandle(e::NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}, dshandle::YAXArrayBase.ZarrDataset, cleaner::Vector{YAXArrays.Cubes.CleanMe})
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/R6KY3/src/DatasetAPI/Datasets.jl:403
  [4] #102
    @ ~/.julia/packages/YAXArrays/R6KY3/src/DatasetAPI/Datasets.jl:564 [inlined]
  [5] iterate
    @ ./generator.jl:47 [inlined]
  [6] collect_to!(dest::Vector{YAXArray{Float64, 3, ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}}, offs::Int64, st::Int64)
    @ Base ./array.jl:840
  [7] collect_to_with_first!(dest::Vector{YAXArray{Float64, 3, ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}}}, v1::YAXArray{Float64, 3, ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}}, st::Int64)
    @ Base ./array.jl:818
  [8] _collect(c::Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}}, #unused#::Base.EltypeUnknown, isz::Base.HasShape{1})
    @ Base ./array.jl:812
  [9] collect_similar(cont::Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, itr::Base.Generator{Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}}, YAXArrays.Datasets.var"#102#108"{YAXArrayBase.ZarrDataset, Vector{YAXArrays.Cubes.CleanMe}}})
    @ Base ./array.jl:711
 [10] map(f::Function, A::Vector{NamedTuple{(:name, :t, :chunks, :axes, :attr, :subs, :require_CF, :offs), Tuple{String, DataType, Tuple{Int64, Int64, Int64}, Vector{RangeAxis{Int64, _A, Base.OneTo{Int64}} where _A}, Dict{String, Any}, Nothing, Bool, Dict{Symbol, Int64}}}})
    @ Base ./abstractarray.jl:3261
 [11] savedataset(ds::Dataset; path::String, persist::Nothing, overwrite::Bool, append::Bool, skeleton::Bool, backend::Symbol, driver::Symbol, max_cache::Float64, writefac::Float64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/R6KY3/src/DatasetAPI/Datasets.jl:564
 [12] top-level scope
    @ REPL[20]:1