JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

More useful printing of Datasets #192

Closed: dpabon closed this issue 1 year ago

dpabon commented 1 year ago

I have a function, run with mapCube, that generates 3 cubes with the following output dimensions:

out_1_dims = OutDims(CategoricalAxis("summary_stat", ["rsquared", "cumulative_variance", "predicted"]), :Time)

# Values of clim_var (z) for pure PFTs
out_2_dims = OutDims(CategoricalAxis("PFTs", pft_list), CategoricalAxis("Values of Z for pure PFTs", ["estimated", "estimated_error"]), :Time)

# delta of clim_var produced by the transitions between PFTs
out_3_dims = OutDims(CategoricalAxis("transitions", [join(pftstrans_comb_names[1], " to ")]), CategoricalAxis("Differences", ["delta", "delta_error", "coocurence"]), :Time)

outdims = (out_1_dims, out_2_dims, out_3_dims)

These 3 cubes share the dimensions lat, lon and time. I then merged the 3 cubes into a single Dataset:

ds = Dataset(SummaryStats=out_1, PFTs=out_2, Transitions=out_3)

When I print ds, I get:

Dimensions: 
   transitions         Axis with 1 elements: grass to forest 
   summary_stat        Axis with 3 elements: rsquared cumulative_variance predicted 
   Values of Z for pure PFTsAxis with 2 elements: estimated estimated_error 
   Differences         Axis with 3 elements: delta delta_error coocurence 
   PFTs                Axis with 2 elements: grass forest 
   lat                 Axis with 181 Elements from -90.0 to 90.0
   lon                 Axis with 360 Elements from -180.0 to 179.0
   time                Axis with 25 Elements from 1999-12-31T00:00:00 to 2001-12-31T00:00:00
Variables: SummaryStats PFTs Transitions 

The problem with the current printing is that it is not possible to tell which dimensions belong to which variable. Here is a proposal that would make the composition of a Dataset easier to understand:

Shared Axis:
lat                 Axis with 181 Elements from -90.0 to 90.0
lon                 Axis with 360 Elements from -180.0 to 179.0
time                Axis with 25 Elements from 1999-12-31T00:00:00 to 2001-12-31T00:00:00
Variables & Dimensions:
SummaryStats
  └──summary_stat Axis with 3 elements: rsquared cumulative_variance predicted
PFTs
  └──PFTs Axis with 2 elements: grass forest
      └──Values of Z for pure PFTs Axis with 2 elements: estimated estimated_error
Transitions
  └──transitions Axis with 1 elements: grass to forest
      └──Differences Axis with 3 elements: delta delta_error coocurence

Cheers, Daniel
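The proposed layout boils down to one rule: axes that appear in every variable are listed once under "Shared Axis", and each variable then lists only its remaining axes, nested one level deeper each. A minimal plain-Julia sketch of that rule follows. It assumes each variable is described only by its name and a list of axis names; print_dataset_tree and the name => axis-names input are hypothetical and are not part of the YAXArrays API. It also prints axis names only, not the "Axis with N elements" descriptions.

```julia
# Hypothetical sketch of the proposed Dataset layout (NOT the YAXArrays API).
# Each variable is given as  name => [axis names...]; `vars` must be non-empty.
function print_dataset_tree(io::IO, vars::AbstractVector{<:Pair})
    # Axes present in every variable are printed once, as shared axes.
    shared = reduce(intersect, [collect(last(v)) for v in vars])
    println(io, "Shared Axis:")
    foreach(ax -> println(io, "  ", ax), shared)

    println(io, "Variables & Dimensions:")
    for (name, vdims) in vars
        println(io, name)
        # Remaining (non-shared) axes, each nested one level deeper, as in the proposal.
        own = filter(ax -> !(ax in shared), vdims)
        for (depth, ax) in enumerate(own)
            println(io, " "^(2 * depth), "└──", ax)
        end
    end
    return nothing
end

# The three variables from this issue, described by their axis names only.
vars = [
    "SummaryStats" => ["summary_stat", "lat", "lon", "time"],
    "PFTs"         => ["PFTs", "Values of Z for pure PFTs", "lat", "lon", "time"],
    "Transitions"  => ["transitions", "Differences", "lat", "lon", "time"],
]

print_dataset_tree(stdout, vars)
```

With these inputs, lat, lon and time end up under "Shared Axis", and each variable shows only its own categorical axes. A real implementation would presumably take the axes from the Dataset itself rather than from a hand-written list.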

dpabon commented 1 year ago

Also, when I try to save the dataset with savedataset(ds, path = "/tmp/test.zarr"), I get the following error:

ERROR: DirectoryStore("/tmp/test") PFTs is not empty
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:35
 [2] zcreate(::Type{Float32}, ::DirectoryStore, ::Int64, ::Vararg{Int64}; path::String, chunks::NTuple{5, Int64}, fill_value::Float32, fill_as_missing::Bool, compressor::Zarr.BloscCompressor, filters::Nothing, attrs::Dict{String, Any}, writeable::Bool)
   @ Zarr ~/.julia/packages/Zarr/tmr2s/src/ZArray.jl:270
 [3] zcreate(::Type{Float32}, ::ZGroup{DirectoryStore}, ::String, ::Int64, ::Vararg{Int64}; kwargs::Base.Pairs{Symbol, Any, NTuple{4, Symbol}, NamedTuple{(:fill_value, :fill_as_missing, :attrs, :chunks), Tuple{Float32, Bool, Dict{String, Any}, NTuple{5, Int64}}}})
   @ Zarr ~/.julia/packages/Zarr/tmr2s/src/ZGroup.jl:151
 [4] add_var(p::YAXArrayBase.ZarrDataset, T::Type, varname::String, s::NTuple{5, Int64}, dimnames::Vector{String}, attr::Dict{String, Any}; chunksize::NTuple{5, Int64}, fill_as_missing::Bool, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ YAXArrayBase ~/.julia/packages/YAXArrayBase/eb8bh/src/datasets/zarr.jl:33
 [5] create_dataset(T::Type, path::String, gatts::Dict{String, Any}, dimnames::Vector{String}, dimvals::Vector{AbstractVector}, dimattrs::Vector{Dict{String, Any}}, vartypes::Vector{DataType}, varnames::Vector{String}, vardims::Vector{Vector{String}}, varattrs::Vector{Dict{String, Any}}, varchunks::Vector{Tuple{Int64, Int64, Int64, Int64, Vararg{Int64}}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ YAXArrayBase ~/.julia/packages/YAXArrayBase/eb8bh/src/datasets/datasetinterface.jl:62
 [6] create_dataset(T::Type, path::String, gatts::Dict{String, Any}, dimnames::Vector{String}, dimvals::Vector{AbstractVector}, dimattrs::Vector{Dict{String, Any}}, vartypes::Vector{DataType}, varnames::Vector{String}, vardims::Vector{Vector{String}}, varattrs::Vector{Dict{String, Any}}, varchunks::Vector{Tuple{Int64, Int64, Int64, Int64, Vararg{Int64}}})
   @ YAXArrayBase ~/.julia/packages/YAXArrayBase/eb8bh/src/datasets/datasetinterface.jl:53
 [7] savedataset(ds::Dataset; path::String, persist::Nothing, overwrite::Bool, append::Bool, skeleton::Bool, backend::Symbol, driver::Symbol, max_cache::Float64, writefac::Float64)
   @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/rQDCf/src/DatasetAPI/Datasets.jl:496
 [8] top-level scope
   @ ~/Nextcloud/nfdi4earth_oemc/bin/julia/exploratory_analysis/sim_cube.jl:718

Did I miss something?

lazarusA commented 1 year ago

Maybe try overwrite=true?

felixcremer commented 1 year ago

The problem is more subtle. There is both a cube and an axis named PFTs. Could we give a more informative error in this case?

dpabon commented 1 year ago

You're right, Felix! You cannot have a cube and an axis with the same name; renaming the cube solved the problem.
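On the question of a more informative error: the message "PFTs is not empty" suggests that the axis PFTs had already been written to the store under that name when the variable PFTs was created. A check run before anything is written could catch this and name the clash. The sketch below is a hypothetical helper (check_name_clashes is not part of YAXArrays or Zarr.jl). It assumes the variable names and axis names are available as plain string vectors.

```julia
# Hypothetical pre-save check (not part of YAXArrays): fail early, with a clear
# message, when a variable shares its name with an axis of the same dataset.
function check_name_clashes(varnames, axisnames)
    clashes = intersect(varnames, axisnames)
    isempty(clashes) && return nothing
    throw(ArgumentError(
        "Cannot save dataset: " * join(clashes, ", ") *
        " is used both as a variable name and as an axis name. " *
        "Rename the variable before saving."))
end

# Names taken from the Dataset in this issue.
varnames  = ["SummaryStats", "PFTs", "Transitions"]
axisnames = ["transitions", "summary_stat", "Values of Z for pure PFTs",
             "Differences", "PFTs", "lat", "lon", "time"]

try
    check_name_clashes(varnames, axisnames)
catch err
    # Shows an ArgumentError that names "PFTs" as the clashing name.
    println(sprint(showerror, err))
end
```

Running such a check at the start of a save would turn the low-level "is not empty" error into a message that names the clashing variable.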