JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

Bad chunks size when saving to netcdf #360

Open Balinus opened 5 months ago

Balinus commented 5 months ago

Hello,

I'd like to know how to avoid the following error. I have not been able to build an MWE, so perhaps someone has an idea? Saving the Dataset to Zarr works. However, my colleagues cannot read the Zarr format (they use Matlab).

savecube(finaldataset.tasmin, "test.nc", driver=:netcdf, overwrite=true) # saving the dataset with savedataset has the same error.

NetCDF error code -127:
    NetCDF: Bad chunk sizes.

Stacktrace:
  [1] check
    @ ~/.julia/packages/NetCDF/7hOe9/src/netcdf_helpers.jl:22 [inlined]
  [2] nc_def_var_chunking(ncid::Int32, varid::Int32, storage::Int64, chunksizesp::Vector{UInt64})
    @ NetCDF ~/.julia/packages/NetCDF/7hOe9/src/netcdf_c.jl:772
  [3] create_var(nc::NcFile, v::NcVar{Float64, 3, 6}, mode::UInt16)
    @ NetCDF ~/.julia/packages/NetCDF/7hOe9/src/NetCDF.jl:1172
  [4] (::NetCDF.var"#73#76"{UInt16, String, String, NcVar{Float64, 3, 6}, Vector{NcDim}})(nc::NcFile)
    @ NetCDF ~/.julia/packages/NetCDF/7hOe9/src/NetCDF.jl:1259
  [5] open(f::NetCDF.var"#73#76"{UInt16, String, String, NcVar{Float64, 3, 6}, Vector{NcDim}}, args::String; kwargs::Base.Pairs{Symbol, UInt16, Tuple{Symbol}, NamedTuple{(:mode,), Tuple{UInt16}}})
    @ NetCDF ~/.julia/packages/NetCDF/7hOe9/src/NetCDF.jl:1001
  [6] nccreate(::String, ::String, ::String, ::Vararg{Any}; atts::Dict{String, Any}, gatts::Dict{Any, Any}, compress::Int64, t::DataType, mode::UInt16, chunksize::Tuple{Int64, Int64, Int64})
    @ NetCDF ~/.julia/packages/NetCDF/7hOe9/src/NetCDF.jl:1214
  [7] nccreate
    @ ~/.julia/packages/NetCDF/7hOe9/src/NetCDF.jl:1195 [inlined]
  [8] #add_var#94
    @ ~/.julia/packages/YAXArrayBase/R6Frw/src/datasets/netcdf.jl:63 [inlined]
  [9] add_var
    @ ~/.julia/packages/YAXArrayBase/R6Frw/src/datasets/netcdf.jl:60 [inlined]
 [10] create_dataset(T::Type, path::String, gatts::Dict{String, Any}, dimnames::Vector{String}, dimvals::Vector{AbstractVector{Float64}}, dimattrs::Vector{Dict{String, Any}}, vartypes::Vector{DataType}, varnames::Vector{String}, vardims::Vector{Tuple{String, String, String}}, varattrs::Vector{Dict{String, Any}}, varchunks::Vector{Tuple{Int64, Int64, Int64}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ YAXArrayBase ~/.julia/packages/YAXArrayBase/R6Frw/src/datasets/datasetinterface.jl:62
 [11] create_dataset(T::Type, path::String, gatts::Dict{String, Any}, dimnames::Vector{String}, dimvals::Vector{AbstractVector{Float64}}, dimattrs::Vector{Dict{String, Any}}, vartypes::Vector{DataType}, varnames::Vector{String}, vardims::Vector{Tuple{String, String, String}}, varattrs::Vector{Dict{String, Any}}, varchunks::Vector{Tuple{Int64, Int64, Int64}})
    @ YAXArrayBase ~/.julia/packages/YAXArrayBase/R6Frw/src/datasets/datasetinterface.jl:53
 [12] savedataset(ds::YAXArrays.Datasets.Dataset; path::String, persist::Nothing, overwrite::Bool, append::Bool, skeleton::Bool, backend::Symbol, driver::Symbol, max_cache::Float64, writefac::Float64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/no0he/src/DatasetAPI/Datasets.jl:571
 [13] savedataset
    @ ~/.julia/packages/YAXArrays/no0he/src/DatasetAPI/Datasets.jl:521 [inlined]
 [14] savecube(c::YAXArray{Union{Missing, Float64}, 3, DiskArrayTools.CFDiskArray{Union{Missing, Float64}, 3, Float64, ZArray{Float64, 3, Zarr.BloscCompressor, DirectoryStore}, Float64}, Tuple{Dim{:time, DimensionalData.Dimensions.LookupArrays.Sampled{DateTimeNoLeap, Vector{DateTimeNoLeap}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Irregular{Tuple{Nothing, Nothing}}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:lon, DimensionalData.Dimensions.LookupArrays.Sampled{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, DimensionalData.Dimensions.LookupArrays.ForwardOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}, Dim{:lat, DimensionalData.Dimensions.LookupArrays.Sampled{Float64, StepRangeLen{Float64, Base.TwicePrecision{Float64}, Base.TwicePrecision{Float64}, Int64}, DimensionalData.Dimensions.LookupArrays.ReverseOrdered, DimensionalData.Dimensions.LookupArrays.Regular{Float64}, DimensionalData.Dimensions.LookupArrays.Points, DimensionalData.Dimensions.LookupArrays.NoMetadata}}}}, path::String; layername::String, datasetaxis::String, max_cache::Float64, backend::Symbol, driver::Symbol, chunks::Nothing, overwrite::Bool, append::Bool, skeleton::Bool, writefac::Float64, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/no0he/src/DatasetAPI/Datasets.jl:637
 [15] top-level scope
    @ In[100]:1

The tasmin variable has the following chunks.

finaldataset.tasmin

2190×241×161 YAXArray{Union{Missing, Float64},3} with dimensions: 
  Dim{:time} Sampled{DateTimeNoLeap} DateTimeNoLeap[DateTimeNoLeap(1995-01-01T12:00:00), …, DateTimeNoLeap(2000-12-31T12:00:00)] ForwardOrdered Irregular Points,
  Dim{:lon} Sampled{Float64} -82.0:0.125:-52.0 ForwardOrdered Regular Points,
  Dim{:lat} Sampled{Float64} 63.0:-0.125:43.0 ReverseOrdered Regular Points
Total size: 648.3 MB

finaldataset.tasmin.chunks

1×2×8 DiskArrays.GridChunks{3, Tuple{DiskArrays.RegularChunks, DiskArrays.RegularChunks, DiskArrays.RegularChunks}}:
[:, :, 1] =
 (1:2190, 1:217, 1:22)  (1:2190, 218:241, 1:22)

[:, :, 2] =
 (1:2190, 1:217, 23:44)  (1:2190, 218:241, 23:44)

[:, :, 3] =
 (1:2190, 1:217, 45:66)  (1:2190, 218:241, 45:66)

[:, :, 4] =
 (1:2190, 1:217, 67:88)  (1:2190, 218:241, 67:88)

[:, :, 5] =
 (1:2190, 1:217, 89:110)  (1:2190, 218:241, 89:110)

[:, :, 6] =
 (1:2190, 1:217, 111:132)  (1:2190, 218:241, 111:132)

[:, :, 7] =
 (1:2190, 1:217, 133:154)  (1:2190, 218:241, 133:154)

[:, :, 8] =
 (1:2190, 1:217, 155:161)  (1:2190, 218:241, 155:161)
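
For anyone trying to reproduce this, here is a minimal sketch of a sanity check on the chunk shape printed above. It assumes the two conditions under which, to my understanding, the netCDF-C library rejects a chunk shape: a chunk length larger than a fixed dimension's length, or a single chunk of 4 GiB or more. `chunk_problems` is a hypothetical helper, not part of NetCDF.jl or YAXArrays.jl, and the real library may apply further checks.

```julia
# Hypothetical helper, not part of NetCDF.jl or YAXArrays.jl.
# Returns a list of human-readable problems with a chunk shape;
# an empty list means none of the assumed conditions is violated.
function chunk_problems(arraysize::NTuple{N,Int}, chunkshape::NTuple{N,Int};
                        elsize::Int = 8) where {N}
    problems = String[]
    for d in 1:N
        if chunkshape[d] < 1
            push!(problems, "chunk length along axis $d must be positive")
        elseif chunkshape[d] > arraysize[d]
            push!(problems, "chunk length $(chunkshape[d]) exceeds axis $d length $(arraysize[d])")
        end
    end
    # Assumed limit: a single chunk must stay below 4 GiB.
    nbytes = prod(big.(chunkshape)) * elsize
    if nbytes >= big(2)^32
        push!(problems, "chunk holds $nbytes bytes, at or above the assumed 4 GiB limit")
    end
    return problems
end

arraysize  = (2190, 241, 161)   # size of tasmin as printed above
chunkshape = (2190, 217, 22)    # shape of the first chunk printed above

println(isempty(chunk_problems(arraysize, chunkshape)) ? "no problem found" : "problems found")
```

Under these assumptions the printed chunk shape passes both checks when compared axis by axis in the order shown, so the chunk listing alone does not explain the error.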

Balinus commented 5 months ago

Note that I worked around the error by hardcoding new chunk sizes.

savedataset(setchunks(finaldataset, (length(finaldataset.time),100,100)), path="test.nc", compress=9, overwrite=true)
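
To make the effect of the new chunk shape concrete, here is a rough sketch of the arithmetic in plain Julia. It assumes the tuple is interpreted as a regular chunk shape in (time, lon, lat) order and that values are stored as `Float64`. `chunkcounts` is a hypothetical helper, not a YAXArrays.jl function.

```julia
# Hypothetical helper: number of chunks along each axis for a regular chunk shape.
chunkcounts(arraysize, chunkshape) = cld.(arraysize, chunkshape)

arraysize = (2190, 241, 161)   # size of tasmin
oldchunks = (2190, 217, 22)    # chunk shape that triggered the error
newchunks = (2190, 100, 100)   # chunk shape used in the workaround

counts = chunkcounts(arraysize, newchunks)
old_bytes = prod(oldchunks) * sizeof(Float64)
new_bytes = prod(newchunks) * sizeof(Float64)

println("chunks per axis: ", counts)
println("total chunks: ", prod(counts))
println("bytes per full chunk, old shape: ", old_bytes)
println("bytes per full chunk, new shape: ", new_bytes)
```

The new shape gives fewer, larger chunks than the original one, so the failure does not look like a simple "chunk too large" problem.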

Leaving the issue open so that the developers can take a look and see whether this behaviour can be fixed.