JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

Cannot open Earth System Data Cube v3 #197

Closed gdkrmr closed 1 year ago

gdkrmr commented 1 year ago

I am having trouble opening the new Data Cube with YAXArrays when setting fill_as_missing = true. It works fine without the parameter.

import YAXArrays                                                                                               
import Zarr                                                                                                    

cube_path_2 = "http://data.rsc4earth.de:9000/earthsystemdatacube/v2.1.1/esdc-8d-0.25deg-184x90x90-2.1.1.zarr"  
z2 = Zarr.zopen(cube_path_2, fill_as_missing = true)                                                           
c2 = YAXArrays.open_dataset(z2)                                                                                

cube_path_3 = "http://data.rsc4earth.de:9000/earthsystemdatacube/v3.0.1/esdc-8d-0.25deg-256x128x128-3.0.1.zarr"
z3 = Zarr.zopen(cube_path_3, fill_as_missing = true)                                                           

julia> c3 = YAXArrays.open_dataset(z3)                                                                                                 
ERROR: MethodError: no method matching typemax(::Type{Union{Missing, Float32}})                                                        
Closest candidates are:                                                                                                                
  typemax(::Union{Dates.DateTime, Type{Dates.DateTime}}) at /opt/julia-1.8.3/share/julia/stdlib/v1.8/Dates/src/types.jl:453            
  typemax(::Union{Dates.Date, Type{Dates.Date}}) at /opt/julia-1.8.3/share/julia/stdlib/v1.8/Dates/src/types.jl:455                    
  typemax(::Union{Dates.Time, Type{Dates.Time}}) at /opt/julia-1.8.3/share/julia/stdlib/v1.8/Dates/src/types.jl:457                    
  ...                                                                                                                                  
Stacktrace:                                                                                                                            
 [1] DiskArrayTools.CFDiskArray(a::Zarr.ZArray{Union{Missing, Float32}, 3, Zarr.BloscCompressor, Zarr.ConsolidatedStore{Zarr.HTTPStore}}, attr::Dict{String, Any})
   @ DiskArrayTools ~/.julia/packages/DiskArrayTools/WsUY6/src/DiskArrayTools.jl:225                                                   
 [2] open_dataset(g::Zarr.ZGroup{Zarr.ConsolidatedStore{Zarr.HTTPStore}}; driver::Symbol)                                              
   @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/LgxQX/src/DatasetAPI/Datasets.jl:279                                               
 [3] open_dataset(g::Zarr.ZGroup{Zarr.ConsolidatedStore{Zarr.HTTPStore}})                                                              
   @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/LgxQX/src/DatasetAPI/Datasets.jl:248                                               
 [4] top-level scope                                                                                                                   
   @ REPL[20]:1                                                                                                                        

julia> z2.arrays["time"] |> eltype
Union{Missing, Float64}           

julia> z3.arrays["time"] |> eltype
Int64                                                                                                           

Maybe it is because the time axis is an Int instead of a Float.

gdkrmr commented 1 year ago

I have figured out the reason: some of the variables have add_offset and scale_factor as attributes, which makes DiskArrayTools (see the CFDiskArray frame in the stack trace above) fail the way it does. While adding an offset of 0 and a scale factor of 1 does not make much sense, DiskArrayTools should not choke on it either.
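The MethodError from the stack trace can be reproduced without any data, which may help when reporting it upstream. The following is a minimal sketch, assuming Julia 1.8 as in the trace. The workaround via `nonmissingtype` is only an illustration of one possible fix, not the actual DiskArrayTools code.

```julia
# Element type that Zarr reports for the failing array when the group is
# opened with fill_as_missing = true (taken from the stack trace).
T = Union{Missing, Float32}

# Base has no typemax method for a Union that includes Missing, so this
# call throws a MethodError like the one in the report.
threw = try
    typemax(T)
    false
catch err
    err isa MethodError
end

# Possible workaround (an assumption, not the library's implementation):
# strip Missing from the element type before asking for its upper bound.
upper = typemax(nonmissingtype(T))

println("typemax(T) throws MethodError: ", threw)
println("typemax(nonmissingtype(T)) = ", upper)
```

The stack trace only shows that the call originates in the CFDiskArray constructor. Why that code path is reached only for variables carrying add_offset or scale_factor is inferred from the attribute listing, not verified here.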

me@mycomputer:/DataCube/v3.0.1/esdc-8d-0.25deg-256x128x128-3.0.1.zarr$ rg -n \"add_offset\" */.*
latent_energy/.zattrs                                                                                       
8:    "add_offset": 0.0,                                                                                    

net_ecosystem_exchange/.zattrs                                                                              
8:    "add_offset": 0.0,                                                                                    

net_radiation/.zattrs                                                                                       
8:    "add_offset": 0.0,                                                                                    

sensible_heat/.zattrs                                                                                       
8:    "add_offset": 0.0,                                                                                    

terrestrial_ecosystem_respiration/.zattrs                                                                   
8:    "add_offset": 0.0,                                                                                    
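
The same check can be sketched from within Julia instead of with rg. The helper name `packed_variables` and the stand-in dictionary below are hypothetical. With a real group one would presumably pass `Dict(name => arr.attrs for (name, arr) in z3.arrays)`, assuming each Zarr array exposes its attributes through an `attrs` field.

```julia
# Hypothetical helper: return the sorted names of all variables whose
# attribute dictionary contains add_offset or scale_factor.
function packed_variables(attrs_by_name::AbstractDict)
    packing_keys = ("add_offset", "scale_factor")
    found = String[]
    for (name, attrs) in attrs_by_name
        if any(k -> haskey(attrs, k), packing_keys)
            push!(found, String(name))
        end
    end
    return sort!(found)
end

# Stand-in attributes for illustration only; the add_offset value mirrors
# the rg output, the empty dictionary is a made-up placeholder.
example = Dict(
    "latent_energy" => Dict{String,Any}("add_offset" => 0.0),
    "time" => Dict{String,Any}(),
)

println(packed_variables(example))
```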

gdkrmr commented 1 year ago

This seems related to https://github.com/JuliaDataCubes/EarthDataLab.jl/issues/284 and https://github.com/meggart/DiskArrayTools.jl/pull/10