JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/

Cube drops variables #202

Closed gdkrmr closed 1 year ago

gdkrmr commented 1 year ago

I have come across this issue several times now: Cube drops some variables.

julia> s3path = "http://data.rsc4earth.de:9000/earthsystemdatacube/v3.0.1/esdc-8d-0.25deg-256x128x128-3.0.1.zarr";

julia> c3 = Cube(s3path);                                                                                         

julia> z3 = Zarr.zopen(s3path, consolidated=true, fill_as_missing=false);                                         

julia> symdiff(c3.axes[4].values, string.(keys(z3.arrays)))                                                       
9-element Vector{String}:                                                                                         
 "sensible_heat"                                                                                                  
 "latent_energy"                                                                                                  
 "time"                                                                                                           
 "terrestrial_ecosystem_respiration"                                                                              
 "lon"                                                                                                            
 "net_radiation"                                                                                                  
 "lat"                                                                                                            
 "burnt_area"                                                                                                     
 "net_ecosystem_exchange"         

Moved over from here: https://github.com/JuliaDataCubes/EarthDataLab.jl/issues/292

lazarusA commented 1 year ago

Yes, this is also something that I observe from time to time. Related issue: https://github.com/JuliaDataCubes/YAXArrays.jl/issues/47

lazarusA commented 1 year ago

In your example, Cube only keeps the variables that share the same dimensions and discards the others, which makes sense (@meggart?). The way to open this file is via open_dataset, as in

g = open_dataset(zopen(s3path, consolidated=true, fill_as_missing=false))

and this one contains all the information.

gdkrmr commented 1 year ago

The issue is a change in eltype: some of the datasets have an offset and scale factor and get wrapped in a DiskArrayTools.CFDiskArray, which changes the eltype from Float32 to Float64. Details in meggart/DiskArrayTools.jl#15 and meggart/DiskArrayTools.jl#16.
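To illustrate the promotion described above, here is a minimal sketch in plain Julia. It does not use DiskArrayTools or YAXArrays; the variable names (`var_a`, `var_b`), the values, and the scale and offset are made up, and the grouping step is only an assumption about how mixed eltypes could separate variables, not the actual Cube code.

```julia
# Hypothetical stored values; a real dataset would read these from disk.
raw = Float32[1, 2, 3]

# Assumption: scale and offset arrive as Float64 attributes.
scale, offset = 0.01, 0.0

# Unpacking with Float64 attributes promotes the Float32 data to Float64.
unpacked = raw .* scale .+ offset
@show eltype(raw)
@show eltype(unpacked)

# Group hypothetical variables by element type. A variable whose eltype
# changed lands in a different group from the untouched ones.
vars = Dict("var_a" => raw, "var_b" => unpacked)
groups = Dict{DataType, Vector{String}}()
for (name, data) in vars
    push!(get!(groups, eltype(data), String[]), name)
end
@show groups
```

If only the largest or first group were kept, `var_b` would be the one that disappears, which matches the symptom reported here.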

gdkrmr commented 1 year ago

I have just checked and Cube is not fixed yet.

gdkrmr commented 1 year ago

fixed now ;-)

lazarusA commented 1 year ago

For your cube I still get the same 9 differences (lon, lat, and time are axes, so those should not count):

using DiskArrayTools, YAXArrays, Zarr
s3path = "http://data.rsc4earth.de:9000/earthsystemdatacube/v3.0.1/esdc-8d-0.25deg-256x128x128-3.0.1.zarr"
c3 = Cube(s3path);                                                                                                                                                                                                           
z3 = Zarr.zopen(s3path, consolidated=true, fill_as_missing=false); 
symdiff(c3.axes[4].values, string.(keys(z3.arrays))) 
9-element Vector{String}:
 "sensible_heat"
 "latent_energy"
 "time"
 "terrestrial_ecosystem_respiration"
 "lon"
 "net_radiation"
 "lat"
 "burnt_area"
 "net_ecosystem_exchange"

with these versions:

(tmp) pkg> st
Status `~/Documents/tmp/Project.toml`
  [fcd2136c] DiskArrayTools v0.1.6 `https://github.com/gdkrmr/DiskArrayTools.jl.git#offsetpromotion`
  [c21b50f5] YAXArrays v0.4.3 `https://github.com/JuliaDataCubes/YAXArrays.jl.git#master`
  [0a941bbe] Zarr v0.8.0
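Since lon, lat, and time are axes, they can be filtered out of the comparison to see only the variables that are really dropped. A small sketch in plain Julia, with the names copied from the symdiff output above and the axis names hard-coded as an assumption (in a live session they would come from the cube itself):

```julia
# Names copied from the symdiff output in this comment.
diff_names = [
    "sensible_heat", "latent_energy", "time",
    "terrestrial_ecosystem_respiration", "lon", "net_radiation",
    "lat", "burnt_area", "net_ecosystem_exchange",
]

# Assumption: these three entries are the axes of the Zarr store.
axis_names = ["lon", "lat", "time"]

# setdiff keeps the order of its first argument and removes the axis names.
dropped = setdiff(diff_names, axis_names)
@show dropped
@show length(dropped)
```

That leaves six data variables missing from the cube.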

gdkrmr commented 1 year ago

you are right, seems like I still need to fix that. It works when using fill_as_missing = true.

gdkrmr commented 1 year ago

I figured out the issue: a "_FillValue" becomes the default missing value for CFDiskArray and adds Missing to its eltype. I have added a commit but still need to test it.
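A minimal sketch of the effect in plain Julia, without CFDiskArray: replacing a fill value by `missing` widens the eltype to a `Union` with `Missing`, so it no longer equals the eltype of an unmasked array. The fill value `-9999` and the data are hypothetical.

```julia
# Hypothetical stored data with a sentinel fill value.
raw = Float32[1, -9999, 3]
fill_value = -9999f0  # stands in for a "_FillValue" attribute (assumption)

# Masking the fill value introduces Missing into the element type.
masked = [x == fill_value ? missing : x for x in raw]
@show eltype(raw)
@show eltype(masked)

# The two eltypes differ, although the non-missing part is unchanged.
@show eltype(masked) == eltype(raw)
@show nonmissingtype(eltype(masked)) == eltype(raw)
```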

lazarusA commented 1 year ago

> I have added a commit but still need to test it.

Indeed. For your use case burnt_area is still missing.

4-element Vector{String}:
 "time"
 "lon"
 "lat"
 "burnt_area"

gdkrmr commented 1 year ago

Thanks for testing. burnt_area is Float64; this is a bug in the DataCube.

felixcremer commented 1 year ago

Is this fixed by your PR in DiskArrayTools https://github.com/meggart/DiskArrayTools.jl/issues/16?

gdkrmr commented 1 year ago

Yes, it should be.