JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package

KeyError: key :Ti not found #392

Open Balinus opened 1 month ago

Balinus commented 1 month ago

Hello, I get the following error when using YAXArrays.Datasets.open_mfdataset. Each file holds one day of data (the 1st file is day 1, the 2nd file is day 2, etc.). It is ERA5-Land data downloaded from Copernicus (sadly, I do not have the download script).

using NetCDF
using YAXArrays
using Glob

repbrut = "/path/to/files"
patterns = "*copernicus_era5_land_surface.nc"

files = glob(patterns, repbrut)
obs = YAXArrays.Datasets.open_mfdataset(files[1:10]) # loading only a subset of the 3000 files

KeyError: key :Ti not found

  [1] getindex
    @ ./dict.jl:498 [inlined]
  [2] _broadcast_getindex_evalf
    @ ./broadcast.jl:709 [inlined]
  [3] _broadcast_getindex
    @ ./broadcast.jl:682 [inlined]
  [4] #31
    @ ./broadcast.jl:1118 [inlined]
  [5] ntuple
    @ ./ntuple.jl:50 [inlined]
  [6] copy
    @ ./broadcast.jl:1118 [inlined]
  [7] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getindex), Tuple{Base.RefValue{Dict{Symbol, Any}}, Tuple{Symbol, Symbol, Symbol}}})
    @ Base.Broadcast ./broadcast.jl:903
  [8] merge_datasets(dslist::Vector{YAXArrays.Datasets.Dataset})
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:903
  [9] open_mfdataset(g::Vector{String})
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:280
 [10] top-level scope
    @ In[4]:5

I can open the files individually, for example:

ds1 = open_dataset(files[1])

YAXArray Dataset
Shared Axes: 
↓ longitude Sampled{Float32} -82.0f0:0.1f0:-50.0f0 ForwardOrdered Regular Points,
→ latitude  Sampled{Float32} 64.0f0:-0.1f0:42.0f0 ReverseOrdered Regular Points,
↗ Ti        Sampled{DateTime} [1950-01-01T00:00:00, …, 1950-01-01T23:00:00] ForwardOrdered Irregular Points
snowc, e, skt, asn, d2m, stl1, t2m, lai_lv, u10, sro, ssrd, src, v10, lai_hv, sp, sd, rsn, evaow, sde, sf, tp, ro, 
Properties: Dict{String, Any}("history" => "2024-05-10 22:57:09 GMT by grib_to_netcdf-2.28.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf -S param -o /cache/data2/adaptor.mars.internal-1715381827.014326-19889-5-81cba0ea-74b0-4995-b5c3-8458c0c8abd5.nc /cache/tmp/81cba0ea-74b0-4995-b5c3-8458c0c8abd5-adaptor.mars.internal-1715381789.8065639-19889-3-tmp.grib", "Conventions" => "CF-1.6")

ds2 = open_dataset(files[2])

YAXArray Dataset
Shared Axes: 
↓ longitude Sampled{Float32} -82.0f0:0.1f0:-50.0f0 ForwardOrdered Regular Points,
→ latitude  Sampled{Float32} 64.0f0:-0.1f0:42.0f0 ReverseOrdered Regular Points,
↗ Ti        Sampled{DateTime} [1950-01-02T00:00:00, …, 1950-01-02T23:00:00] ForwardOrdered Irregular Points
snowc, e, skt, asn, d2m, stl1, lai_lv, t2m, u10, sro, ssrd, src, v10, lai_hv, sp, sd, rsn, evaow, sde, sf, tp, ro, 
Properties: Dict{String, Any}("history" => "2024-05-10 22:54:16 GMT by grib_to_netcdf-2.28.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf -S param -o /cache/data3/adaptor.mars.internal-1715381653.851235-8099-3-1961ccb9-cd31-4fe2-b913-5973053f1ab1.nc /cache/tmp/1961ccb9-cd31-4fe2-b913-5973053f1ab1-adaptor.mars.internal-1715381615.6087704-8099-3-tmp.grib", "Conventions" => "CF-1.6")

but I am unable to merge the datasets:

newds = YAXArrays.Datasets.merge_datasets([ds1, ds2])

KeyError: key :Ti not found

 [1] getindex
   @ ./dict.jl:498 [inlined]
 [2] _broadcast_getindex_evalf
   @ ./broadcast.jl:709 [inlined]
 [3] _broadcast_getindex
   @ ./broadcast.jl:682 [inlined]
 [4] #31
   @ ./broadcast.jl:1118 [inlined]
 [5] ntuple
   @ ./ntuple.jl:50 [inlined]
 [6] copy
   @ ./broadcast.jl:1118 [inlined]
 [7] materialize(bc::Base.Broadcast.Broadcasted{Base.Broadcast.Style{Tuple}, Nothing, typeof(getindex), Tuple{Base.RefValue{Dict{Symbol, Any}}, Tuple{Symbol, Symbol, Symbol}}})
   @ Base.Broadcast ./broadcast.jl:903
 [8] merge_datasets(dslist::Vector{YAXArrays.Datasets.Dataset})
   @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:903
 [9] top-level scope
   @ In[15]:1

As far as I can tell, :Ti is present in both files here (and in all 3000 files I have), but somehow merge_datasets does not seem to be able to pick it up.

(Climat) pkg> st

  [179af706] CFTime v0.1.3
  [a93c6f00] DataFrames v1.6.1
  [0703355e] DimensionalData v0.27.2
  [31c24e10] Distributions v0.25.108
  [85f8d34a] NCDatasets v0.14.4
  [30363a11] NetCDF v0.11.8
  [90b8fcef] YAXArrayBase v0.6.1
  [c21b50f5] YAXArrays v0.5.6
⌃ [0a941bbe] Zarr v0.9.3
  [ade2ca70] Dates
  [10745b16] Statistics v1.10.0

From Manifest

[fcd2136c] DiskArrayTools v0.1.10
⌅ [3c3547ce] DiskArrays v0.3.23

felixcremer commented 1 month ago

That is something that we should fix. As a stopgap, you could extract all cubes from the datasets, use cat(cubes..., dims=Ti) to merge them, and wrap the concatenated cubes in a Dataset.
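
To make the idea concrete, here is a minimal sketch of the per-variable concatenation using plain Julia arrays as a stand-in for the cubes. Everything in it is an assumption for illustration: the Dict layout, the variable values, the array sizes, and the helper name merge_along_time are hypothetical, and it does not call the YAXArrays API. With real data, the cat step would be applied to each variable's cube along Ti as suggested above.

```julia
# Plain-array stand-in for two daily datasets: variable name => (lon, lat, time) array.
# Sizes and values are made up; note that the variables appear in a different order.
day1 = Dict(:t2m => fill(1.0, 2, 3, 24), :lai_lv => fill(10.0, 2, 3, 24))
day2 = Dict(:lai_lv => fill(20.0, 2, 3, 24), :t2m => fill(2.0, 2, 3, 24))

# Hypothetical helper: concatenate each variable along the time dimension
# (dimension 3 here). Variables are looked up by name, so their order
# inside each dataset does not matter.
function merge_along_time(datasets)
    varnames = keys(first(datasets))
    return Dict(name => cat((ds[name] for ds in datasets)...; dims=3)
                for name in varnames)
end

merged = merge_along_time([day1, day2])
println(size(merged[:t2m]))  # (2, 3, 48)
```

Matching variables by name rather than by position is the design choice that matters here, since the two files above list their variables in slightly different orders.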

Balinus commented 1 month ago

ok, thanks, I'll see what I can do.

I miscounted the number of files... it is 27_000 files.

I am doing the following, but I get a warning about lookup values not matching (perhaps the variable names are not in the same order in every file, which creates the problem? -> see "t2m" and "lai_lv" in both lists)

cubes = Cube.(files[1:4])
ds2 = cat(cubes..., dims=:Ti);

Warning: Lookup values for Dim{:Variable} of 
["snowc", "e", "skt", "asn", "d2m", "stl1", "t2m", "lai_lv", "u10", "sro", "ssrd", "src", "v10", "lai_hv", "sp", "sd", "rsn", "evaow", "sde", "sf", "tp", "ro"] 
["snowc", "e", "skt", "asn", "d2m", "stl1", "lai_lv", "t2m", "u10", "sro", "ssrd", "src", "v10", "lai_hv", "sp", "sd", "rsn", "evaow", "sde", "sf", "tp", "ro"] do not match. Can't `cat` AbstractDimArray, applying to `parent` object.
└ @ DimensionalData.Dimensions ~/.julia/packages/DimensionalData/yZgLJ/src/Dimensions/primitives.jl:774
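
A quick way to check the ordering hypothesis is to compare the two lookup lists from the warning directly. The sketch below uses only Base Julia and the two lists copied from the warning. The names vars1, vars2, pos and perm are just illustrative. It only diagnoses the mismatch and does not fix the cat call itself.

```julia
# The two Dim{:Variable} lookups, copied from the warning above.
vars1 = ["snowc", "e", "skt", "asn", "d2m", "stl1", "t2m", "lai_lv", "u10", "sro", "ssrd",
         "src", "v10", "lai_hv", "sp", "sd", "rsn", "evaow", "sde", "sf", "tp", "ro"]
vars2 = ["snowc", "e", "skt", "asn", "d2m", "stl1", "lai_lv", "t2m", "u10", "sro", "ssrd",
         "src", "v10", "lai_hv", "sp", "sd", "rsn", "evaow", "sde", "sf", "tp", "ro"]

# Same variables in both files, ignoring order?
same_set = Set(vars1) == Set(vars2)

# Positions where the two lists disagree.
mismatch = findall(vars1 .!= vars2)

# Permutation that reorders vars2 to match vars1 (lookup by name).
pos = Dict(name => i for (i, name) in enumerate(vars2))
perm = [pos[name] for name in vars1]

println(same_set)             # true
println(mismatch)             # [7, 8]
println(vars2[perm] == vars1) # true
```

So both lists contain the same 22 variables, and only positions 7 and 8 ("t2m" and "lai_lv") are swapped, which is consistent with the ordering being the cause of the warning.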