JuliaDataCubes / YAXArrays.jl

Yet Another XArray-like Julia package
https://juliadatacubes.github.io/YAXArrays.jl/
Other
87 stars 12 forks source link

Grib support #394

Open aasdelat opened 1 month ago

aasdelat commented 1 month ago

Does YAXArrays support grib files?. If so, how can I load one? If not, could this feature be added? Thanks a lot.

Balinus commented 1 month ago

Not a solution, but I very much dislike grib files, so I usually convert them to netcdf or zarr files. Even xarray has in my experience limited support for grib files (it doesn't work with 50% of the files I'd like to read).

https://www.ncl.ucar.edu/Applications/griball.shtml https://confluence.ecmwf.int/display/LDAS/Converting+from+grib+to+netCDF+with+cdo

felixcremer commented 1 month ago

YAXArrays.jl itself does not support grib files but Rasters does. And from a Raster object you can then convert that back into a YAXArray object if you want to use the YAXArrays machinery.

using Rasters, GRIBDatasets
using YAXArrays

r = Raster("path/to/gribfile", lazy=true)
yax = YAXArray(dims(r), parent(r))

This is a bit of a stop gap solution because you would loose the metadata of your file on that but it might help you in starting your analysis.

aasdelat commented 1 month ago

Thanks, both of you. I have tested @felixcremer s idea, but got a weird structure for the data. So I've also tested @Balinus s idea, but It seems that YAXArrays does not support the structure of my data. Here is the structure of my data after converted into NetCDF, obtained from the head of ncdump:

dimensions:
        values = 338880 ;
        valid_time = 744 ;
variables:
        int64 valid_time(valid_time) ;
                valid_time:units = "seconds since 1970-01-01T00:00:00" ;
                valid_time:calendar = "proleptic_gregorian" ;
                valid_time:long_name = "time" ;
                valid_time:standard_name = "time" ;
        double longitude(values) ;
                longitude:units = "degrees_east" ;
                longitude:long_name = "longitude" ;
                longitude:standard_name = "longitude" ;
        double latitude(values) ;
                latitude:units = "degrees_north" ;
                latitude:long_name = "latitude" ;
                latitude:standard_name = "latitude" ;
        double msl(valid_time, values) ;
                msl:name = "Mean sea level pressure" ;
                msl:units = "Pa" ;

This is what I get when I try to load the file with YAXArrays:

julia> ds = Cube("nc_file.nc")
ERROR: ArgumentError: invalid index: nothing of type Nothing

Any ideas?

felixcremer commented 1 month ago

Could you please show the full stacktrace of your error?

felixcremer commented 1 month ago

Could you please show the full stacktrace of you error?

You can also try the open_dataset function to open the file as a dataset and select the relevant data from there.

aasdelat commented 1 month ago

I have tried open_dataset and it works! I do not know the difference between Cube() and open_dataset(). But I need to open a lot of files into a singe dataset, How can I do this? Anyway, I copy and paste the output error of Cube():

ds = Cube("nc_file.nc")
ERROR: ArgumentError: invalid index: nothing of type Nothing
Stacktrace:
  [1] to_index(i::Nothing)
    @ Base ./indices.jl:300
  [2] to_index(A::Vector{Any}, i::Nothing)
    @ Base ./indices.jl:277
  [3] _to_indices1(A::Vector{Any}, inds::Tuple{Base.OneTo{Int64}}, I1::Nothing)
    @ Base ./indices.jl:359
  [4] to_indices
    @ ./indices.jl:354 [inlined]
  [5] to_indices
    @ ./indices.jl:345 [inlined]
  [6] getindex
    @ ./abstractarray.jl:1291 [inlined]
  [7] concatenatecubes(cl::Vector{Any}, cataxis::Dim{:Variable, Vector{String}})
    @ YAXArrays.Cubes ~/.julia/packages/YAXArrays/jdA1f/src/Cubes/TransformedCubes.jl:39
  [8] Cube(ds::Dataset; joinname::String, target_type::Nothing)
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:382
  [9] Cube
    @ ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:341 [inlined]
 [10] Cube(s::String)
    @ YAXArrays.Datasets ~/.julia/packages/YAXArrays/jdA1f/src/DatasetAPI/Datasets.jl:815
 [11] top-level scope
    @ REPL[4]:1
danlooo commented 1 month ago

A Dataset is just a dictionary of YAXArrays. These arrays may have shared axes, e.g. longitude and latitude for arrays temperature and precipitation. These arrays may also be completely unrelated. The type of the cell elements may also be different. A Cube is just one YAXArray in which all variables from the dataset are squeezed together. If temperature and precipitation do not share all axes e.g. they are in a different grid, one can not make a Cube out of it. Maybe you're files have some different axes so you just use a Dataset instead of a Cube.

aasdelat commented 1 month ago

Wow!, I understand now. I have two shared axes: "values" and "valid_time" and three YAXArrays: "longitude", "latitude" and "msl". So it is a data set, but not a Cube. On the other hand, I have tried to read it as a Cube by adding the following attribute to the "msl" array: msl:coordinates = "longitude latitude" ; but with no success. So, I think I have to read it as a dataset. But, Can I read a set of files as a single dataset (as can be made by Python's xarray.open_mfdataset)?

lazarusA commented 1 month ago

That's what 'open_dataset' will do also here. Similar to https://juliadatacubes.github.io/YAXArrays.jl/dev/UserGuide/openZarr.

felixcremer commented 1 month ago

Could you share two of your files, so that I can debug this. We also have an open_mfdataset function but it is a bit buggy.

aasdelat commented 1 month ago

Here are two of them. The first one has the msl:coordinates = "longitude latitude" ; attribute on the msl array and the second one, not. They are in a zip file 580 Mb, so I send them by wetransfer: https://we.tl/t-v5IqMyWDG8