alexamici opened this issue 6 years ago
I was thinking of using cfgrib to convert a lot of GRIB files into a big xarray dataset and save it all to Zarr. I would really benefit from this feature, because it would save me the intermediate step of converting GRIB files into NetCDF to be processed later by xarray. Any info on approximately when this will be available?
@aolt we intend to prepare a Pull Request to add GRIB support via cfgrib to xarray. If and when it is accepted you will be able to use the xarray.open_mfdataset API directly.
I have no ETA yet, but becoming a first-class driver in xarray is one of the main targets of the project.
A cfgrib backend has just been included in xarray:
https://github.com/pydata/xarray/pull/2476
With the upcoming v0.11 you will be able to:
>>> ds = xr.open_mfdataset(['file1.grib', 'file2.grib'], engine='cfgrib', concat_dim='step')
Great! It works fine with small files, but I get a "MemoryError" on many big files. Is it possible to make it work the same way the NetCDF backend does, with "lazy" reads?
>>> xr.__version__
'0.11.0'
$ pip list | grep cfgrib
cfgrib 0.9.3.1
$ python -m cfgrib selfcheck
Found: ecCodes v2.6.0.
Your system is ready.
$ python -V
Python 3.7.0
@aolt the theory was that everything was lazy already... but in practice I noticed yesterday a really dumb bug that was unconditionally loading the whole dataset into memory at open 🤦♂️
The bug is fixed in version 0.9.4, please upgrade and try again.
I'm currently running a mean on 320 GB of GRIB files on 10 dask.distributed nodes, so I'm confident it's working now :)
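The lazy behaviour described above can be sketched with synthetic data, with no GRIB file or cfgrib needed; the variable name, sizes, and chunking below are made up for illustration:

```python
import numpy as np
import xarray as xr

# Build a small dataset and chunk it, so the variable is dask-backed,
# the same situation as a lazily opened GRIB/NetCDF dataset.
ds = xr.Dataset(
    {"t2m": (("step", "lat", "lon"), np.random.rand(4, 10, 20))}
).chunk({"step": 1})

# Building the mean only constructs a dask graph; nothing is computed yet.
lazy = ds["t2m"].mean(dim="step")
assert lazy.chunks is not None  # still a lazy dask array

# .compute() triggers the actual chunk-by-chunk computation.
result = lazy.compute()
assert result.shape == (10, 20)
```

With the fix in 0.9.4, opening a GRIB file with engine='cfgrib' keeps the variables in this lazy state until a computation is requested, which is what allows a mean over hundreds of gigabytes to run on a dask.distributed cluster.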
Even though there is some merit in opening several GRIB files as a single cfgrib.Dataset, I'm changing this to wontfix, as xarray.open_mfdataset is what almost everybody really wants.
Hello, I have a quick question regarding this topic. I notice that cfgrib has the following ability:
cfgrib also provides a function, cfgrib.open_datasets(), that automates the selection of appropriate filter_by_keys and returns a list of all valid xarray.Datasets in the GRIB file.
I wanted to ask if this is only for a single GRIB file, or if it is possible to supply a path that will create the datasets similarly to xarray.open_mfdataset(). I'm looking for the ability to automate the selection of filter_by_keys combined with the ability to open multiple files to create the datasets.
Thanks!
EDIT: it appears that cfgrib.open_datasets() only handles one GRIB file at a time. However, if you have a directory of GRIB files, you can "cat" them into one GRIB file and then read that with cfgrib.open_datasets().
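There is no multi-file variant of cfgrib.open_datasets(), but one can be sketched on top of it in user code. The helper below is hypothetical (not part of cfgrib's API) and assumes every file splits into the same hypercubes in the same order:

```python
import xarray as xr


def concat_groups(per_file, concat_dim="time"):
    """Concatenate the i-th dataset of every file along concat_dim.

    per_file is one list of xarray.Datasets per input GRIB file, as
    returned by cfgrib.open_datasets().
    """
    return [xr.concat(list(group), dim=concat_dim) for group in zip(*per_file)]


def open_mf_datasets(paths, concat_dim="time"):
    """Hypothetical multi-file counterpart of cfgrib.open_datasets()."""
    import cfgrib  # imported lazily so concat_groups works without GRIB data

    return concat_groups([cfgrib.open_datasets(p) for p in paths], concat_dim)
```

Compared to concatenating the raw GRIB files with "cat", this keeps each file's messages grouped by hypercube and avoids writing an intermediate file, at the cost of assuming a consistent structure across files.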
At the low level we use an explicit file path and file offset in several places. Note that xr.open_mfdataset handles opening and merging of multiple files without any additional support from the low-level driver, so this feature is low priority.