Read some variables by 'filter_by_keys'

wangrenz commented 4 years ago

Hi,

I want to read only the u and v variables from GFS files, like this:

import xarray as xr
ds = xr.open_mfdataset(path_list, concat_dim='valid_time', combine='nested', engine='cfgrib',
                       backend_kwargs={'filter_by_keys': {'cfVarName': ['u', 'v'], 'typeOfLevel': 'isobaricInhPa'},
                                       'indexpath': ''})

But it raises an error:

Traceback (most recent call last):
  File "read_uv.py", line 20, in _read
    backend_kwargs={ 'filter_by_keys':{ 'cfVarName': ['u','v'],  'typeOfLevel':'isobaricInhPa'},'indexpath':''})
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/xarray/backends/api.py", line 908, in open_mfdataset
    datasets = [open_(p, **open_kwargs) for p in paths]
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/xarray/backends/api.py", line 908, in <listcomp>
    datasets = [open_(p, **open_kwargs) for p in paths]
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/xarray/backends/api.py", line 520, in open_dataset
    filename_or_obj, lock=lock, **backend_kwargs
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/xarray/backends/cfgrib_.py", line 43, in __init__
    self.ds = cfgrib.open_file(filename, **backend_kwargs)
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/cfgrib/dataset.py", line 641, in open_file
    return Dataset(*build_dataset_components(index, read_keys=read_keys, **kwargs))
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/cfgrib/dataset.py", line 563, in build_dataset_components
    for param_id in index['paramId']:
  File "/home/miniconda3/envs/metenv/lib/python3.7/site-packages/cfgrib/messages.py", line 359, in __getitem__
    return self.header_values[item]
KeyError: 'paramId'

How can I read only a specific set of several variables?

alexamici commented 4 years ago

@wangrenz Sorry for the late reply. I think you are hitting the problem with MULTI-FIELD messages that I'm tracking in #111 (cfgrib has problems accessing the v-component of GFS).

I just committed a tentative fix to the stable/0.9.8.x branch, if you want to try it out.

alexamici commented 4 years ago

I "think" the root cause of this particular issue is not being able to access the v-component of a MULTI-FIELD message, so I "think" the issue is actually resolved by the release of version 0.9.8.2.

Feel free to reopen it if it still persists, but in that case I would need some more details.

wangrenz commented 4 years ago

Thanks for your work.

But I still have the same problem when reading only the u and v variables. The error message is the same as above.

Is the cfVarName key not supported?

Regards.

fcannini commented 4 years ago

Hi there. I'm having the same issue with these GRIB2 files.

I'm trying to read only the tp variable, like this:

import xarray as xa
wrf = xa.open_dataset('WRF_cpt_05KM_2020071400_2020071400.grib2', engine='cfgrib', backend_kwargs={'filter_by_keys': {'typeOfLevel': 'surface', 'cfVarName': ['tp']}})

The error message is pretty much the same as the one @wangrenz gets:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-11-9898da9e54c5> in <module>()
----> 1 wrf = xa.open_dataset('WRF_cpt_05KM_2020071400_2020071400.grib2', engine='cfgrib', backend_kwargs={'filter_by_keys': {'typeOfLevel': 'surface', 'cfVarName': ['tp'] } } )
/usr/lib/python3/dist-packages/xarray/backends/api.py in open_dataset(filename_or_obj, group, decode_cf, mask_and_scale, decode_times, autoclose, concat_characters, decode_coords, engine, chunks, lock, cache, drop_variables, backend_kwargs, use_cftime)
435         elif engine == 'cfgrib':
436             store = backends.CfGribDataStore(
--> 437                 filename_or_obj, lock=lock, **backend_kwargs)
438
439     else:
/usr/lib/python3/dist-packages/xarray/backends/cfgrib_.py in __init__(self, filename, lock, **backend_kwargs)
38             lock = ECCODES_LOCK
39         self.lock = ensure_lock(lock)
---> 40         self.ds = cfgrib.open_file(filename, **backend_kwargs)
41
42     def open_store_variable(self, name, var):
/home/fcannini/.local/lib/python3.7/site-packages/cfgrib/dataset.py in open_file(path, grib_errors, indexpath, filter_by_keys, read_keys, **kwargs)
649     index_keys = sorted(ALL_KEYS + read_keys)
650     index = open_fileindex(path, grib_errors, indexpath, index_keys).subindex(filter_by_keys)
--> 651     return Dataset(*build_dataset_components(index, read_keys=read_keys, **kwargs))
/home/fcannini/.local/lib/python3.7/site-packages/cfgrib/dataset.py in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims)
571     variables = collections.OrderedDict()
572     filter_by_keys = index.filter_by_keys
--> 573     for param_id in index['paramId']:
574         var_index = index.subindex(paramId=param_id)
575         try:
/home/fcannini/.local/lib/python3.7/site-packages/cfgrib/messages.py in __getitem__(self, item)
385     def __getitem__(self, item):
386         # type: (str) -> list
--> 387         return self.header_values[item]
388
389     def getone(self, item):
KeyError: 'paramId'

I also tried upgrading to 0.9.8.3, without success.

shahramn commented 4 years ago

Can you please try the shortName key instead of cfVarName: {'typeOfLevel': 'surface', 'shortName': ['tp']}?

fcannini commented 4 years ago

@shahramn I've tried that too; same error.

fcannini commented 4 years ago

@wangrenz @shahramn I've managed to open the files mentioned in my first comment by using only backend_kwargs={'filter_by_keys': {'shortName': 'tp'}}.

Strangely, in my case, using backend_kwargs={'filter_by_keys': {'typeOfLevel': 'surface'}} did not show the variable I'm interested in, so I tried the above.

wangrenz commented 4 years ago

@fcannini Yes, only one variable can be read at a time.

For example:

import xarray as xr
# One call per variable: a single string value per filter key is accepted.
ds_u = xr.open_mfdataset(path_list, concat_dim='valid_time', combine='nested', engine='cfgrib',
                         backend_kwargs={'filter_by_keys': {'cfVarName': 'u', 'typeOfLevel': 'isobaricInhPa'}, 'indexpath': ''})
ds_v = xr.open_mfdataset(path_list, concat_dim='valid_time', combine='nested', engine='cfgrib',
                         backend_kwargs={'filter_by_keys': {'cfVarName': 'v', 'typeOfLevel': 'isobaricInhPa'}, 'indexpath': ''})

Both of these read successfully.

phigre commented 5 months ago

Hi there,

thanks, this discussion helped me get my code for reading a subset of variables working. However, it would be convenient, and probably also more performant, to be able to supply a list of values per filter key (e.g. 'cfVarName': ['u', 'v']) instead of opening a dataset for every single value and merging them afterwards. Are there any plans to support this in the future? If there is interest, I could also try to provide an implementation.

DWesl commented 1 month ago

The current behavior works for reading single variables, but I'm used to using cfgrib to read multiple variables at a time, which works well when all of them agree on the date and share the same set of pressure levels.

If a few variables are missing a few pressure levels for whatever reason, cfgrib picks a subset of variables based on the pressure levels present in the first variable it finds. The data is usually stored top-down, so this subset includes the variables defined on the most pressure levels, but not those present only on the lower ones (troposphere-only variables such as aircraft icing potential or liquid water mixing ratio are the most likely). I understand there are complications with adding variables on pressure levels that are not a subset of those already present, but allowing variables that are available only on the bottom half of the levels found, or that have three of those levels simply missing, should be fairly straightforward.
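One way to work around the level mismatch, sketched here with a hypothetical helper `group_by_levels` that is not part of cfgrib: assuming each variable's pressure levels are already known (for example from opening each variable separately), group together the variables that share exactly the same level set. Variables within one group can then be merged without xarray's default outer join padding missing levels with NaN.

```python
from collections import defaultdict


def group_by_levels(levels_by_var):
    """Group variable names that share exactly the same set of pressure levels.

    levels_by_var maps a shortName to the pressure levels (hPa) it is stored on.
    Hypothetical helper for illustration only; it does not call cfgrib.
    """
    groups = defaultdict(list)
    for name, levels in levels_by_var.items():
        # frozenset ignores level order, so top-down and bottom-up storage match.
        groups[frozenset(levels)].append(name)
    return dict(groups)
```

For example, temperature and wind on all levels end up in one group, while a troposphere-only variable such as liquid water mixing ratio lands in its own group.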

Another option, as mentioned earlier, is to accept a list of shortName values and check whether a message's short name is in that list, rather than whether it equals a single string. That seems like a nice, straightforward PR if the traceback points close to the implementation.
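The proposed semantics, where a list value means "any of these" and a scalar keeps the current equality test, could look roughly like the sketch below. `message_matches` is a hypothetical helper and `header` is a plain dict standing in for a message's GRIB keys; this is not cfgrib's actual filtering code.

```python
def message_matches(header, filter_by_keys):
    """Return True if a message header satisfies every filter entry.

    A filter value may be a single value (equality test) or a
    list/tuple/set of accepted values (membership test).
    Hypothetical helper; header is a plain dict of GRIB keys.
    """
    for key, wanted in filter_by_keys.items():
        value = header.get(key)  # a missing key yields None and fails the test
        if isinstance(wanted, (list, tuple, set, frozenset)):
            if value not in wanted:
                return False
        elif value != wanted:
            return False
    return True
```

Keeping the scalar branch unchanged means existing single-value filters such as {'shortName': 'tp'} behave as before.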

EDIT: A wrapper function that loops over the variables does work, although cfgrib complains that the index file built for the broader query is invalid for the narrower queries.
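A wrapper of that kind might look like the sketch below. `expand_filters` and `open_filtered` are hypothetical helpers, not part of cfgrib or xarray. Passing `'indexpath': ''` tells cfgrib not to use an on-disk index file, which is one way to avoid reusing an index built for a different query; `xr.merge` assumes the per-variable datasets have compatible coordinates.

```python
import itertools


def expand_filters(filter_by_keys):
    """Split a filter whose values may be lists into scalar-only filters.

    {'cfVarName': ['u', 'v'], 'typeOfLevel': 'isobaricInhPa'} becomes two
    filters, one per cfVarName value, each keeping typeOfLevel unchanged.
    """
    keys = list(filter_by_keys)
    # Wrap scalar values in a one-element list so every key can be iterated.
    value_lists = [
        list(v) if isinstance(v, (list, tuple)) else [v]
        for v in filter_by_keys.values()
    ]
    # One scalar filter per combination of values (cartesian product).
    return [dict(zip(keys, combo)) for combo in itertools.product(*value_lists)]


def open_filtered(path, filter_by_keys, **open_kwargs):
    """Open one cfgrib dataset per scalar filter and merge the results.

    Hypothetical helper; requires xarray and cfgrib to be installed.
    """
    import xarray as xr  # deferred so expand_filters works without xarray

    datasets = [
        xr.open_dataset(
            path,
            engine='cfgrib',
            # indexpath='' disables cfgrib's on-disk index file.
            backend_kwargs={'filter_by_keys': scalar_filter, 'indexpath': ''},
            **open_kwargs,
        )
        for scalar_filter in expand_filters(filter_by_keys)
    ]
    # Combine the per-variable datasets into a single Dataset.
    return xr.merge(datasets)
```

For example, open_filtered('gfs.grib2', {'cfVarName': ['u', 'v'], 'typeOfLevel': 'isobaricInhPa'}) would open u and v separately and return one merged dataset (the file name here is only a placeholder).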

chriss1245 commented 1 week ago

I am facing the same need. To be honest, I would love to be able to open a file with a list of values for shortName or paramId.

ecmwf/cfgrib #138