ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0

Loading dataset failure for ECMWF reforecasts with filtering #256

Closed by jodemaey 2 years ago

jodemaey commented 3 years ago

Hi,

Just to let you know that the current command to load and filter a reforecast file downloaded from ECMWF MARS fails:

import xarray as xr
import cfgrib as cf
dss = xr.open_dataset('reforecasts_ensemble_2019-10-31.grb', engine='cfgrib', backend_kwargs={'filter_by_keys': {'dataDate': '19991031', 'perturbationNumber': 1}})
Traceback (most recent call last):
  File "/home/jodemaey/anaconda3/envs/rmi_utils/lib/python3.7/site-packages/IPython/core/interactiveshell.py", line 3441, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-4-555f0fafb340>", line 1, in <module>
    dss = xr.open_dataset('reforecasts_ensemble_2019-10-31.grb', engine='cfgrib', backend_kwargs={'filter_by_keys': {'dataDate': '19991031', 'perturbationNumber': 1}})
  File "/home/jodemaey/anaconda3/envs/rmi_utils/lib/python3.7/site-packages/xarray/backends/api.py", line 501, in open_dataset
    **kwargs,
  File "/home/jodemaey/anaconda3/envs/rmi_utils/lib/python3.7/site-packages/cfgrib/xarray_plugin.py", line 108, in open_dataset
    errors=errors,
  File "/home/jodemaey/anaconda3/envs/rmi_utils/lib/python3.7/site-packages/cfgrib/xarray_plugin.py", line 40, in __init__
    self.ds = dataset.open_file(filename, **backend_kwargs)
  File "/home/jodemaey/anaconda3/envs/rmi_utils/lib/python3.7/site-packages/cfgrib/dataset.py", line 713, in open_file
    index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
  File "/home/jodemaey/anaconda3/envs/rmi_utils/lib/python3.7/site-packages/cfgrib/dataset.py", line 623, in build_dataset_components
    for param_id in index["paramId"]:
  File "/home/jodemaey/anaconda3/envs/rmi_utils/lib/python3.7/site-packages/cfgrib/messages.py", line 391, in __getitem__
    return self.header_values[item]
KeyError: 'paramId'
xr.__version__
Out[5]: '0.19.0'
cf.__version__
Out[6]: '0.9.9.0'

However, loading the full file works and gives:

ds = cf.open_datasets('reforecasts_ensemble_2019-10-31.grb')
ds
Out[8]: 
[<xarray.Dataset>
 Dimensions:     (number: 10, time: 20, step: 184, values: 500)
 Coordinates:
   * number      (number) int64 1 2 3 4 5 6 7 8 9 10
   * time        (time) datetime64[ns] 1999-10-31 2000-10-31 ... 2018-10-31
   * step        (step) timedelta64[ns] 0 days 06:00:00 ... 46 days 00:00:00
     surface     float64 0.0
     latitude    (values) float64 51.94 51.94 51.94 51.94 ... 49.13 49.13 49.13
     longitude   (values) float64 2.025 2.25 2.475 2.7 ... 6.25 6.458 6.667 6.875
     valid_time  (time, step) datetime64[ns] 1999-10-31T06:00:00 ... 2018-12-16
 Dimensions without coordinates: values
 Data variables:
     mx2t6       (number, time, step, values) float32 ...
     mn2t6       (number, time, step, values) float32 ...
     p10fg6      (number, time, step, values) float32 ...
     t2m         (number, time, step, values) float32 ...
 Attributes:
     GRIB_edition:            1
     GRIB_centre:             ecmf
     GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
     GRIB_subCentre:          0
     Conventions:             CF-1.7
     institution:             European Centre for Medium-Range Weather Forecasts]

The ecCodes version is 2.23.0.

iainrussell commented 3 years ago

Hi @jodemaey, this was a subtle one, but after a little while I realised what was going on (I tested with two of my own GRIB files from MARS, a reforecast and a non-reforecast, and the behaviour was the same). You need to specify dataDate as a number, not a string, i.e.

dss = xr.open_dataset('reforecasts_ensemble_2019-10-31.grb', engine='cfgrib',
      backend_kwargs={'filter_by_keys': {'dataDate': 19991031, 'perturbationNumber': 1}})

I didn't look deeply into why the error message was so unhelpful, but the filter was not matching any fields: dataDate is extracted from the GRIB header as a number, so comparing it with your string always evaluated to False!
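To illustrate the mismatch, here is a minimal sketch of a type-strict key filter. It is a toy stand-in, not cfgrib's actual implementation: the `messages` list and the `filter_messages` helper are hypothetical, and the `paramId` value is only sample data. It assumes header values are stored with their native types and compared with plain equality.

```python
# Toy stand-in for a GRIB message index; NOT cfgrib's actual code.
# Header values keep their native types, so dataDate is an int here.
messages = [
    {"dataDate": 19991031, "perturbationNumber": 1, "paramId": 167},
    {"dataDate": 19991031, "perturbationNumber": 2, "paramId": 167},
    {"dataDate": 20001031, "perturbationNumber": 1, "paramId": 167},
]


def filter_messages(messages, filter_by_keys):
    """Keep the messages whose header values equal every requested value."""
    return [
        msg
        for msg in messages
        if all(msg.get(key) == value for key, value in filter_by_keys.items())
    ]


# A string never equals an int in Python, so nothing is selected.
string_matches = filter_messages(
    messages, {"dataDate": "19991031", "perturbationNumber": 1}
)

# With an int, the first message is selected.
int_matches = filter_messages(
    messages, {"dataDate": 19991031, "perturbationNumber": 1}
)

print(len(string_matches))  # 0
print(len(int_matches))     # 1
```

If the selection is empty, any later lookup of a key such as paramId has nothing to read from, which would be consistent with the KeyError in the traceback.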

I hope this helps (and that it works for you). Best regards, Iain
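One way to guard against this in user code is to normalise the filter values before passing them on. The sketch below uses a hypothetical helper, `coerce_filter_keys`, which is not part of cfgrib or xarray. It assumes that any string made up only of decimal digits was meant as an integer, which may not hold for every GRIB key.

```python
def coerce_filter_keys(filter_by_keys):
    """Return a copy in which digit-only strings are converted to int.

    Hypothetical helper, not part of cfgrib. Other values are left unchanged.
    """
    coerced = {}
    for key, value in filter_by_keys.items():
        if isinstance(value, str) and value.isdecimal():
            coerced[key] = int(value)
        else:
            coerced[key] = value
    return coerced


filters = coerce_filter_keys({"dataDate": "19991031", "perturbationNumber": 1})
print(filters)  # {'dataDate': 19991031, 'perturbationNumber': 1}
```

The resulting dictionary could then be passed as backend_kwargs={'filter_by_keys': filters} in the xr.open_dataset call shown above.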