ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes

ECMWF forecast-generated GRIB data cannot be converted to NetCDF #232

Open whatnick opened 3 years ago

whatnick commented 3 years ago

I am using cfgrib with ecCodes v2.21.0; see below.

cfgrib selfcheck
Found: ecCodes v2.21.0.
Your system is ready.

And I am getting the following error when converting the ECMWF forecast data I have to NetCDF.

cfgrib to_netcdf data/A1D05011200050812001 
Traceback (most recent call last):
  File "/home/whatnick/.local/bin/cfgrib", line 8, in <module>
    sys.exit(cfgrib_cli())
  File "/usr/lib/python3/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/__main__.py", line 61, in to_netcdf
    ds = xr.open_dataset(inpaths[0], engine=engine)  # type: ignore
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/api.py", line 517, in open_dataset
    filename_or_obj, lock=lock, **backend_kwargs
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/cfgrib_.py", line 43, in __init__
    self.ds = cfgrib.open_file(filename, **backend_kwargs)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 713, in open_file
    index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 668, in build_dataset_components
    attributes = build_dataset_attributes(index, filter_by_keys, encoding)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 592, in build_dataset_attributes
    attributes = enforce_unique_attributes(index, GLOBAL_ATTRIBUTES_KEYS, filter_by_keys)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 273, in enforce_unique_attributes
    raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
cfgrib.dataset.DatasetBuildError: multiple values for key 'edition'
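
The error says the file mixes GRIB edition 1 and edition 2 messages. As a possible workaround, here is a minimal sketch that opens each edition separately via the cfgrib filter_by_keys backend option (assuming 'edition' is accepted as a filter key here; the output filenames are hypothetical):

import xarray as xr

# Sketch: open the edition 1 and edition 2 messages as separate datasets,
# so each one sees a single value for the 'edition' key.
ds_ed1 = xr.open_dataset(
    "data/A1D05011200050812001",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"edition": 1}},
)
ds_ed2 = xr.open_dataset(
    "data/A1D05011200050812001",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"edition": 2}},
)

# Write each to its own NetCDF file (hypothetical output names).
ds_ed1.to_netcdf("A1D_edition1.nc")
ds_ed2.to_netcdf("A1D_edition2.nc")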
iainrussell commented 3 years ago

This is an interesting one! It seems potentially useful to store the original GRIB edition in the NetCDF metadata, and as a global attribute it needs to be unique, so I see the problem. On the other hand, it's not essential to have it, so I wonder if there's a way for a user to remove a key from GLOBAL_ATTRIBUTES_KEYS if this is the only thing that's preventing the NetCDF from being generated?
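
One way to experiment with that from user code is an unofficial sketch that monkey-patches the module-level GLOBAL_ATTRIBUTES_KEYS list before opening the file (not a supported API, and other conflicts in the file may still surface):

import cfgrib.dataset
import xarray as xr

# Unofficial workaround sketch: drop 'edition' from the set of keys that
# must be unique across all messages in the file.
cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS = [
    key for key in cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS if key != "edition"
]

ds = xr.open_dataset("data/A1D05011200050812001", engine="cfgrib")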

whatnick commented 3 years ago

I have made some progress after a bunch of searching and reading the docs. I copied the messages into new GRIB files split by typeOfLevel:

grib_copy data/A1S05031800050618001 data_[typeOfLevel].grb

I verified there was only one file in the resulting collection, then listed its contents:

grib_ls data_surface.grb
data_surface.grb
edition      centre       typeOfLevel  level        dataDate     stepRange    dataType     shortName    packingType  gridType     
1            ecmf         surface      0            20210503     72           fc           i10fg        grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           cp           grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           100u         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           100v         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           lsp          grid_simple  regular_ll  
2            ecmf         surface      0            20210503     72           fc           ptype        grid_simple  regular_ll  
2            ecmf         surface      0            20210503     72           fc           tprate       grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           hwbt0        grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           hcct         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           crr          grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           lsrr         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     69-72        fc           mxtpr3       grid_simple  regular_ll  
1            ecmf         surface      0            20210503     69-72        fc           mntpr3       grid_simple  regular_ll  
13 of 13 messages in data_surface.grb

13 of 13 total messages in 1 files

I am mostly interested in CRR and LSRR for my use cases.
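
The listing shows where the conflict comes from: ptype and tprate are GRIB edition 2 while every other field is edition 1, hence the multiple values for key 'edition' error. Since crr and lsrr are both edition 1, a sketch that selects just those fields with the filter_by_keys backend option would sidestep the conflict:

import xarray as xr

# Sketch: crr and lsrr are both edition 1 in the listing above, so
# selecting them by shortName avoids the mixed-edition conflict entirely.
ds_crr = xr.open_dataset(
    "data_surface.grb",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"shortName": "crr"}},
)
ds_lsrr = xr.open_dataset(
    "data_surface.grb",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"shortName": "lsrr"}},
)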

b8raoult commented 3 years ago

Created a pull request

din14970 commented 2 years ago

I have what is probably a related issue with data coming from the archived forecasts:

>>> ds = xarray.open_dataset(filename, backend_kwargs={'errors': 'ignore', 'filter_by_keys': {"typeOfLevel": "surface"}})

---------------------------------------------------------------------------
DatasetBuildError                         Traceback (most recent call last)
/tmp/ipykernel_9992/3594020638.py in <module>
      1 filename = "test_file_atmosphere_all.grib"
----> 2 ds = xarray.open_dataset(filename, backend_kwargs={'errors': 'ignore', 'filter_by_keys': {"typeOfLevel": "surface"}})

~/miniconda3/envs/env/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    495 
    496     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 497     backend_ds = backend.open_dataset(
    498         filename_or_obj,
    499         drop_variables=drop_variables,

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/xarray_plugin.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, lock, indexpath, filter_by_keys, read_keys, encode_cf, squeeze, time_dims, errors, extra_coords)
     98     ) -> xr.Dataset:
     99 
--> 100         store = CfGribDataStore(
    101             filename_or_obj,
    102             indexpath=indexpath,

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/xarray_plugin.py in __init__(self, filename, lock, **backend_kwargs)
     38             lock = ECCODES_LOCK
     39         self.lock = xr.backends.locks.ensure_lock(lock)  # type: ignore
---> 40         self.ds = dataset.open_file(filename, **backend_kwargs)
     41 
     42     def open_store_variable(self, var: dataset.Variable,) -> xr.Variable:

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in open_file(path, grib_errors, indexpath, filter_by_keys, read_keys, time_dims, extra_coords, **kwargs)
    717     index = open_fileindex(path, grib_errors, indexpath, index_keys, filter_by_keys=filter_by_keys)
    718     return Dataset(
--> 719         *build_dataset_components(
    720             index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
    721         )

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
    673         "encode_cf": encode_cf,
    674     }
--> 675     attributes = build_dataset_attributes(index, filter_by_keys, encoding)
    676     return dimensions, variables, attributes, encoding
    677 

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in build_dataset_attributes(index, filter_by_keys, encoding)
    597 def build_dataset_attributes(index, filter_by_keys, encoding):
    598     # type: (messages.FileIndex, T.Dict[str, T.Any], T.Dict[str, T.Any]) -> T.Dict[str, T.Any]
--> 599     attributes = enforce_unique_attributes(index, GLOBAL_ATTRIBUTES_KEYS, filter_by_keys)
    600     attributes["Conventions"] = "CF-1.7"
    601     if "GRIB_centreDescription" in attributes:

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in enforce_unique_attributes(index, attributes_keys, filter_by_keys)
    271                 fbk.update(filter_by_keys)
    272                 fbks.append(fbk)
--> 273             raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
    274         if values and values[0] not in ("undef", "unknown"):
    275             attributes["GRIB_" + key] = values[0]

DatasetBuildError: multiple values for key 'edition'

The file contains a bunch of data variables. If I request those variables individually and direct them to individual files, there is no problem opening any of the files. I need to extract the data arrays, convert them to dataframes, and write them somewhere else. Making a single request for all variables seems more efficient, though. Is there any quick fix for this issue?
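
One possible quick fix is a sketch assuming cfgrib's experimental open_datasets helper (note the plural) is available in your version: it splits a heterogeneous GRIB file into a list of internally consistent datasets instead of failing on conflicting global keys:

import cfgrib

# Sketch: split the heterogeneous file into a list of consistent
# xarray.Dataset objects rather than forcing everything into one dataset.
datasets = cfgrib.open_datasets("test_file_atmosphere_all.grib")
for ds in datasets:
    print(list(ds.data_vars))

Each dataset in the resulting list can then be converted to a dataframe independently.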

Versions

python: 3.8.6
cfgrib: 0.9.9.1
xarray: 0.19.0
eccodes: 2.23.0
eccodes (python): 1.3.3