ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes

ECMWF forecast-generated GRIB data cannot be converted to NetCDF #232

Open whatnick opened 3 years ago

whatnick commented 3 years ago

I am using cfgrib with ecCodes v2.21.0; see below.

cfgrib selfcheck
Found: ecCodes v2.21.0.
Your system is ready.

And I am getting the following error when converting the ECMWF forecast data I have to NetCDF.

cfgrib to_netcdf data/A1D05011200050812001 
Traceback (most recent call last):
  File "/home/whatnick/.local/bin/cfgrib", line 8, in <module>
    sys.exit(cfgrib_cli())
  File "/usr/lib/python3/dist-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1066, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/__main__.py", line 61, in to_netcdf
    ds = xr.open_dataset(inpaths[0], engine=engine)  # type: ignore
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/api.py", line 517, in open_dataset
    filename_or_obj, lock=lock, **backend_kwargs
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/cfgrib_.py", line 43, in __init__
    self.ds = cfgrib.open_file(filename, **backend_kwargs)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 713, in open_file
    index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 668, in build_dataset_components
    attributes = build_dataset_attributes(index, filter_by_keys, encoding)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 592, in build_dataset_attributes
    attributes = enforce_unique_attributes(index, GLOBAL_ATTRIBUTES_KEYS, filter_by_keys)
  File "/home/whatnick/.local/lib/python3.6/site-packages/cfgrib/dataset.py", line 273, in enforce_unique_attributes
    raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
cfgrib.dataset.DatasetBuildError: multiple values for key 'edition'
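
The error says the file mixes GRIB edition 1 and edition 2 messages. As a possible workaround, here is a minimal sketch that opens each edition separately via the cfgrib filter_by_keys backend option (assuming 'edition' is accepted as a filter key here; the output filenames are hypothetical):

import xarray as xr

# Sketch: open the edition 1 and edition 2 messages as separate datasets,
# so each one sees a single value for the 'edition' key.
ds_ed1 = xr.open_dataset(
    "data/A1D05011200050812001",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"edition": 1}},
)
ds_ed2 = xr.open_dataset(
    "data/A1D05011200050812001",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"edition": 2}},
)

# Write each to its own NetCDF file (hypothetical output names).
ds_ed1.to_netcdf("A1D_edition1.nc")
ds_ed2.to_netcdf("A1D_edition2.nc")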
iainrussell commented 3 years ago

This is an interesting one! It seems potentially useful to store the original GRIB edition in the NetCDF metadata, and as a global attribute it needs to be unique, so I see the problem. On the other hand, it's not essential to have it, so I wonder if there's a way for a user to remove a key from GLOBAL_ATTRIBUTES_KEYS if this is the only thing that's preventing the NetCDF from being generated?
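
One way to experiment with that from user code is an unofficial sketch that monkey-patches the module-level GLOBAL_ATTRIBUTES_KEYS list before opening the file (not a supported API, and other conflicts in the file may still surface):

import cfgrib.dataset
import xarray as xr

# Unofficial workaround sketch: drop 'edition' from the set of keys that
# must be unique across all messages in the file.
cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS = [
    key for key in cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS if key != "edition"
]

ds = xr.open_dataset("data/A1D05011200050812001", engine="cfgrib")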

whatnick commented 3 years ago

I have made some progress after a bunch of searching and reading the docs. I copied the messages into new GRIB files split by typeOfLevel:

grib_copy data/A1S05031800050618001 data_[typeOfLevel].grb

I verified there was only one file in the resulting collection, then listed its contents:

grib_ls data_surface.grb
data_surface.grb
edition      centre       typeOfLevel  level        dataDate     stepRange    dataType     shortName    packingType  gridType     
1            ecmf         surface      0            20210503     72           fc           i10fg        grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           cp           grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           100u         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           100v         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           lsp          grid_simple  regular_ll  
2            ecmf         surface      0            20210503     72           fc           ptype        grid_simple  regular_ll  
2            ecmf         surface      0            20210503     72           fc           tprate       grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           hwbt0        grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           hcct         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           crr          grid_simple  regular_ll  
1            ecmf         surface      0            20210503     72           fc           lsrr         grid_simple  regular_ll  
1            ecmf         surface      0            20210503     69-72        fc           mxtpr3       grid_simple  regular_ll  
1            ecmf         surface      0            20210503     69-72        fc           mntpr3       grid_simple  regular_ll  
13 of 13 messages in data_surface.grb

13 of 13 total messages in 1 files

I am mostly interested in CRR and LSRR for my use cases.
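
The listing shows where the conflict comes from: ptype and tprate are GRIB edition 2 while every other field is edition 1, hence the multiple values for key 'edition' error. Since crr and lsrr are both edition 1, a sketch that selects just those fields with the filter_by_keys backend option would sidestep the conflict:

import xarray as xr

# Sketch: crr and lsrr are both edition 1 in the listing above, so
# selecting them by shortName avoids the mixed-edition conflict entirely.
ds_crr = xr.open_dataset(
    "data_surface.grb",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"shortName": "crr"}},
)
ds_lsrr = xr.open_dataset(
    "data_surface.grb",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"shortName": "lsrr"}},
)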

b8raoult commented 3 years ago

Created a pull request

din14970 commented 2 years ago

I have what is probably a related issue with data coming from the archived forecasts:

>>> ds = xarray.open_dataset(filename, backend_kwargs={'errors': 'ignore', 'filter_by_keys': {"typeOfLevel": "surface"}})

---------------------------------------------------------------------------
DatasetBuildError                         Traceback (most recent call last)
/tmp/ipykernel_9992/3594020638.py in <module>
      1 filename = "test_file_atmosphere_all.grib"
----> 2 ds = xarray.open_dataset(filename, backend_kwargs={'errors': 'ignore', 'filter_by_keys': {"typeOfLevel": "surface"}})

~/miniconda3/envs/env/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    495 
    496     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 497     backend_ds = backend.open_dataset(
    498         filename_or_obj,
    499         drop_variables=drop_variables,

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/xarray_plugin.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, lock, indexpath, filter_by_keys, read_keys, encode_cf, squeeze, time_dims, errors, extra_coords)
     98     ) -> xr.Dataset:
     99 
--> 100         store = CfGribDataStore(
    101             filename_or_obj,
    102             indexpath=indexpath,

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/xarray_plugin.py in __init__(self, filename, lock, **backend_kwargs)
     38             lock = ECCODES_LOCK
     39         self.lock = xr.backends.locks.ensure_lock(lock)  # type: ignore
---> 40         self.ds = dataset.open_file(filename, **backend_kwargs)
     41 
     42     def open_store_variable(self, var: dataset.Variable,) -> xr.Variable:

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in open_file(path, grib_errors, indexpath, filter_by_keys, read_keys, time_dims, extra_coords, **kwargs)
    717     index = open_fileindex(path, grib_errors, indexpath, index_keys, filter_by_keys=filter_by_keys)
    718     return Dataset(
--> 719         *build_dataset_components(
    720             index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
    721         )

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
    673         "encode_cf": encode_cf,
    674     }
--> 675     attributes = build_dataset_attributes(index, filter_by_keys, encoding)
    676     return dimensions, variables, attributes, encoding
    677 

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in build_dataset_attributes(index, filter_by_keys, encoding)
    597 def build_dataset_attributes(index, filter_by_keys, encoding):
    598     # type: (messages.FileIndex, T.Dict[str, T.Any], T.Dict[str, T.Any]) -> T.Dict[str, T.Any]
--> 599     attributes = enforce_unique_attributes(index, GLOBAL_ATTRIBUTES_KEYS, filter_by_keys)
    600     attributes["Conventions"] = "CF-1.7"
    601     if "GRIB_centreDescription" in attributes:

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in enforce_unique_attributes(index, attributes_keys, filter_by_keys)
    271                 fbk.update(filter_by_keys)
    272                 fbks.append(fbk)
--> 273             raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
    274         if values and values[0] not in ("undef", "unknown"):
    275             attributes["GRIB_" + key] = values[0]

DatasetBuildError: multiple values for key 'edition'

The file contains a bunch of data variables. If I request those variables individually and direct them to individual files, there is no problem opening any of the files. I need to extract the data arrays, convert them to dataframes, and write them somewhere else. Making a single request for all variables seems more efficient, though. Is there any quick fix for this issue?
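
One possible quick fix is a sketch assuming cfgrib's experimental open_datasets helper (note the plural) is available in your version: it splits a heterogeneous GRIB file into a list of internally consistent datasets instead of failing on conflicting global keys:

import cfgrib

# Sketch: split the heterogeneous file into a list of consistent
# xarray.Dataset objects rather than forcing everything into one dataset.
datasets = cfgrib.open_datasets("test_file_atmosphere_all.grib")
for ds in datasets:
    print(list(ds.data_vars))

Each dataset in the resulting list can then be converted to a dataframe independently.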

Versions

python: 3.8.6
cfgrib: 0.9.9.1
xarray: 0.19.0
eccodes: 2.23.0
eccodes (python): 1.3.3