Open whatnick opened 3 years ago

I am using cfgrib with ecCodes v2.21.0 (versions below) and getting an error converting ECMWF forecast data to NetCDF: `DatasetBuildError: multiple values for key 'edition'`.

This is an interesting one! It seems potentially useful to store the original GRIB edition in the NetCDF metadata, and as a global attribute it needs to be unique, so I see the problem. On the other hand, it's not essential to have it, so I wonder if there's a way for a user to remove a key from GLOBAL_ATTRIBUTES_KEYS if this is the only thing that's preventing the NetCDF from being generated?
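Something along these lines might work — an untested sketch that assumes `GLOBAL_ATTRIBUTES_KEYS` is a plain module-level list in `cfgrib.dataset` (the traceback further down suggests it is), and that the list is looked up at call time:

```python
import cfgrib.dataset
import xarray as xr

# Hedged workaround sketch: stop cfgrib from requiring a unique GRIB edition
# by dropping 'edition' from the keys it promotes to global attributes.
# This mutates module state, so do it before any open_dataset() call.
# The cost is losing the GRIB_edition attribute in the output.
if "edition" in cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS:
    cfgrib.dataset.GLOBAL_ATTRIBUTES_KEYS.remove("edition")

ds = xr.open_dataset("mixed_editions.grib", engine="cfgrib")  # hypothetical filename
```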
I have made some progress after a bunch of searching and reading the docs. I copied the data out to separate GRIB files, one per typeOfLevel:

```
grib_copy data/A1S05031800050618001 data_[typeOfLevel].grb
```
I verified there was only one file in the resulting collection (all messages are typeOfLevel=surface), then listed its contents:
```
grib_ls data_surface.grb
data_surface.grb
edition  centre  typeOfLevel  level  dataDate  stepRange  dataType  shortName  packingType  gridType
1        ecmf    surface      0      20210503  72         fc        i10fg      grid_simple  regular_ll
1        ecmf    surface      0      20210503  72         fc        cp         grid_simple  regular_ll
1        ecmf    surface      0      20210503  72         fc        100u       grid_simple  regular_ll
1        ecmf    surface      0      20210503  72         fc        100v       grid_simple  regular_ll
1        ecmf    surface      0      20210503  72         fc        lsp        grid_simple  regular_ll
2        ecmf    surface      0      20210503  72         fc        ptype      grid_simple  regular_ll
2        ecmf    surface      0      20210503  72         fc        tprate     grid_simple  regular_ll
1        ecmf    surface      0      20210503  72         fc        hwbt0      grid_simple  regular_ll
1        ecmf    surface      0      20210503  72         fc        hcct       grid_simple  regular_ll
1        ecmf    surface      0      20210503  72         fc        crr        grid_simple  regular_ll
1        ecmf    surface      0      20210503  72         fc        lsrr       grid_simple  regular_ll
1        ecmf    surface      0      20210503  69-72      fc        mxtpr3     grid_simple  regular_ll
1        ecmf    surface      0      20210503  69-72      fc        mntpr3     grid_simple  regular_ll
13 of 13 messages in data_surface.grb

13 of 13 total messages in 1 files
```
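Note that ptype and tprate are GRIB edition 2 while every other message is edition 1, which matches the "multiple values for key 'edition'" error. If splitting by edition is acceptable, the same grib_copy key-template trick should separate them (a hedged sketch; the output name is arbitrary):

```
grib_copy data_surface.grb data_edition_[edition].grb
```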
I am mostly interested in CRR and LSRR for my use cases.
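If only those two fields are needed, grib_copy's key matching could pull them out directly — a hedged one-liner (shortNames taken from the listing above; output name is arbitrary):

```
grib_copy -w shortName=crr/lsrr data_surface.grb data_rain_rates.grb
```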
Created a pull request
I have a probably related issue for data coming from the archived forecasts:
```
>>> ds = xarray.open_dataset(filename, backend_kwargs={'errors': 'ignore', 'filter_by_keys': {"typeOfLevel": "surface"}})
---------------------------------------------------------------------------
DatasetBuildError                         Traceback (most recent call last)
/tmp/ipykernel_9992/3594020638.py in <module>
      1 filename = "test_file_atmosphere_all.grib"
----> 2 ds = xarray.open_dataset(filename, backend_kwargs={'errors': 'ignore', 'filter_by_keys': {"typeOfLevel": "surface"}})

~/miniconda3/envs/env/lib/python3.8/site-packages/xarray/backends/api.py in open_dataset(filename_or_obj, engine, chunks, cache, decode_cf, mask_and_scale, decode_times, decode_timedelta, use_cftime, concat_characters, decode_coords, drop_variables, backend_kwargs, *args, **kwargs)
    495
    496     overwrite_encoded_chunks = kwargs.pop("overwrite_encoded_chunks", None)
--> 497     backend_ds = backend.open_dataset(
    498         filename_or_obj,
    499         drop_variables=drop_variables,

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/xarray_plugin.py in open_dataset(self, filename_or_obj, mask_and_scale, decode_times, concat_characters, decode_coords, drop_variables, use_cftime, decode_timedelta, lock, indexpath, filter_by_keys, read_keys, encode_cf, squeeze, time_dims, errors, extra_coords)
     98     ) -> xr.Dataset:
     99
--> 100         store = CfGribDataStore(
    101             filename_or_obj,
    102             indexpath=indexpath,

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/xarray_plugin.py in __init__(self, filename, lock, **backend_kwargs)
     38             lock = ECCODES_LOCK
     39         self.lock = xr.backends.locks.ensure_lock(lock)  # type: ignore
---> 40         self.ds = dataset.open_file(filename, **backend_kwargs)
     41
     42     def open_store_variable(self, var: dataset.Variable,) -> xr.Variable:

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in open_file(path, grib_errors, indexpath, filter_by_keys, read_keys, time_dims, extra_coords, **kwargs)
    717     index = open_fileindex(path, grib_errors, indexpath, index_keys, filter_by_keys=filter_by_keys)
    718     return Dataset(
--> 719         *build_dataset_components(
    720             index, read_keys=read_keys, time_dims=time_dims, extra_coords=extra_coords, **kwargs
    721         )

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in build_dataset_components(index, errors, encode_cf, squeeze, log, read_keys, time_dims, extra_coords)
    673         "encode_cf": encode_cf,
    674     }
--> 675     attributes = build_dataset_attributes(index, filter_by_keys, encoding)
    676     return dimensions, variables, attributes, encoding
    677

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in build_dataset_attributes(index, filter_by_keys, encoding)
    597 def build_dataset_attributes(index, filter_by_keys, encoding):
    598     # type: (messages.FileIndex, T.Dict[str, T.Any], T.Dict[str, T.Any]) -> T.Dict[str, T.Any]
--> 599     attributes = enforce_unique_attributes(index, GLOBAL_ATTRIBUTES_KEYS, filter_by_keys)
    600     attributes["Conventions"] = "CF-1.7"
    601     if "GRIB_centreDescription" in attributes:

~/miniconda3/envs/env/lib/python3.8/site-packages/cfgrib/dataset.py in enforce_unique_attributes(index, attributes_keys, filter_by_keys)
    271             fbk.update(filter_by_keys)
    272             fbks.append(fbk)
--> 273     raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
    274     if values and values[0] not in ("undef", "unknown"):
    275         attributes["GRIB_" + key] = values[0]

DatasetBuildError: multiple values for key 'edition'
```
The file contains a bunch of data variables. If I request those variables individually and direct them to individual files, there is no problem opening any of the resulting files. I need to extract the data arrays, convert them to dataframes, and write them somewhere else. Making a single request for all variables seems more efficient, though. Is there any quick fix for this issue?
python: 3.8.6, cfgrib: 0.9.9.1, xarray: 0.19.0, eccodes: 2.23.0, eccodes (python): 1.3.3
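One possible quick fix, as a hedged, untested sketch: 'edition' is one of the keys cfgrib indexes (it appears in the GLOBAL_ATTRIBUTES_KEYS check in the traceback above), so passing it through filter_by_keys and opening the file once per edition may sidestep the uniqueness check (filename and typeOfLevel filter taken from the traceback above):

```python
import xarray as xr

# Hedged sketch: open the mixed-edition file twice, once per GRIB edition,
# so each resulting dataset has a single value for the 'edition' attribute.
ds_ed1 = xr.open_dataset(
    "test_file_atmosphere_all.grib",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"edition": 1, "typeOfLevel": "surface"}},
)
ds_ed2 = xr.open_dataset(
    "test_file_atmosphere_all.grib",
    engine="cfgrib",
    backend_kwargs={"filter_by_keys": {"edition": 2, "typeOfLevel": "surface"}},
)

# Each dataset can then be flattened for the dataframe export step.
df = ds_ed1.to_dataframe()
```

Alternatively, cfgrib exposes `cfgrib.open_datasets()` (plural), which attempts to split a heterogeneous GRIB file into a list of internally consistent datasets without hand-picking filter keys.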