ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0

Unable to merge multiple GRIB files with specified variable name #380

Open · meteoDaniel opened 5 months ago

meteoDaniel commented 5 months ago

What happened?

I am opening a list of GRIB files (AROME Météo-France SP2 GRIB packages), and when I specify a shortName or the name of a variable, I receive an xarray.MergeError. But when I open multiple variables by just specifying {'stepType': 'instant'}, everything works fine.

This behaviour is very curious and I do not know how to debug the issue.
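Condensed into a standalone sketch (file paths taken from the list in the reproduction steps below), the only difference between the failing call and the working call is the filter_by_keys value:

import xarray

# The 43 SP2 files, listed in full under the reproduction steps.
grib_files = [
    f"/app/data/arome_meteo_france/20240612_00/"
    f"arome_meteo_france_20240612_00_SP2_{i}.grib2"
    for i in range(43)
]

common = dict(engine="cfgrib", parallel=True, concat_dim="step", combine="nested")

# Fails: xarray.MergeError, conflicting values for 'valid_time'.
xarray.open_mfdataset(
    grib_files,
    backend_kwargs={
        "indexpath": "",
        "errors": "ignore",
        "filter_by_keys": {"shortName": "lcc", "stepType": "instant"},
    },
    **common,
)

# Works: identical call, only the stepType filter.
xarray.open_mfdataset(
    grib_files,
    backend_kwargs={
        "indexpath": "",
        "errors": "ignore",
        "filter_by_keys": {"stepType": "instant"},
    },
    **common,
)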

What are the steps to reproduce the bug?

In [2]: self.files_per_grib_package[grib_package_to_use]
Out[2]: 
[PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_0.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_1.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_2.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_3.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_4.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_5.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_6.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_7.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_8.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_9.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_10.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_11.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_12.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_13.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_14.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_15.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_16.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_17.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_18.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_19.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_20.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_21.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_22.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_23.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_24.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_25.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_26.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_27.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_28.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_29.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_30.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_31.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_32.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_33.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_34.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_35.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_36.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_37.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_38.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_39.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_40.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_41.grib2'),
 PosixPath('/app/data/arome_meteo_france/20240612_00/arome_meteo_france_20240612_00_SP2_42.grib2')]

In [3]: data = xarray.open_mfdataset(
   ...:                 self.files_per_grib_package[grib_package_to_use],
   ...:                 engine="cfgrib",
   ...:                 parallel=True,
   ...:                 concat_dim="step",
   ...:                 combine="nested",
   ...:                 backend_kwargs={
   ...:                     "indexpath": "",
   ...:                     "errors": "ignore",
   ...:                     "filter_by_keys":  {'shortName': 'lcc', 'stepType': 'instant'}
   ...:                     # "filter_by_keys": FILTER_ARGUMENT[variable],
   ...:                 },
   ...:             )
---------------------------------------------------------------------------
MergeError                                Traceback (most recent call last)
Cell In[3], line 1
----> 1 data = xarray.open_mfdataset(
      2                 self.files_per_grib_package[grib_package_to_use],
      3                 engine="cfgrib",
      4                 parallel=True,
      5                 concat_dim="step",
      6                 combine="nested",
      7                 backend_kwargs={
      8                     "indexpath": "",
      9                     "errors": "ignore",
     10                     "filter_by_keys":  {'shortName': 'lcc', 'stepType': 'instant'}
     11                     # "filter_by_keys": FILTER_ARGUMENT[variable],
     12                 },
     13             )

File /usr/local/lib/python3.10/site-packages/xarray/backends/api.py:1071, in open_mfdataset(paths, chunks, concat_dim, compat, preprocess, engine, data_vars, coords, combine, parallel, join, attrs_file, combine_attrs, **kwargs)
   1067 try:
   1068     if combine == "nested":
   1069         # Combined nested list by successive concat and merge operations
   1070         # along each dimension, using structure given by "ids"
-> 1071         combined = _nested_combine(
   1072             datasets,
   1073             concat_dims=concat_dim,
   1074             compat=compat,
   1075             data_vars=data_vars,
   1076             coords=coords,
   1077             ids=ids,
   1078             join=join,
   1079             combine_attrs=combine_attrs,
   1080         )
   1081     elif combine == "by_coords":
   1082         # Redo ordering from coordinates, ignoring how they were ordered
   1083         # previously
   1084         combined = combine_by_coords(
   1085             datasets,
   1086             compat=compat,
   (...)
   1090             combine_attrs=combine_attrs,
   1091         )

File /usr/local/lib/python3.10/site-packages/xarray/core/combine.py:356, in _nested_combine(datasets, concat_dims, compat, data_vars, coords, ids, fill_value, join, combine_attrs)
    353 _check_shape_tile_ids(combined_ids)
    355 # Apply series of concatenate or merge operations along each dimension
--> 356 combined = _combine_nd(
    357     combined_ids,
    358     concat_dims,
    359     compat=compat,
    360     data_vars=data_vars,
    361     coords=coords,
    362     fill_value=fill_value,
    363     join=join,
    364     combine_attrs=combine_attrs,
    365 )
    366 return combined

File /usr/local/lib/python3.10/site-packages/xarray/core/combine.py:232, in _combine_nd(combined_ids, concat_dims, data_vars, coords, compat, fill_value, join, combine_attrs)
    228 # Each iteration of this loop reduces the length of the tile_ids tuples
    229 # by one. It always combines along the first dimension, removing the first
    230 # element of the tuple
    231 for concat_dim in concat_dims:
--> 232     combined_ids = _combine_all_along_first_dim(
    233         combined_ids,
    234         dim=concat_dim,
    235         data_vars=data_vars,
    236         coords=coords,
    237         compat=compat,
    238         fill_value=fill_value,
    239         join=join,
    240         combine_attrs=combine_attrs,
    241     )
    242 (combined_ds,) = combined_ids.values()
    243 return combined_ds

File /usr/local/lib/python3.10/site-packages/xarray/core/combine.py:267, in _combine_all_along_first_dim(combined_ids, dim, data_vars, coords, compat, fill_value, join, combine_attrs)
    265     combined_ids = dict(sorted(group))
    266     datasets = combined_ids.values()
--> 267     new_combined_ids[new_id] = _combine_1d(
    268         datasets, dim, compat, data_vars, coords, fill_value, join, combine_attrs
    269     )
    270 return new_combined_ids

File /usr/local/lib/python3.10/site-packages/xarray/core/combine.py:290, in _combine_1d(datasets, concat_dim, compat, data_vars, coords, fill_value, join, combine_attrs)
    288 if concat_dim is not None:
    289     try:
--> 290         combined = concat(
    291             datasets,
    292             dim=concat_dim,
    293             data_vars=data_vars,
    294             coords=coords,
    295             compat=compat,
    296             fill_value=fill_value,
    297             join=join,
    298             combine_attrs=combine_attrs,
    299         )
    300     except ValueError as err:
    301         if "encountered unexpected variable" in str(err):

File /usr/local/lib/python3.10/site-packages/xarray/core/concat.py:250, in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    238     return _dataarray_concat(
    239         objs,
    240         dim=dim,
   (...)
    247         combine_attrs=combine_attrs,
    248     )
    249 elif isinstance(first_obj, Dataset):
--> 250     return _dataset_concat(
    251         objs,
    252         dim=dim,
    253         data_vars=data_vars,
    254         coords=coords,
    255         compat=compat,
    256         positions=positions,
    257         fill_value=fill_value,
    258         join=join,
    259         combine_attrs=combine_attrs,
    260     )
    261 else:
    262     raise TypeError(
    263         "can only concatenate xarray Dataset and DataArray "
    264         f"objects, got {type(first_obj)}"
    265     )

File /usr/local/lib/python3.10/site-packages/xarray/core/concat.py:524, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs)
    518 if variables_to_merge:
    519     grouped = {
    520         k: v
    521         for k, v in collect_variables_and_indexes(datasets).items()
    522         if k in variables_to_merge
    523     }
--> 524     merged_vars, merged_indexes = merge_collected(
    525         grouped, compat=compat, equals=equals
    526     )
    527     result_vars.update(merged_vars)
    528     result_indexes.update(merged_indexes)

File /usr/local/lib/python3.10/site-packages/xarray/core/merge.py:290, in merge_collected(grouped, prioritized, compat, combine_attrs, equals)
    288 variables = [variable for variable, _ in elements_list]
    289 try:
--> 290     merged_vars[name] = unique_variable(
    291         name, variables, compat, equals.get(name, None)
    292     )
    293 except MergeError:
    294     if compat != "minimal":
    295         # we need more than "minimal" compatibility (for which
    296         # we drop conflicting coordinates)

File /usr/local/lib/python3.10/site-packages/xarray/core/merge.py:144, in unique_variable(name, variables, compat, equals)
    141                 break
    143 if not equals:
--> 144     raise MergeError(
    145         f"conflicting values for variable {name!r} on objects to be combined. "
    146         "You can skip this check by specifying compat='override'."
    147     )
    149 if combine_method:
    150     for var in variables[1:]:

MergeError: conflicting values for variable 'valid_time' on objects to be combined. You can skip this check by specifying compat='override'.

In [4]: data = xarray.open_mfdataset(
   ...:                 self.files_per_grib_package[grib_package_to_use],
   ...:                 engine="cfgrib",
   ...:                 parallel=True,
   ...:                 concat_dim="step",
   ...:                 combine="nested",
   ...:                 backend_kwargs={
   ...:                     "indexpath": "",
   ...:                     "errors": "ignore",
   ...:                     "filter_by_keys":  {'stepType': 'instant'}
   ...:                     # "filter_by_keys": FILTER_ARGUMENT[variable],
   ...:                 },
   ...:             )

In [5]: data
Out[5]: 
<xarray.Dataset> Size: 5GB
Dimensions:     (step: 43, latitude: 1791, longitude: 2801)
Coordinates:
    time        datetime64[ns] 8B 2024-06-12
  * step        (step) timedelta64[ns] 344B 00:00:00 ... 1 days 18:00:00
    surface     float64 8B 0.0
  * latitude    (latitude) float64 14kB 55.4 55.39 55.38 ... 37.52 37.51 37.5
  * longitude   (longitude) float64 22kB -12.0 -11.99 -11.98 ... 15.99 16.0
    valid_time  (step) datetime64[ns] 344B 2024-06-12 ... 2024-06-13T18:00:00
    level       float64 8B 0.0
Data variables:
    sp          (step, latitude, longitude) float32 863MB dask.array<chunksize=(1, 1791, 2801), meta=np.ndarray>
    CAPE_INS    (step, latitude, longitude) float32 863MB dask.array<chunksize=(1, 1791, 2801), meta=np.ndarray>
    lcc         (step, latitude, longitude) float32 863MB dask.array<chunksize=(2, 1791, 2801), meta=np.ndarray>
    hcc         (step, latitude, longitude) float32 863MB dask.array<chunksize=(2, 1791, 2801), meta=np.ndarray>
    mcc         (step, latitude, longitude) float32 863MB dask.array<chunksize=(2, 1791, 2801), meta=np.ndarray>
    unknown     (step, latitude, longitude) float32 863MB dask.array<chunksize=(2, 1791, 2801), meta=np.ndarray>
Attributes:
    GRIB_edition:            2
    GRIB_centre:             lfpw
    GRIB_centreDescription:  French Weather Service - Toulouse
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             French Weather Service - Toulouse
    history:                 2024-06-12T05:53 GRIB to CDM+CF via cfgrib-0.9.1...

In [6]:
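Not a fix for the underlying behaviour, but two possible workarounds, sketched under assumptions and not verified against these files: the MergeError message itself suggests compat='override', which skips the equality check but then keeps valid_time from the first file only (and, depending on the xarray version, also requires coords='minimal', since the default coords='different' cannot be combined with compat='override'); alternatively, a hypothetical preprocess helper can drop the conflicting coordinate from each per-file dataset and rebuild it from time + step after combining.

import xarray

# Reusing the grib_files list from the sketch above.

# Workaround 1 (suggested by the MergeError message): skip the equality
# check on conflicting coordinates. valid_time is then taken from the
# first file only and no longer varies along step.
data = xarray.open_mfdataset(
    grib_files,
    engine="cfgrib",
    parallel=True,
    concat_dim="step",
    combine="nested",
    compat="override",
    coords="minimal",
    backend_kwargs={
        "indexpath": "",
        "errors": "ignore",
        "filter_by_keys": {"shortName": "lcc", "stepType": "instant"},
    },
)

# Workaround 2 (hypothetical helper): drop the conflicting coordinate in
# each per-file dataset, then rebuild it from time + step afterwards.
def _drop_valid_time(ds):
    return ds.drop_vars("valid_time", errors="ignore")

data = xarray.open_mfdataset(
    grib_files,
    engine="cfgrib",
    parallel=True,
    concat_dim="step",
    combine="nested",
    preprocess=_drop_valid_time,
    backend_kwargs={
        "indexpath": "",
        "errors": "ignore",
        "filter_by_keys": {"shortName": "lcc", "stepType": "instant"},
    },
)
data = data.assign_coords(valid_time=data.time + data.step)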

Version

0.9.12.0

Platform (OS and architecture)

python3.10-slim Docker image

Relevant log output

No response

Accompanying data

https://mf-models-on-aws.org/#arome-france-hd/v1/2024-06-12/00/SP2/

Organisation

No response