ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0

Dimension mismatch in MARS data #398

Open juntyr opened 2 months ago

juntyr commented 2 months ago

What happened?

xarray failed to open a GRIB file with the cfgrib engine, erroring with a dimension mismatch.

What are the steps to reproduce the bug?

import xarray as xr
xr.open_dataset("_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib", engine="cfgrib")

Version

0.9.14.0

Platform (OS and architecture)

macOS; also occurs on Pyodide

Relevant log output

ecCodes provides no latitudes/longitudes for gridType='sh'
skipping variable: paramId==133 shortName='q'
Traceback (most recent call last):
  File "venv/lib/python3.10/site-packages/cfgrib/dataset.py", line 723, in build_dataset_components
    dict_merge(dimensions, dims)
  File "venv/lib/python3.10/site-packages/cfgrib/dataset.py", line 639, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='values' value=1639680 new_value=6599680
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "venv/lib/python3.10/site-packages/xarray/backends/api.py", line 588, in open_dataset
    backend_ds = backend.open_dataset(
  File "venv/lib/python3.10/site-packages/cfgrib/xarray_plugin.py", line 141, in open_dataset
    ds = xr.Dataset(vars, attrs=attrs)
  File "venv/lib/python3.10/site-packages/xarray/core/dataset.py", line 713, in __init__
    variables, coord_names, dims, indexes, _ = merge_data_and_coords(
  File "venv/lib/python3.10/site-packages/xarray/core/dataset.py", line 427, in merge_data_and_coords
    return merge_core(
  File "venv/lib/python3.10/site-packages/xarray/core/merge.py", line 705, in merge_core
    dims = calculate_dimensions(variables)
  File "venv/lib/python3.10/site-packages/xarray/core/variable.py", line 3009, in calculate_dimensions
    raise ValueError(
ValueError: conflicting sizes for dimension 'values': length 6599680 on 'latitude' and length 1639680 on {'step': 'step', 'hybrid': 'hybrid', 'values': 't'}

Accompanying data

https://faubox.rrze.uni-erlangen.de/dl/fiVj21QV6ihsyWC8UEZYTT/_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib

Organisation

University of Helsinki

juntyr commented 2 months ago

CC @SF-N

iainrussell commented 2 months ago

Hi @juntyr,

The problem is that the file contains two different variables whose geographical coordinates do not match: q is on a reduced Gaussian grid, while t is a spectral field and not on a grid at all. They therefore cannot be combined into a single hypercube.

% grib_ls ./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
edition      centre       date         dataType     gridType     stepRange    typeOfLevel  level        shortName    packingType
2            ecmf         20240811     cf           sh           354          hybrid       1            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   354          hybrid       1            q            grid_ccsds
2            ecmf         20240811     cf           sh           354          hybrid       2            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   354          hybrid       2            q            grid_ccsds
2            ecmf         20240811     cf           sh           360          hybrid       1            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   360          hybrid       1            q            grid_ccsds
2            ecmf         20240811     cf           sh           360          hybrid       2            t            spectral_complex
2            ecmf         20240811     cf           reduced_gg   360          hybrid       2            q            grid_ccsds
8 of 8 messages in ./_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib
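To make the conflict concrete, here is a minimal sketch of the kind of check that produces the first error in the log. It is a simplified stand-in, not cfgrib's actual code: the `DatasetBuildError` class and `dict_merge` function below are re-implemented for illustration only, and the dimension sizes are taken from the traceback in the report.

```python
class DatasetBuildError(ValueError):
    """Illustrative stand-in for cfgrib.dataset.DatasetBuildError."""


def dict_merge(master, update):
    """Merge `update` into `master`, refusing to change an existing value."""
    for key, value in update.items():
        if key not in master:
            master[key] = value
        elif master[key] != value:
            raise DatasetBuildError(
                "key present and new value is different: "
                f"key={key!r} value={master[key]!r} new_value={value!r}"
            )


dimensions = {}
# t (gridType 'sh'): 'values' counts spectral coefficients
dict_merge(dimensions, {"step": 2, "hybrid": 2, "values": 1639680})
try:
    # q (gridType 'reduced_gg'): 'values' counts grid points
    dict_merge(dimensions, {"step": 2, "hybrid": 2, "values": 6599680})
except DatasetBuildError as exc:
    conflict = str(exc)
    print(conflict)
```

Both variables share the `step` and `hybrid` dimensions, but they disagree on the length of `values`, so a single dataset cannot hold both.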

You can, however, use cfgrib's built-in open_datasets function to split the data into two datasets, one for each variable:

import cfgrib
ds = cfgrib.open_datasets('_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib')
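Since `open_datasets` returns a list of xarray datasets, it can help to check what ended up in each one. The helper below is a small sketch for that purpose; `describe` is a hypothetical name, and it relies only on the `data_vars` and `sizes` attributes that xarray datasets provide.

```python
def describe(datasets):
    """Return one summary line per dataset: variable names and dimension sizes."""
    lines = []
    for i, ds in enumerate(datasets):
        names = ", ".join(sorted(ds.data_vars))
        sizes = dict(ds.sizes)
        lines.append(f"dataset {i}: variables=[{names}] sizes={sizes}")
    return lines


# Usage, assuming `ds` is the list returned by cfgrib.open_datasets above:
# for line in describe(ds):
#     print(line)
```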

Alternatively, to get more control, you can use the filter_by_keys option in the backend kwargs to load only the fields that match selected properties, e.g.

import xarray as xr
fname = "_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib"
ds = xr.open_dataset(fname, engine="cfgrib", backend_kwargs={'filter_by_keys': {'gridType': 'reduced_gg'}})
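The same filter can be applied once per grid type to load each kind of field separately. This is only a sketch: `grid_filter` and `open_by_grid_type` are hypothetical helper names, and the grid types are the two shown in the `grib_ls` output above.

```python
def grid_filter(grid_type):
    """Build backend kwargs that keep only messages with the given gridType."""
    return {"filter_by_keys": {"gridType": grid_type}}


def open_by_grid_type(path, grid_type):
    """Open only the messages of one gridType from a GRIB file."""
    import xarray as xr  # imported here so grid_filter works without xarray

    return xr.open_dataset(
        path, engine="cfgrib", backend_kwargs=grid_filter(grid_type)
    )


# Usage, assuming the file from the report is in the working directory:
# fname = "_mars-bol-webmars-private-svc-blue-007-4a73a881a8d5eead47db9eff2f9935a4-LEW9gw.grib"
# by_grid = {g: open_by_grid_type(fname, g) for g in ("reduced_gg", "sh")}
```

Judging by the warning in the log, the dataset for `sh` should come without latitude and longitude coordinates, since ecCodes does not provide them for spectral fields.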

I hope this helps!