ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0
400 stars 77 forks source link

failed to set key 'endStep' in to_grib #289

Closed jiaobf closed 2 years ago

jiaobf commented 2 years ago

I read the temperature from GFS and tried to write the data to a new GRIB file .

from eccodes import *
import numpy as np
import xarray as xr 
from cfgrib.xarray_to_grib import to_grib

filename = "gfs.2022010500f24"
ds = xr.open_dataset(filename, engine='cfgrib',
                     backend_kwargs={'filter_by_keys': {'typeOfLevel': 'isobaricInhPa', 
                                                        'name'       : 'Temperature'}})

to_grib(ds, 'out1.grib', grib_keys={'edition': 2})

Running the script directly will result in an error

KeyError: 1641340800000000000

following #272 , it works with a warning :

failed to set key 'endStep' to numpy.timedelta64(24,'ns')

and the time information in the grib file is incorrect

1:0:d=2022010500:TMP Temperature [K]:1000 mb:anl:

which should be

542:117403445:d=2022010500:TMP Temperature [K]:1000 mb:24 hour fcst:

Environment: Python 3.8 from conda-forge on centos7

pandas 1.3.4 cfgrib 0.9.9.1 xarray 0.20.1

ChristopheLRTE commented 2 years ago

Hello, where could we find the gfs.2022010500f24 file ?

jiaobf commented 2 years ago

Hello, where could we find the gfs.2022010500f24 file ?

Thank you for your attention. GFS file can be download with

wget https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/gfs.20220228/00/atmos/gfs.t00z.pgrb2.0p50.f024

in which gfs.YYYYMMDD/00 can be set to any day in the past week.

iainrussell commented 2 years ago

This should be fixed with the release of the next eccodes-python, which now accepts numpy.int64 for setting keys: https://github.com/ecmwf/eccodes-python/commits/develop You can either try the develop branch of eccodes-python, or wait for the release, which should be in a couple of days.

ChristopheLRTE commented 2 years ago

As mentioned in this issue #267 (but things changed a bit since ...) it fails when pandas function encounters timestamp (or timedelta) int/long values.

You can fix it (waiting for the new version of cfgrib) by updating the convert_label_indexer function in xarray/core/indexing.py file and adding the few lines as illustrated below (with # ADDED LINE comment) :

`def convert_label_indexer(index, label, index_name="", method=None, tolerance=None): """Given a pandas.Index and labels (e.g., from getitem) for one dimension, return an indexer suitable for indexing an ndarray along that dimension. If 'index' is a pandas.MultiIndex and depending on 'label', return a new pandas.Index or pandas.MultiIndex (otherwise return None). """ new_index = None

if isinstance(label, slice):
    if method is not None or tolerance is not None:
        raise NotImplementedError(
            "cannot use ''method'' argument if any indexers are slice objects"
        )
    indexer = index.slice_indexer(
        _sanitize_slice_element(label.start),
        _sanitize_slice_element(label.stop),
        _sanitize_slice_element(label.step),
    )
    if not isinstance(indexer, slice):
        # unlike pandas, in xarray we never want to silently convert a
        # slice indexer into an array indexer
        raise KeyError(
            "cannot represent labeled-based slice indexer for dimension "
            f"{index_name!r} with a slice over integer positions; the index is "
            "unsorted or non-unique"
        )

elif is_dict_like(label):
    is_nested_vals = _is_nested_tuple(tuple(label.values()))
    if not isinstance(index, pd.MultiIndex):
        raise ValueError(
            "cannot use a dict-like object for selection on "
            "a dimension that does not have a MultiIndex"
        )
    elif len(label) == index.nlevels and not is_nested_vals:
        indexer = index.get_loc(tuple(label[k] for k in index.names))
    else:
        for k, v in label.items():
            # index should be an item (i.e. Hashable) not an array-like
            if isinstance(v, Sequence) and not isinstance(v, str):
                raise ValueError(
                    "Vectorized selection is not "
                    "available along level variable: " + k
                )
        indexer, new_index = index.get_loc_level(
            tuple(label.values()), level=tuple(label.keys())
        )

        # GH2619. Raise a KeyError if nothing is chosen
        if indexer.dtype.kind == "b" and indexer.sum() == 0:
            raise KeyError(f"{label} not found")

elif isinstance(label, tuple) and isinstance(index, pd.MultiIndex):
    if _is_nested_tuple(label):
        indexer = index.get_locs(label)
    elif len(label) == index.nlevels:
        indexer = index.get_loc(label)
    else:
        indexer, new_index = index.get_loc_level(
            label, level=list(range(len(label)))
        )
else:
    label = (
        label
        if getattr(label, "ndim", 1) > 1  # vectorized-indexing
        else _asarray_tuplesafe(label)
    )
    if label.ndim == 0:
        # see https://github.com/pydata/xarray/pull/4292 for details
        label_value = label[()] if label.dtype.kind in "mM" else label.item()
        if isinstance(index, pd.MultiIndex):
            indexer, new_index = index.get_loc_level(label_value, level=0)
        elif isinstance(index, pd.CategoricalIndex):
            if method is not None:
                raise ValueError(
                    "'method' is not a valid kwarg when indexing using a CategoricalIndex."
                )
            if tolerance is not None:
                raise ValueError(
                    "'tolerance' is not a valid kwarg when indexing using a CategoricalIndex."
                )
            indexer = index.get_loc(label_value)
        else:
            if isinstance(index, pd.DatetimeIndex): # ADDED LINE
                indexer = index.get_loc(pd.Timestamp(label_value), method=method, tolerance=tolerance)  # ADDED LINE
            elif isinstance(index, pd.TimedeltaIndex):  # ADDED LINE
                indexer = index.get_loc(pd.Timedelta(label_value), method=method, tolerance=tolerance)  # ADDED LINE
            else:  # ADDED LINE
                indexer = index.get_loc(label_value, method=method, tolerance=tolerance)
    elif label.dtype.kind == "b":
        indexer = label
    else:
        if isinstance(index, pd.MultiIndex) and label.ndim > 1:
            raise ValueError(
                "Vectorized selection is not available along "
                "MultiIndex variable: " + index_name
            )
        indexer = get_indexer_nd(index, label, method, tolerance)
        if np.any(indexer < 0):
            raise KeyError(f"not all values found in index {index_name!r}")
return indexer, new_index`
jiaobf commented 2 years ago

This should be fixed with the release of the next eccodes-python, which now accepts numpy.int64 for setting keys: https://github.com/ecmwf/eccodes-python/commits/develop You can either try the develop branch of eccodes-python, or wait for the release, which should be in a couple of days.

I manually modified line 973 and line 2063 of gribapi/gribapi.py , but it didn't seem to work.

iainrussell commented 2 years ago

Also, make sure you're using cfgrib from the master branch - this fix was important but is not yet in a release: https://github.com/ecmwf/cfgrib/commit/3cc01e395cd66a376a279c0c603cbedcaeda7f50

jiaobf commented 2 years ago

Also, make sure you're using cfgrib from the master branch - this fix was important but is not yet in a release: 3cc01e3

It works!