ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0

Reading National Blend of Models (NBM) Data with cfgrib #364

Closed jlgutenson closed 10 months ago

jlgutenson commented 10 months ago

What happened?

Hello,

I am attempting to read NBM data using xarray and cfgrib with the following code snippet:

import xarray as xr

# full path to a downloaded NBM grib2 file
grib_path = "/full/path/to/grib2/file"

nc_ds = xr.open_dataset(
    grib_path,
    engine='cfgrib',
    backend_kwargs={
        'filter_by_keys': {
            'stepType': 'accum',
            'typeOfLevel': 'surface',
        }
    }
)

I am expecting this to return a dataset containing accumulated surface fields (e.g., total precipitation).

I am receiving the following error:

cfgrib.dataset.DatasetBuildError: multiple values for key 'subCentre'

Since I've seen others successfully utilize NBM data with xarray and cfgrib, I'm guessing that this is some sort of versioning issue with cfgrib and/or xarray.

Is there a particular version of cfgrib that effectively reads the current NBM data?

What are the steps to reproduce the bug?

Download a CONUS grib2 file from NOMADS.
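For example, a minimal download sketch using the Python standard library (the file name is an assumption based on the file linked later in this thread, and the dated directories rotate off NOMADS over time, so the date/cycle may need updating):

import urllib.request

# example CONUS core file; adjust the date and cycle to one currently on NOMADS
url = ("https://nomads.ncep.noaa.gov/pub/data/nccf/com/blend/prod/"
       "blend.20240110/00/core/blend.t00z.core.f001.co.grib2")
grib_path = "blend.t00z.core.f001.co.grib2"
urllib.request.urlretrieve(url, grib_path)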

Run the following code snippet:

import xarray as xr 

grib_path = "/full/path/to/grib2/file"

nc_ds = xr.open_dataset(
    grib_path,
    engine='cfgrib',
    backend_kwargs={
        'filter_by_keys': {
            'stepType': 'accum',
            'typeOfLevel': 'surface',
        }
    }
)

Version

0.9.10.4

Platform (OS and architecture)

Ubuntu 22.04.6 (WSL), using Miniconda

Relevant log output

Traceback (most recent call last):
  File "/mnt/d/Gutenson_RATES/TWDB-FIF-LRGVDC/2023/Scripts/water_level_obs/scripts/precipitation_zonal_statistics.py", line 174, in <module>
    nbm(basin_shapefile, grib_file_directory, precip_csv, basin_precip_data_df=None)
  File "/mnt/d/Gutenson_RATES/TWDB-FIF-LRGVDC/2023/Scripts/water_level_obs/scripts/precipitation_zonal_statistics.py", line 108, in nbm
    nc_ds = xr.open_dataset(
  File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/xarray/backends/api.py", line 572, in open_dataset
    backend_ds = backend.open_dataset(
  File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/xarray_plugin.py", line 108, in open_dataset
    store = CfGribDataStore(
  File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/xarray_plugin.py", line 40, in __init__
    self.ds = opener(filename, **backend_kwargs)
  File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/dataset.py", line 784, in open_file
    return open_from_index(index, read_keys, time_dims, extra_coords, **kwargs)
  File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/dataset.py", line 726, in open_from_index
    dimensions, variables, attributes, encoding = build_dataset_components(
  File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/dataset.py", line 695, in build_dataset_components
    attributes = build_dataset_attributes(index, filter_by_keys, encoding)
  File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/dataset.py", line 619, in build_dataset_attributes
    attributes = enforce_unique_attributes(index, GLOBAL_ATTRIBUTES_KEYS, filter_by_keys)
  File "/home/jlgutenson/miniconda3/envs/aiml_py310_1/lib/python3.10/site-packages/cfgrib/dataset.py", line 277, in enforce_unique_attributes
    raise DatasetBuildError("multiple values for key %r" % key, key, fbks)
cfgrib.dataset.DatasetBuildError: multiple values for key 'subCentre'

Accompanying data

https://nomads.ncep.noaa.gov/pub/data/nccf/com/blend/prod/

Organisation

RATES, Inc.

iainrussell commented 10 months ago

Hi @jlgutenson, there are a lot of files in those folders, and they have quite different contents! Could you be more specific about which one you tried, so that I can try to replicate it, please? (I tried one file, but it was OK.) Thanks, Iain

jlgutenson commented 10 months ago

Hi @iainrussell,

Sure thing, apologies for being too vague.

Here is one of the files I'm attempting to use:

https://nomads.ncep.noaa.gov/pub/data/nccf/com/blend/prod/blend.20240110/00/core/blend.t00z.core.f001.co.grib2

Thanks, Joseph

iainrussell commented 10 months ago

Hi @jlgutenson,

Thanks for that. In fact, that particular file still did not show the problem, but after trying a few I found that blend.t00z.core.f018.co.grib2 did. The error comes from a cfgrib behaviour whose usefulness is debatable: the file contains 'accum' parameters from NCEP under two different subCentres, 9 and 14, while cfgrib wants to set subCentre as a global attribute of the resulting xarray dataset, so the value must be unique across all the GRIB messages it reads. We might want to relax that restriction, although it does serve a purpose. What you can do is add another filter to keep only subCentre 14 (subCentre 9 seems to be test data), so your code would now look like this:

nc_ds = xr.open_dataset(
    grib_path,
    engine='cfgrib',
    backend_kwargs={
        'filter_by_keys': {
            'stepType': 'accum',
            'typeOfLevel': 'surface',
            'subCentre': 14
        }
    }
)

That works for me - is that solution acceptable for you? Cheers, Iain
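For reference, a quick way to see which subCentre values a file actually contains is to loop over its messages with the eccodes Python bindings (a minimal sketch; it assumes the bindings are importable, which they are when cfgrib's eccodes dependency is installed, and uses the file name from this thread):

import eccodes

grib_path = "blend.t00z.core.f018.co.grib2"

# collect every (subCentre, stepType) pair found in the file
pairs = set()
with open(grib_path, "rb") as f:
    while True:
        gid = eccodes.codes_grib_new_from_file(f)
        if gid is None:
            break
        pairs.add((eccodes.codes_get(gid, "subCentre"),
                   eccodes.codes_get(gid, "stepType")))
        eccodes.codes_release(gid)

print(sorted(pairs))

This shows at a glance which subCentre values would need to go into filter_by_keys.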

jlgutenson commented 10 months ago

Hey @iainrussell,

Thank you! That seems to be doing the trick.

How were you able to discern which subCentre is which?

Thanks, Joseph

iainrussell commented 10 months ago

Well, in fact, looking closer, I may have been mistaken, because I did not look at all the messages. I used Metview's GRIB Examiner to inspect the data (CodesUI is also available; it is basically Metview's examiner packaged as a standalone tool), and it seemed that all the fields from subCentre 8 also had productionStatusOfProcessedData=1 (Operational test products) - see the screenshots below. On closer inspection, however, I can see that some of them are in fact operational - I extrapolated too soon! At least in this case, though, all the fields for subCentre=8 are 'instant' rather than 'accum'. If you really want to make sure you've got everything, the snippet below loads the data into multiple datasets without you having to specify the subCentre; with it I also noticed there is one field from subCentre=9, although it is an operational test product as well.

import cfgrib

grib_path = "blend.t00z.core.f018.co.grib2"

# open_datasets splits the messages into as many datasets as needed,
# so no subCentre filter is required here
nc_ds = cfgrib.open_datasets(
    grib_path,
    engine='cfgrib',
    backend_kwargs={
        'filter_by_keys': {
            'stepType': 'accum',
            'typeOfLevel': 'surface',
        }
    }
)
print(nc_ds)
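Note that cfgrib.open_datasets returns a list of xarray.Dataset objects (one per internally consistent group of messages) rather than a single dataset, so the result can be inspected element by element, for example:

print(len(nc_ds))   # number of separate datasets found
print(nc_ds[0])     # first dataset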

[screenshots: Metview GRIB Examiner views of the file's message metadata]

jlgutenson commented 10 months ago

Hey @iainrussell,

Thank you! I wasn't familiar with Metview; I've been using Panoply. Everything appears to be working on my end, so I'll close this "bug" out. Enjoy your weekend!

Thanks again, Joseph Gutenson