ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0

Cfgrib only reads some variables #245

Open jsillin opened 3 years ago

jsillin commented 3 years ago

Hello, I am trying to open some HRRR GRIB files to extract smoke forecast information. The files are available on NOMADS or GCP and should all be the same regardless of valid time: https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/. I'm working with the CONUS wrfnat files.

I'm opening these files with cfgrib and 'filter_by_keys':

import xarray as xr
ds = xr.open_dataset('hrrr.t00z.wrfnatf12.grib2', engine='cfgrib', filter_by_keys={'typeOfLevel': 'hybrid'})

This produces a nice xarray dataset:

<xarray.Dataset>
Dimensions:     (hybrid: 50, x: 1799, y: 1059)
Coordinates:
    time        datetime64[ns] ...
    step        timedelta64[ns] ...
  * hybrid      (hybrid) float64 1.0 2.0 3.0 4.0 5.0 ... 47.0 48.0 49.0 50.0
    latitude    (y, x) float64 ...
    longitude   (y, x) float64 ...
    valid_time  datetime64[ns] ...
Dimensions without coordinates: x, y
Data variables: (12/16)
    pres        (hybrid, y, x) float32 ...
    clwmr       (hybrid, y, x) float32 ...
    unknown     (hybrid, y, x) float32 ...
    rwmr        (hybrid, y, x) float32 ...
    snmr        (hybrid, y, x) float32 ...
    grle        (hybrid, y, x) float32 ...
    ...          ...
    t           (hybrid, y, x) float32 ...
    q           (hybrid, y, x) float32 ...
    u           (hybrid, y, x) float32 ...
    v           (hybrid, y, x) float32 ...
    w           (hybrid, y, x) float32 ...
    tke         (hybrid, y, x) float32 ...

but with only 16 of the 20 variables these files are supposed to contain (see full list from NOAA: https://rapidrefresh.noaa.gov/hrrr/HRRRv4_GRIB2_WRFNAT.txt)

Unfortunately, one of these missing four happens to be my sought-after smoke.

This issue looks similar (ish) to https://github.com/ecmwf/cfgrib/issues/66 https://github.com/ecmwf/cfgrib/issues/139 https://github.com/ecmwf/cfgrib/issues/45 https://github.com/ecmwf/cfgrib/issues/217

but there are no warning messages thrown and I'm not sure it's an issue with the multi-field message because cfgrib's parsing worked for 16/20 variables rather than reducing down to just one... any ideas would be much appreciated!

iainrussell commented 3 years ago

Hello @jsillin ,

It looks to me like some of your fields are not known by ecCodes (the GRIB engine behind cfgrib). You have a variable called 'unknown' because ecCodes could not identify it, and my first suspicion is that there are five such variables - all will be called 'unknown' and therefore put into the same variable. You can check on the command-line with grib_ls <gribfile> to see what ecCodes understands from the file. You will probably need to obtain or create local ecCodes tables for this data, although it is strange that some variables are understood.

Cheers, Iain
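The check suggested above can also be scripted. What follows is a minimal sketch, assuming the ecCodes Python bindings (the eccodes package) are installed and the file is GRIB2. The helper names iter_grib_keys and count_unknown are hypothetical and not part of cfgrib. It counts the messages that ecCodes labels 'unknown' and groups them by their GRIB2 parameter triplet, which is the information needed to look up or write local definitions.

```python
from collections import Counter


def iter_grib_keys(path, keys):
    """Yield one dict per GRIB message holding the requested ecCodes keys."""
    import eccodes  # imported lazily so count_unknown works without ecCodes

    with open(path, "rb") as f:
        while True:
            gid = eccodes.codes_grib_new_from_file(f)
            if gid is None:  # no more messages in the file
                break
            try:
                yield {key: eccodes.codes_get(gid, key) for key in keys}
            finally:
                eccodes.codes_release(gid)


def count_unknown(messages):
    """Count messages whose shortName is 'unknown', per GRIB2 parameter triplet."""
    counts = Counter()
    for msg in messages:
        if msg["shortName"] == "unknown":
            triplet = (
                msg["discipline"],
                msg["parameterCategory"],
                msg["parameterNumber"],
            )
            counts[triplet] += 1
    return counts


# Example (requires eccodes and the file from the original post):
# keys = ["shortName", "discipline", "parameterCategory", "parameterNumber"]
# print(count_unknown(iter_grib_keys("hrrr.t00z.wrfnatf12.grib2", keys)))
```

If more than one triplet shows up, several distinct parameters are being folded into the single 'unknown' variable.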

karlwx commented 3 years ago

I've got a similar issue but I am getting an error. I'm working with RAP data.

The error message:

skipping variable: paramId==165 shortName='u10'
Traceback (most recent call last):
  File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 653, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 584, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2.0) new_value=Variable(dimensions=(), data=10.0)
skipping variable: paramId==166 shortName='v10'
Traceback (most recent call last):
  File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 653, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 584, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=(), data=2.0) new_value=Variable(dimensions=(), data=10.0)

Here's the output of running grib_ls <gribfile>, as suggested. Everything looks normal to me:

rap.grib2
edition      centre       date         dataType     gridType     stepRange    typeOfLevel  level        shortName    packingType  
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  250          gh           grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  250          t            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  250          r            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  250          u            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  250          v            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  500          gh           grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  500          t            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  500          r            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  500          u            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  500          v            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  700          gh           grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  700          t            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  700          r            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  700          u            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  700          v            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  850          gh           grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  850          t            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  850          r            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  850          u            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  850          v            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  925          gh           grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  925          t            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  925          r            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  925          u            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            isobaricInhPa  925          v            grid_jpeg   
2            kwbc         20211104     fc           lambert      0            meanSea      0            mslma        grid_jpeg   
2            kwbc         20211104     fc           lambert      0            heightAboveGround  2            2t           grid_jpeg   
2            kwbc         20211104     fc           lambert      0            heightAboveGround  2            2d           grid_jpeg   
2            kwbc         20211104     fc           lambert      0            heightAboveGround  10           10u          grid_jpeg   
2            kwbc         20211104     fc           lambert      0            heightAboveGround  10           10v          grid_jpeg   
30 of 30 messages in rap.grib2

30 of 30 total messages in 1 files

iainrussell commented 2 years ago

Hi @karlwx, the problem here is that you have different level types (pressure, meanSea and heightAboveGround). These are not compatible in a single xarray dataset (you could easily have the same height level value as a valid pressure level value, e.g. 100, and then the vertical coordinates would get confused). In short, you need to separate your data using the filtering facility described in the readme, e.g.

xr.open_dataset('nam.t00z.awp21100.tm00.grib2', engine='cfgrib',
  backend_kwargs={'filter_by_keys': {'typeOfLevel': 'heightAboveGround'}})

You would need to do this for each typeOfLevel, and therefore end up with one xarray dataset for each. I hope that makes sense! Iain
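The per-level-type approach can be written as a small loop. This is only a sketch, assuming xarray and cfgrib are installed. The function name open_by_level_type and its opener parameter are hypothetical; the parameter exists only so the loop can be exercised without a real GRIB file.

```python
def open_by_level_type(path, level_types, opener=None):
    """Return {typeOfLevel: dataset}, opening the file once per level type."""
    if opener is None:
        import xarray as xr  # lazy import; needs xarray and cfgrib installed

        opener = xr.open_dataset

    datasets = {}
    for level_type in level_types:
        # Same filter as in the comment above, one typeOfLevel at a time.
        datasets[level_type] = opener(
            path,
            engine="cfgrib",
            backend_kwargs={"filter_by_keys": {"typeOfLevel": level_type}},
        )
    return datasets


# Example, using the level types that appear in the grib_ls listing:
# datasets = open_by_level_type(
#     "rap.grib2", ["isobaricInhPa", "meanSea", "heightAboveGround"]
# )
```

cfgrib also offers cfgrib.open_datasets(path), which attempts this kind of split automatically and returns a list of datasets; whether it handles a given file cleanly would need to be checked case by case.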

karlwx commented 2 years ago

> In short, you need to separate your data using the filtering facility described in the readme

Hi @iainrussell, unfortunately this does not seem to be the only problem here. I've adjusted my code based on your suggestions and still run into the same error.

ds = xr.open_dataset('./data/rap.2021112900.grib2', engine='cfgrib',
                     backend_kwargs={'filter_by_keys': {'typeOfLevel': 'heightAboveGround'}})

skipping variable: paramId==167 shortName='t2m'
Traceback (most recent call last):
  File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 653, in build_dataset_components
    dict_merge(variables, coord_vars)
  File "/home/meteo/kps5442/.conda/envs/radar/lib/python3.9/site-packages/cfgrib/dataset.py", line 584, in dict_merge
    raise DatasetBuildError(
cfgrib.dataset.DatasetBuildError: key present and new value is different: key='heightAboveGround' value=Variable(dimensions=('heightAboveGround',), data=array([1000., 4000.])) new_value=Variable(dimensions=(), data=2.0)

I'm wondering if this is specifically related to the RAP data I'm using (you can download a file to test from here: https://nomads.ncep.noaa.gov/pub/data/nccf/com/rap/prod/).

Also, I'm wondering if there is a way to list all the different values of typeOfLevel present in the grib file? This would make it much easier to anticipate any issues and build code to open each as a separate dataset.
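One possible way to get such a listing is to read the typeOfLevel and level keys of every message directly with ecCodes. The sketch below assumes the eccodes Python package is installed; read_level_keys and summarize_levels are hypothetical helper names. On the command line, grib_ls -p typeOfLevel,level,shortName <gribfile> should print the same columns.

```python
from collections import defaultdict


def read_level_keys(path):
    """Yield typeOfLevel, level and shortName for every message in a GRIB file."""
    import eccodes  # lazy import: only needed when reading a real file

    wanted = ("typeOfLevel", "level", "shortName")
    with open(path, "rb") as f:
        while True:
            gid = eccodes.codes_grib_new_from_file(f)
            if gid is None:  # end of file
                break
            try:
                yield {key: eccodes.codes_get(gid, key) for key in wanted}
            finally:
                eccodes.codes_release(gid)


def summarize_levels(messages):
    """Map each typeOfLevel to the sorted list of level values seen for it."""
    levels = defaultdict(set)
    for msg in messages:
        levels[msg["typeOfLevel"]].add(msg["level"])
    return {level_type: sorted(values) for level_type, values in levels.items()}


# Example (requires eccodes and a real file):
# print(summarize_levels(read_level_keys("./data/rap.2021112900.grib2")))
```

The keys of the returned dictionary are the typeOfLevel values to filter on, and a level type with several unrelated level values is a likely candidate for further splitting.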

madsobdrupjakobsen commented 1 year ago

Did you solve it? I am having the same problem. After loading the definitions, grib_ls shows the correct output, but for some reason some of the variables are not correctly interpreted by xarray and are thus thrown into 'unknown'.

iainrussell commented 1 year ago

Hi @madsobdrupjakobsen, if you are getting 'unknown' parameters, have a look at #230.

tejasanilshah commented 1 year ago

Hey, I had the same problem and I noticed that in the output of grib_ls the shortName field is 10u. I tried adding that to the filters, and it works. So something like this will do the trick:

grib_ds_10u = xr.open_dataset(grib_filepath, engine="cfgrib", filter_by_keys={'typeOfLevel': 'heightAboveGround', 'shortName': '10u'})
grib_ds_10v = xr.open_dataset(grib_filepath, engine="cfgrib", filter_by_keys={'typeOfLevel': 'heightAboveGround', 'shortName': '10v'})

Just to make things a little more fun, the data variables in the datasets are actually named u10 and v10, not 10u and 10v.

This is probably the reason why cfgrib throws this error: cfgrib.dataset.DatasetBuildError: key present and new value is different
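The tracebacks earlier in the thread report conflicting heightAboveGround values (2.0 versus 10.0), so opening one variable at a time leaves a single set of heights per dataset. The shortName workaround can be generalized as below. This is a sketch only: per_variable_filters and open_each are hypothetical helpers, and the input rows are assumed to carry the typeOfLevel and shortName columns as printed by grib_ls.

```python
def per_variable_filters(messages):
    """Build one filter_by_keys dict per (typeOfLevel, shortName), in first-seen order."""
    seen = set()
    filters = []
    for msg in messages:
        pair = (msg["typeOfLevel"], msg["shortName"])
        if pair not in seen:
            seen.add(pair)
            filters.append({"typeOfLevel": pair[0], "shortName": pair[1]})
    return filters


def open_each(path, filters):
    """Open one xarray dataset per filter (needs xarray and cfgrib installed)."""
    import xarray as xr  # lazy import so per_variable_filters has no dependencies

    return [
        xr.open_dataset(path, engine="cfgrib", backend_kwargs={"filter_by_keys": f})
        for f in filters
    ]


# Rows copied from the heightAboveGround part of the grib_ls listing:
rows = [
    {"typeOfLevel": "heightAboveGround", "shortName": "2t"},
    {"typeOfLevel": "heightAboveGround", "shortName": "2d"},
    {"typeOfLevel": "heightAboveGround", "shortName": "10u"},
    {"typeOfLevel": "heightAboveGround", "shortName": "10v"},
]
filters = per_variable_filters(rows)
# datasets = open_each("rap.grib2", filters)
```

This trades convenience for robustness: it yields many small datasets, and variables that share the same heights (such as 2t and 2d) could be merged again afterwards.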