ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0
398 stars 75 forks source link

Issue with NCEP seasonal forecasts #214

Open matteodefelice opened 3 years ago

matteodefelice commented 3 years ago

I think this issue is not really an issue caused by cfgrib but I think it's relevant because in this case it forces me to switch to the NetCDF version of the files on the CDS.

I downloaded an NCEP forecast monthly, start date january and lead times 2, 3 and 4. The dimension are very big, due probably to the initialization method of the NCEP, if I open the GRIB with xarray I get:

<xarray.Dataset>
Dimensions:     (latitude: 180, longitude: 360, number: 28, step: 39, time: 644)
Coordinates:
  * number      (number) int64 0 1 2 3 4 5 6 7 8 ... 19 20 21 22 23 24 25 26 27
  * time        (time) datetime64[ns] 1992-12-02 ... 2015-01-01T00:18:00
  * step        (step) timedelta64[ns] 59 days 60 days ... 150 days 151 days
    surface     int64 ...
  * latitude    (latitude) float64 89.5 88.5 87.5 86.5 ... -87.5 -88.5 -89.5
  * longitude   (longitude) float64 0.5 1.5 2.5 3.5 ... 356.5 357.5 358.5 359.5
    valid_time  (time, step) datetime64[ns] ...
Data variables:
    t2m         (number, time, step, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             kwbc
    GRIB_centreDescription:  US National Weather Service - NCEP
    GRIB_subCentre:          98
    Conventions:             CF-1.7
    institution:             US National Weather Service - NCEP
    history:                 2021-03-26T16:33:44 GRIB to CDM+CF via cfgrib-0....

I cannot do anything with this file, because I would need 170 GB of memory to deal with it. Instead, if I convert it to NetCDF it becomes:

xarray.Dataset>
Dimensions:    (latitude: 180, longitude: 360, number: 28, time: 69)
Coordinates:
  * longitude  (longitude) float32 0.5 1.5 2.5 3.5 ... 356.5 357.5 358.5 359.5
  * latitude   (latitude) float32 89.5 88.5 87.5 86.5 ... -87.5 -88.5 -89.5
  * number     (number) int32 0 1 2 3 4 5 6 7 8 9 ... 19 20 21 22 23 24 25 26 27
  * time       (time) datetime64[ns] 1993-02-01 1993-03-01 ... 2015-04-01
Data variables:
    t2m        (time, number, latitude, longitude) float32 ...
Attributes:
    Conventions:  CF-1.6
    history:      2021-03-26 15:32:40 GMT by grib_to_netcdf-2.21.0: grib_to_n...

I was using the GRIB because in my workflow I am computing the mean on the 'step' dimension to have a seasonal average, but in this case I get an error due to lack of memory to perform the computation. Why the NetCDF is so compact? What's the magic behind it?

alexamici commented 3 years ago

@matteodefelice monthly datasets are tricky to translate due to fact that "a month" is no a proper time interval (see the discussion here for example: https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#time-coordinate)

You may try to use the poorly documented time_dims option that was added exactly to handle CDS monthly seasonal products: https://github.com/ecmwf/cfgrib/issues/97#issuecomment-557190695

I didn't test it on NCEP though.

If you want to know more, the whole conversation in the issue is a good read.

matteodefelice commented 3 years ago

Thanks a lot. It seems that if I use verifying_time in time_dims I get the same coordinate used by grib_to_netcdf. This should solve my problem BUT...why can't I use only verifying_time in time_dims? If I use backend_kwargs=dict(time_dims = ('time', 'verifying_time')) everything works fine but if I leave only the second:

ValueError: time_dims 'verifying_time' not a subset of ['time', 'step', 'valid_time', 'verifying_time', 'forecastMonth', 'indexing_time']

The error message doesn't help a lot. What is happening here?

SOLVED: it's the annoying Python-comma...If I use:

 `backend_kwargs=dict(time_dims = ('verifying_time',))

then the check in dataset.py works :)

Do you see any potential issue in doing this?