seasonal forecast valid-time inconsistency

EddyCMWF commented 5 years ago

Hi,

We have been looking at the xarrays and netCDF files produced by cfgrib for the seasonal forecast data. The valid_time recovered using cfgrib is not be exactly consistent with the time stamp recovered by the eccodes grib_to_netcdf method and it may be the case that users are misinterpreting the output they receive.

Using cfgrib, the value of valid_time is the time at the end of the time aggregation window. I think that cfgrib must calculate this as the sum of the time and step. Using ecCodes, the time outputs is at the start of the aggregation window, which for step index n would be the sum of time and step(n-1).

I am bringing this to you attention so that you can decide the best way to deal with it, but it would be very useful (CDS-toolbox) for us to have access to the start of the time aggregation window in the xarray that is recovered. Otherwise we will have to build some sort of work around in the toolbox. One option could be to output the time bounds of the aggregation period, with an additional dimension of length 2.

Additionally, I think that it is probably necessary to include something in the metadata that the valid_time represents the time at the end of the aggregation window as this may not be what users were expecting.

Please let me know if you need any more information about what I am trying to explain. As an example case use the grib file retrieved with the API request below. The time stamps are different when converted using the grib_to_netcdf and cfgrib approaches.

import cdsapi

c = cdsapi.Client()

c.retrieve(
    'seasonal-monthly-single-levels',
    {
        'originating_centre':'ecmwf',
        'system':'5',
        'variable':'2m_dewpoint_temperature',
        'product_type':'monthly_mean',
        'year':'2017',
        'month':[
            '01','02'
        ],
        'leadtime_month':[
            '1','2'
        ],
        'format':'grib'
    },
    'download.grib')

alexamici commented 5 years ago

@EddyCMWF thanks for reporting, this is indeed a rather non trivial bug, or rather a number of bugs.

cfgrib in fact uses endStep, which I agree is not ideal in the general case and it has no option to switch to startStep
the CDS monthly SEAS5 have the startStep key equal to the endStep one so the GRIB files appear to be inconsistent

$ grib_dump -a tests/sample-data/cds-seasonal-monthly-single-levels.grib
#==============   MESSAGE 1 ( length=14776 )               ==============
GRIB {
  # ALIASES: ls.edition
  editionNumber = 1;
  ...
  # Hour (stepUnits.table)  
  stepUnits = 1;
  # ALIASES: time.stepRange
  stepRange = 720;
  startStep = 720;
  # ALIASES: stepInHours
  endStep = 720;
  ...

edupenabad commented 5 years ago

Hi @alexamici

The problem here is the limitation of the GRIB1 format to handle this kind of aggregations for long forecasts.

As it is described in the GRIB1 section 1, we have octets 18 to 21 to describe that:

With those ingredients, using P1 as the startStep and P2 as the endStep, it would not be possible to properly encode aggregations in forecasts longer than 255 unitOfTimeRange and there are also some other not possible encodings when dataTime is not 0000

The way this was dealt with in these cases was to use timeRangeIndicator value equals to 10, which means (GRIB1 Code Table 5) "P1 occupies octets 19 and 20; product valid at reference time + P1" (that's why stepRange=startStep=endStep), which it doesn't clearly state whether this is an instantaneous time range (e.g. timeRangeIndicator=0) or an aggregated one (e.g. timeRangeIndicator=3 or timeRangeIndicator=4)

Additionally, in those GRIB messages coming from seasonal monthly means, it is present localDefinitionNumber=16 and it is within the scope of those local definitions where they are available relevant keywords containing essential information to properly decode the messages (e.g. forecastMonth or verifyingMonth)

I hope that information is helpful and contributes to the discussion.

alexamici commented 4 years ago

@EddyCMWF and @matteodefelice even if the solution is still far from ideal, new version 0.9.7.3 of cfgrib has an option to read the ecCodes reported valid_time and to select the time coordinates to use as array dimensions.

ECMWF SEAS5 datasets can now be opened correctly as follows:

>>> import xarray as xr
>>> ds = xr.open_dataset('ecmwf-seas5.grib', engine='cfgrib', backend_kwargs={'time_dims': ('time', 'valid_time')})
>>> ds
<xarray.Dataset>
Dimensions:     (latitude: 181, longitude: 360, time: 2, valid_time: 7)
Coordinates:
    number      int64 ...
  * time        (time) datetime64[ns] 2019-09-01 2019-10-01
  * valid_time  (valid_time) datetime64[ns] 2019-10-01 2019-11-01 ... 2020-04-01
    surface     int64 ...
  * latitude    (latitude) float64 90.0 89.0 88.0 87.0 ... -88.0 -89.0 -90.0
  * longitude   (longitude) float64 0.0 1.0 2.0 3.0 ... 356.0 357.0 358.0 359.0
Data variables:
    t2m         (time, valid_time, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2019-11-04T18:48:09 GRIB to CDM+CF via cfgrib-0....

edupenabad commented 4 years ago

Hi @alexamici !

Can it be that the additional feature in 0.9.7.3 (i.e. specifying what dimensions use for the time coordinates) just solves the issue reported by @matteodefelice in #38 but not what was reported in this issue #97?

It seems to me that using just time and valid_time as coordinates solves the problem of having weird values for "step" (due to the different length of the months) and produces a nicer hypercube based on those two coordinates (time and valid_time).

But if you look at the values in your example (I have reproduced the same with 0.9.7.3 and a different seasonal forecast monthly means GRIB file), we still have the issue of wrong validity times being equal to time+step, which was the issue reported by @EddyCMWF here.

edupenabad commented 4 years ago

I am not really familiar at all with the code of grib_to_netcdf but, as @EddyCMWF mentioned, it is able to deal with this issue properly, so I wonder to what extent the rationales used there could be useful as an inspiration on how to implement that into cfgrib

alexamici commented 4 years ago

@edupenabad I'm a bit puzzled, when setting time_dims=('time', 'valid_time') cfgrib uses ecCodes keys validityDate and validityTime for valid_time (after your suggestion), not time + step.

BTW, when referencing monthly it is a convention where you put the valid_time, ath the beginning of the period, in the middle or at the end. I like the beginning and that is what is encoded in the SEAS5 GRIB as read by ecCodes.

edupenabad commented 4 years ago

@alexamici I can not more than sympathise with your feeling of confusion as this really highlights how important it is to get this done properly to avoid that even highly skilled users with a more than decent familiarity with the files interpret them wrongly.

I will express a bit more bluntly what I said in my previous comment. It is not that I doubt about it, it is that I am convinced that issue #38 was solved with the feature added in 0.9.7.3 while issue #97 remains unsolved.

I'll try to elaborate on the problem with a couple of additional examples that I hope will showcase more concisely its implications.

If we explore some of the keywords related to time coordinates in a SEAS5 monthly means file:

$ grib_ls -p date,time,validityDate,validityTime,forecastMonth,verifyingMonth seas5_monthlymeans.grib 
seas5_monthlymeans.grib
date            time            validityDate    validityTime    forecastMonth   verifyingMonth  
20191101        0               20191201        0               1               201911         
20191101        0               20200101        0               2               201912         
20191101        0               20200201        0               3               202001         
20191101        0               20200301        0               4               202002         
20191101        0               20200401        0               5               202003         
20191101        0               20200501        0               6               202004         
6 of 6 messages in seas5_monthlymeans.grib

6 of 6 total messages in 1 files

Note the discrepancy between verifyingMonth (the keyword in local definitions table 16 I mentioned in my first comment) and validityDate/validityTime (the keywords used by cfgrib to populate valid_time coordinate, which are not keywords physically stored in the GRIB header but computed keywords). In a more-than-educated guess I am inclined to say the latter are computed as the sum of date/time + step

If we use grib_to_netcdf to convert that same file to netcdf:

$ grib_to_netcdf -o seas5_monthlymeans.nc seas5_monthlymeans.grib 
grib_to_netcdf: Version 2.14.1
grib_to_netcdf: Processing input file 'seas5_monthlymeans.grib'.
grib_to_netcdf: Found 6 GRIB fields in 1 file.
grib_to_netcdf: Ignoring key(s): method, type, stream, refdate, hdate
grib_to_netcdf: Creating netCDF file 'seas5_monthlymeans.nc'
grib_to_netcdf: NetCDF library version: 4.6.3 of May  8 2019 00:09:03 $
grib_to_netcdf: Creating large (64 bit) file format.
grib_to_netcdf: Defining variable 't2m'.
grib_to_netcdf: Done.

$ python -c "import netCDF4 as nc; ds=nc.Dataset('seas5_monthlymeans.nc','r'); aa=nc.num2date(ds.variables['time'][:],ds.variables['time'].units); print (aa)"
[datetime.datetime(2019, 11, 1, 0, 0) datetime.datetime(2019, 12, 1, 0, 0)
 datetime.datetime(2020, 1, 1, 0, 0) datetime.datetime(2020, 2, 1, 0, 0)
 datetime.datetime(2020, 3, 1, 0, 0) datetime.datetime(2020, 4, 1, 0, 0)]

As you can see, grib_to_netcdf is able to understand that the GRIB file has been written using local definitions (specifically table 16), and therefore it populates the time coordinate with the right values, i.e. Nov/2019 is the sensible value (label) for forecastMonth=1 in a monthly means file with nominal start date 20191101 and not Dec/2019 as it might be interpreted from the computed values validityDate/validityTime. These last sentences are not a guess, an interpretation or an expression of a personal preference, it is an explanation on how seasonal forecast monthly means GRIB files have been consistently encoded for a long time at ECMWF.

Finally, regarding your thinking about where to point within an interval as its label, as you know, the encoding of an aggregation across a coordinate is solved in CF with a proper use of 'Cell Boundaries', which unequivocally defines the interval of aggregation. Having that complete definition of the interval, the election of a given value within the interval as its label, it is in my opinion, more a question of uses cases than of personal preferences, and we can easily find reasonable use cases for the beginning, the middle or the end of the interval.

I hope those points contribute to disentangle the confusion and help to find the more suitable solution to this issue.

alexamici commented 4 years ago

@edupenabad thank you for your understanding your additional clarification, in fact I confused verifyingMonth with validityDate/validityTime :/ Sorry!

Let me recap the way I can recognise how SEAS5 encodes the valid_time variable: if timeRangeIndicator=10 and localDefinitionNumber =16 the correct valid time is given by verifyingMonth.

I'll be looking for a generic way to make such a procedure possible because I don't want to add GRIB specific logic into cfgrib as that belong to ecCodes.

edupenabad commented 4 years ago

Thanks @alexamici ! I'm glad we are now on the same page, I was starting to get seriously concerned about my communication skills :)

I would say that timeRangeIndicator=10 is not a fundamental element for this. I just used it in my initial comment to argue that startStep=endStep is not necessarily a sign of an inconsistent GRIB message.

In my opinion, the key element is the use of localDefinitionNumber=16 which immediately implies the GRIB message is a seasonal forecast monthly aggregation, and therefore forecastMonth and verifyingMonth are present.

alexamici commented 4 years ago

@edupenabad in master you can now add verifying_time to time_dims in the place of valid_time that reads and convert the attribute you mentioned.

>>> xr.open_dataset('ecmwf-seas5.grib', engine='cfgrib', backend_kwargs=dict(time_dims=('time', 'verifying_time')))
<xarray.Dataset>
Dimensions:         (latitude: 181, longitude: 360, time: 2, verifying_time: 7)
Coordinates:
    number          int64 ...
  * time            (time) datetime64[ns] 2019-09-01 2019-10-01
  * verifying_time  (verifying_time) datetime64[ns] 2019-09-01 ... 2020-03-01
    surface         int64 ...
  * latitude        (latitude) float64 90.0 89.0 88.0 87.0 ... -88.0 -89.0 -90.0
  * longitude       (longitude) float64 0.0 1.0 2.0 3.0 ... 357.0 358.0 359.0
    valid_time      (time) datetime64[ns] ...
Data variables:
    t2m             (time, verifying_time, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            1
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts
    history:                 2019-11-21T18:28:36 GRIB to CDM+CF via cfgrib-0....

If you can confirm that this is what you would expect, I'm quite happy with the API and I'll make a release shortly.

edupenabad commented 4 years ago

Hi @alexamici

Thanks for this! It looks like verifying_time has now the right timestamps that users would expect to have, but I still have a couple of comments/questions:

Is it ok that there is still present a coordinate called valid_time with dimension time (i.e. number of start dates)? And what is its meaning?
I found it confusing to have something called valid_time and something called verifying_time, so I am wondering whether there is a use case for having both of them. Additionally, is there a use case for having the ability to force the hypercube to be formed using the "old" valid_time which is a computed key from date/time and step?

alexamici commented 4 years ago

@edupenabad great! Now to the questions:

since verifying_time has already standard_name=time it is in fact confusing to also have valid_time. I think I can remove it cleanly.
valid_time is useful in most GRIB files. I don't want to add any dataset specific logic in cfgrib, I'm happy with doing the right thing when the user requests it, but I don't want to add logic to disable the ability to index by valid_time in specific cases.

Anyway I'm preparing the next release with support of SEAS5 data.

alexamici commented 4 years ago

I added support to have forecastMonth as dimension in time_dims. And I think with that I can close the issue.

Release coming in a few minutes.

Tinkaa commented 2 years ago

Hi! I was almost trusting the valid_time parameter to actually represent the forecasted month till I was double-checking and found this issue. It is now working like a charm using verifying_time and forecastMonth, so thanks for that! I was wondering if the what and why of this solution is documented somewhere? Then I can avoid the mistake in the future and share it with others. Thanks!

ecmwf / cfgrib

seasonal forecast valid-time inconsistency #97