ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0

time values after daily_statistics (or monthly_statistics) #398

Open SarahAlidoost opened 4 years ago

SarahAlidoost commented 4 years ago

The time bounds in daily_statistics (or monthly_statistics) need a correction. For example, regarding hourly ERA5 data, if daily_statistics is applied after CMORization, the daily time values will be 2001, 1, 1, 11, 30, 0. It should be corrected to 2001, 1, 1, 12, 0, 0.

valeriupredoi commented 4 years ago

@SarahAlidoost daily_statistics and monthly_statistics (see funcs) produce aggregated data using a simple statistic - I don't see anything wrong with them! If the input data has wrong time points, then that's not the fault of the statistical functions. Could you maybe please expand on your issue and tell us what the actual problem is? :beer:

valeriupredoi commented 4 years ago

ie a quick example would be useful, cheers :beer:

SarahAlidoost commented 4 years ago

@valeriupredoi thanks for the comments.

example: the input hourly data has time values as: [..., hour=0, minute=0, second=0] to [..., hour=0, minute=0, second=0], accounting for 24 hours in one day. After applying daily statistics, the output data has time values as: [..., hour=11, minute=30, second=0] which is not correct. Actually, that should be [..., hour=12, minute=0, second=0].

The daily_statistics and monthly_statistics use the function cube.aggregated_by(). It seems that aggregated_by() changes the time bounds. Also I found a similar issue in iris, please see iris issue

valeriupredoi commented 4 years ago

cool! now it makes total sense - cheers! Probably better to raise this straight with the iris folk ie @bjlittle

bouweandela commented 4 years ago

> the input hourly data has time values as: [..., hour=0, minute=0, second=0] to [..., hour=0, minute=0, second=0], accounting for 24 hours in one day.

Shouldn't the input data have time values: [..., hour=0, minute=30, second=0] to [..., hour=23, minute=30, second=0], with corresponding bounds?

SarahAlidoost commented 4 years ago

> the input hourly data has time values as: [..., hour=0, minute=0, second=0] to [..., hour=0, minute=0, second=0], accounting for 24 hours in one day.

> Shouldn't the input data have time values: [..., hour=0, minute=30, second=0] to [..., hour=23, minute=30, second=0], with corresponding bounds?

Sorry for the typo, both ERA5 raw and cmorized data have time values: [..., hour=0, minute=0, second=0] to [..., hour=23, minute=0, second=0], accounting for 24 hours in one day. After applying daily_statistics with any operator, the time values become: [..., hour=11, minute=30, second=0].
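To make the arithmetic concrete, here is a minimal sketch using only the Python standard library (not iris itself). It assumes that, when the hourly time coordinate has no bounds, the aggregated time point ends up as the midpoint between the first and last hourly point of the day, which would be consistent with the 11:30 value reported above. The `[t, t + 1h]` bounds in the second half are a hypothetical choice for illustration only.

```python
from datetime import datetime, timedelta

# Hourly time points for one day, as described above: 00:00 ... 23:00.
day_start = datetime(2001, 1, 1)
points = [day_start + timedelta(hours=h) for h in range(24)]

# Assumption: without bounds, the aggregated point is the midpoint
# between the first and the last point in the daily group.
midpoint_no_bounds = points[0] + (points[-1] - points[0]) / 2
print(midpoint_no_bounds)  # 2001-01-01 11:30:00

# Hypothetical bounds [t, t + 1h] on every hourly point: the day then
# spans 00:00 to 24:00 and the midpoint falls at noon.
lower = points[0]
upper = points[-1] + timedelta(hours=1)
midpoint_with_bounds = lower + (upper - lower) / 2
print(midpoint_with_bounds)  # 2001-01-01 12:00:00
```

The half-hour difference is simply half of the one hourly interval that is not covered when only the points 00:00 to 23:00 are considered.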

bouweandela commented 4 years ago

I think the problem here is that our implementation of daily_statistics uses Cube.aggregated_by to aggregate by ['day_of_year', 'year'] and therefore does not take into account the bounds of the time coordinate.

This can be partly avoided by correcting the ERA5 CMORization, because it looks like at least the accumulated, mean, minimum and maximum variables should have their time steps shifted half an hour back: https://confluence.ecmwf.int/display/CKB/ERA5+data+documentation#ERA5datadocumentation-Meanratesandaccumulations

However, for instantaneous variables this will be more difficult to solve.
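As a rough illustration of the half-hour shift mentioned above, the sketch below uses only the Python standard library. It assumes that each accumulated (or mean, minimum, maximum) value is valid over the hour ending at its timestamp; the helper name `shift_to_interval_midpoint` is hypothetical and is not part of ESMValCore or iris.

```python
from datetime import datetime, timedelta


def shift_to_interval_midpoint(timestamps, period=timedelta(hours=1)):
    """Return (points, bounds) for values valid over the period ending at each timestamp."""
    bounds = [(t - period, t) for t in timestamps]
    points = [t - period / 2 for t in timestamps]
    return points, bounds


# Hypothetical hourly accumulations stamped 01:00 ... 00:00 of the next day.
stamps = [datetime(2001, 1, 1) + timedelta(hours=h) for h in range(1, 25)]
points, bounds = shift_to_interval_midpoint(stamps)
print(points[0], points[-1])  # 2001-01-01 00:30:00 2001-01-01 23:30:00

# With these bounds the day spans 00:00 to 24:00, so its midpoint is noon.
daily_lower = bounds[0][0]
daily_upper = bounds[-1][1]
daily_point = daily_lower + (daily_upper - daily_lower) / 2
print(daily_point)  # 2001-01-01 12:00:00
```

After the shift, the hourly points run from 00:30 to 23:30 as suggested earlier in the thread. Instantaneous variables have no such averaging period, so this trick does not apply to them.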