Open SarahAlidoost opened 4 years ago
@SarahAlidoost daily_statistics and monthly_statistics (see funcs) produce aggregated data using a simple statistic - I don't see anything wrong with them! If the input data has wrong time points, then that's not the fault of the statistical functions. Could you maybe please expand on your issue and tell us what the actual problem is? :beer:
ie a quick example would be useful, cheers :beer:
@valeriupredoi thanks for the comments.
example: the input hourly data has time values as: [..., hour=0, minute=0, second=0] to [..., hour=0, minute=0, second=0], accounting for 24 hours in one day. After applying daily statistics, the output data has time values as: [..., hour=11, minute=30, second=0] which is not correct. Actually, that should be [..., hour=12, minute=0, second=0].
The daily_statistics and monthly_statistics use the function cube.aggregated_by(). It seems that aggregated_by() changes the time bounds. Also I found a similar issue in iris, please see iris issue
cool! now it makes total sense - cheers! Probably better to raise this straight with the iris folk ie @bjlittle
the input hourly data has time values as: [..., hour=0, minute=0, second=0] to [..., hour=0, minute=0, second=0], accounting for 24 hours in one day.
Shouldn't the input data have time values: [..., hour=0, minute=30, second=0] to [..., hour=23, minute=30, second=0], with corresponding bounds?
Sorry for the typo, both ERA5 raw and cmorized data have time values: [..., hour=0, minute=0, second=0] to [..., hour=23, minute=0, second=0], accounting for 24 hours in one day. After applying daily_statistics with any operator, the time values become: [..., hour=11, minute=30, second=0].
I think the problem here is that our implementation of daily_statistics uses Cube.aggregated_by to aggregate by ['day_of_year', 'year'] and therefore does not take into account the bounds of the time coordinate.
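To make the 11:30 value concrete, here is a minimal sketch in plain Python (no iris involved). It assumes that, when the time coordinate has no bounds, the aggregated point ends up midway between the first and last time point of each group; that is an assumption about the aggregation behaviour, not a copy of the iris code.

```python
from datetime import datetime, timedelta

# Hourly time points for one day: 00:00, 01:00, ..., 23:00 (no bounds).
day_start = datetime(2001, 1, 1)
points = [day_start + timedelta(hours=h) for h in range(24)]

# Assumption: without bounds, the aggregated time point is the midpoint
# between the first and the last point in the group.
midpoint = points[0] + (points[-1] - points[0]) / 2

print(midpoint)  # 2001-01-01 11:30:00
```

The first and last points are only 23 hours apart, so the midpoint lands at 11:30 rather than at 12:00.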
This can be partly avoided by correcting the ERA5 CMORization, because it looks like at least the accumulated, mean, min and max variables should have their timesteps shifted half an hour back: https://confluence.ecmwf.int/display/CKB/ERA5+data+documentation#ERA5datadocumentation-Meanratesandaccumulations
However, for instantaneous variables this will be more difficult to solve.
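A rough sketch of what the half-hour shift would do for accumulated or mean variables, again in plain Python. It assumes each hourly time stamp marks the end of its one-hour interval (my reading of the linked ERA5 documentation); the helper name `shift_to_interval_centre` is made up for this illustration.

```python
from datetime import datetime, timedelta

def shift_to_interval_centre(stamps, step=timedelta(hours=1)):
    """Hypothetical helper: treat each stamp as the END of an interval of
    length `step`, and return the interval centres and (start, end) bounds."""
    points = [t - step / 2 for t in stamps]
    bounds = [(t - step, t) for t in stamps]
    return points, bounds

# Stamps 01:00 ... 24:00 cover the 24 hourly intervals of 2001-01-01.
stamps = [datetime(2001, 1, 1) + timedelta(hours=h) for h in range(1, 25)]
points, bounds = shift_to_interval_centre(stamps)

# Daily bounds span from the first lower bound to the last upper bound,
# and the daily point is the midpoint of those bounds.
daily_bounds = (bounds[0][0], bounds[-1][1])
daily_point = daily_bounds[0] + (daily_bounds[1] - daily_bounds[0]) / 2

print(points[0], points[-1])  # 2001-01-01 00:30:00 2001-01-01 23:30:00
print(daily_point)            # 2001-01-01 12:00:00
```

With the shifted points and their bounds, the day is covered from 00:00 to 24:00, so the daily point falls on 12:00. For instantaneous variables there is no interval to shift, which is why this does not carry over directly.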
The time bounds in daily_statistics (or monthly_statistics) need a correction. For example, regarding hourly ERA5 data, if daily_statistics is applied after CMORization, the daily time values will be 2001, 1, 1, 11, 30, 0. It should be corrected to 2001, 1, 1, 12, 0, 0.
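One possible shape for such a correction, sketched in plain Python rather than against the actual daily_statistics code: after aggregation, reset the bounds to the full calendar day and put the point at their midpoint. The function name `daily_point_and_bounds` is hypothetical, and the real fix would have to operate on the cube's time coordinate.

```python
from datetime import datetime, timedelta

def daily_point_and_bounds(stamp):
    """Hypothetical helper: given any time stamp within a day, return the
    midpoint of that calendar day and the (start, end) bounds of the day."""
    start = datetime(stamp.year, stamp.month, stamp.day)
    end = start + timedelta(days=1)
    return start + (end - start) / 2, (start, end)

# The aggregated value reported in this issue: 2001, 1, 1, 11, 30, 0.
point, bounds = daily_point_and_bounds(datetime(2001, 1, 1, 11, 30, 0))

print(point)   # 2001-01-01 12:00:00
print(bounds)  # (datetime.datetime(2001, 1, 1, 0, 0), datetime.datetime(2001, 1, 2, 0, 0))
```

A monthly version would follow the same idea with the first day of the month and the first day of the next month as bounds.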