Open larsbarring opened 1 year ago
Hi @larsbarring, many thanks for proposing this! We have a preprocessor called regrid_time that aligns time points from cubes with differing time axes on a common standardized time axis, and that should account for differing calendars too (I believe I made sure that calendars were taken care of via conversion from num to date via a standard calendar when I wrote that). Have you seen/tried it? Of course, we can always generalize it or add a new, e.g., `regrid/align_calendars` preprocessor. About fill values: we need to stick to the CMOR standard `_fillValue` that is specified in the file's metadata (1e20), no mixing of NaNs or 1e999 or other such things. About interpolating data from one day to another: that shouldn't be tricky, because for monthly means we don't need to do that, and for daily means we can just take the mean of the two days straddling the missing day's data point; not sure about hourly data, though. @ESMValGroup/technical-lead-development-team what do you folks think?
Hi @valeriupredoi, many thanks for the quick response! No, I did not know of `regrid_time` (I am a newcomer to the world of ESMVal), but I have been discussing the general requirement from our side with @zklaus and @ljoakim (any insights?). I had a quick look at the docu link you provide, but that does not give enough detail (for me ...). I guess some quick testing might fill that gap. Anyway, and as a general comment, I think that it would be very useful to align what is done in ESMVal with what is done in xarray (or vice versa!! I am all for common methods :-) ). And yes, as you write, this is a feature request that does focus on daily data; at higher temporal resolution something else is needed, and at lower resolution (monthly...) things become easier.
And, yes, I do take your point about `_fillValue`.
Our `regrid_time` does not change the calendar, so it does not address this issue.
I am not sure I understand the comment on fill value. @larsbarring suggested that the fill value could possibly be used as one marker value. That would be possible with 1e20 as with any other; regardless, that applies to masking, I think. It still is perfectly reasonable to use nans for cases where nans make sense, no?
I think a new `change_calendar` or `ensure_calendar` preprocessor would be useful. The main challenge to me seems to be the efficient allocation and memory handling of long timeseries with a few extra days sprinkled in. The filling method should probably be configurable.
Ah then - very good and informative comments from both you gents @larsbarring and @zklaus :+1: I am annoyed with myself that I didn't fix the calendar business in regrid_time TBH, but then again, that function doesn't really help much when it comes to high-frequency data (daily etc). Then it does indeed sound like a good idea for a preprocessor! About xarray - good idea, we are directly involved with iris, so prob best we go via iris first (especially since I believe there still is an effort to merge forces, iris and xarray, I mean). Cheers @zklaus - I think I overthought the missing/fill value issue :grin:
This may also be relevant as an option when adding days to a `360_day` calendar: https://loca.ucsd.edu/loca-calendar/. To convert to standard, they always insert Feb 29th for leap years, and 5 additional days are inserted, each placed randomly within its corresponding 72-day time block, in order to reduce some statistical effects that may arise from adding the same days every year. This is also implemented in xclim (https://github.com/Ouranosinc/xclim/issues/841).
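For illustration, the random insertion could look roughly like the sketch below. This is not the LOCA or xclim code: the function name `expand_360_day_year`, the `fill` placeholder, and the choice to insert Feb 29th after the five block days are all assumptions made for the example.

```python
import calendar
import random


def expand_360_day_year(values, year, fill=None, rng=None):
    """Expand one 360_day year of daily values to a standard-calendar year.

    Hypothetical helper: `fill` marks the inserted days and `rng` allows a
    seeded random generator for reproducibility.
    """
    if len(values) != 360:
        raise ValueError("expected 360 daily values")
    rng = rng or random.Random()
    out = []
    # Five blocks of 72 days; one fill-in day goes to a random slot in each.
    for block in range(5):
        chunk = list(values[block * 72:(block + 1) * 72])
        chunk.insert(rng.randrange(73), fill)
        out.extend(chunk)
    # Leap years additionally get Feb 29th, i.e. index 59 of the 365-day year
    # (assumption: it is added after the five block days are in place).
    if calendar.isleap(year):
        out.insert(59, fill)
    return out
```

Passing a seeded `random.Random` makes the inserted positions reproducible between runs, which is probably what one would want in a recipe.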
We just encountered problems (again) with `regrid_time` in https://github.com/ESMValGroup/ESMValCore/pull/2299. I really like the approach proposed here, but I fear that it will take some time to implement it properly.
In the meantime, I propose to implement a "workaround" for monthly and yearly data. For those, it should be sufficient to simply take the 15th of the month (for monthly data) or 1 July (for yearly data) and assign those dates to the data using a fixed calendar (most likely `standard`). The errors from this should be minimal.
This would be very simple to implement. Moreover, we are already doing exactly the same for `multi_model_statistics`.
If we agree on this, I can try to implement that.
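A minimal sketch of the workaround, assuming time points arrive as `(year, month, day)` tuples from an arbitrary source calendar. The function name `align_to_standard` and the frequency labels `"mon"` and `"yr"` are placeholders for this example, not the existing ESMValCore interface.

```python
from datetime import date


def align_to_standard(time_points, frequency):
    """Map (year, month, day) tuples onto fixed standard-calendar dates.

    Monthly data go to the 15th of the month, yearly data to 1 July. The
    source day is ignored, so dates such as 30 February (valid in a
    360_day calendar) are not a problem.
    """
    if frequency == "mon":
        return [date(year, month, 15) for year, month, _ in time_points]
    if frequency == "yr":
        return [date(year, 7, 1) for year, _, _ in time_points]
    raise ValueError(f"unsupported frequency: {frequency!r}")
```

Two datasets whose calendars place the monthly time point on different days (say the 16th in a `360_day` calendar and the 15th in `standard`) end up on the same date after this step.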
How should this be implemented? I can think of two solutions at the moment:
1. Add a `calendar` keyword to `regrid_time`. If not set (default), use the current behavior; if set, use it as a common calendar. This is fully backwards-compatible and easy to implement; however, it enforces calendar changes (e.g., if all input data is already on the same calendar, then this will also be changed).
2. Add a new preprocessor `align_calendar` (or `align_time` or any other name). This is also fully backwards-compatible and can keep identical calendars if possible (just like the code for multi-model statistics does); however, it's more difficult to implement. Moreover, we then end up with two very similar preprocessors, `regrid_time` and the new one, which might be confusing (and I am not sure if `regrid_time` in its current form is useful at all).

I think 1. is the better solution, and we could even think of changing the default behavior in the future (with a proper deprecation cycle). @ESMValGroup/technical-lead-development-team any opinions?
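To make the difference between the two options concrete, here is a small sketch of how each would pick the target calendar for a set of input datasets. Both function names and the `fallback` argument are hypothetical; the snippet only illustrates the decision logic, not any existing ESMValCore code.

```python
def target_calendars_option1(calendars, calendar=None):
    """Option 1: `calendar=None` keeps the inputs as they are (current
    behavior); any other value is enforced on every dataset."""
    if calendar is None:
        return list(calendars)
    return [calendar] * len(calendars)


def target_calendars_option2(calendars, fallback="standard"):
    """Option 2: keep a calendar that all inputs already share, otherwise
    switch every dataset to the fallback calendar."""
    unique = set(calendars)
    common = unique.pop() if len(unique) == 1 else fallback
    return [common] * len(calendars)
```

With inputs that are all on `360_day`, option 1 called with `calendar="standard"` still converts everything, whereas option 2 leaves the data on `360_day`.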
In relation to what @schlunma wrote:
> In the meantime, I propose to implement a "workaround" for monthly and yearly data. For those, it should be sufficient to simply take the 15th of the month (for monthly data) or 1 July (for yearly data) and assign those dates to the data using a fixed calendar (most likely `standard`). The errors from this should be minimal.
I would like to just state the obvious difference between an intensive and an extensive quantity. For the former, simply adjusting the time coordinate should be fine, but for the latter differences in period length between different dataset calendars should be factored in by adjusting the data as such, and not only the time coordinate.
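As a rough illustration of the extensive case: a monthly total from a `360_day` calendar covers 30 days, while the same month in the `standard` calendar may have 28 to 31. The sketch below rescales such a total under the assumption of a constant daily rate within the month; the function name and the `source_days` default are made up for this example.

```python
import calendar


def rescale_monthly_total(total, year, month, source_days=30):
    """Rescale an extensive monthly total to the standard month length.

    Assumes the quantity accumulates at a constant daily rate, so the total
    scales with the number of days. Intensive quantities (e.g. a monthly
    mean temperature) need no such adjustment.
    """
    target_days = calendar.monthrange(year, month)[1]
    return total * target_days / source_days
```

For example, a January total of 90.0 accumulated over 30 days corresponds to 93.0 over the 31 days of a standard January.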
Any improvements to `regrid_time` are welcome of course. Maybe we should consider the suggestion by @zklaus to rename it to `change_calendar` if that makes it easier to find. I agree with the suggestion from the top post to use `xarray.Dataset.convert_calendar` (and also `xarray.Dataset.interp_calendar`) as much as possible and see if we can contribute additional features, as described in the table above, back to those functions. I couldn't find any feature in Iris that supports this type of time-specific operation, but we could open an issue there as well to ask if there is an interest.
> Maybe we should consider the suggestion by @zklaus to rename it to `change_calendar` if that makes it easier to find. I agree with the suggestion from the top post to use `xarray.Dataset.convert_calendar` (and also `xarray.Dataset.interp_calendar`) as much as possible and see if we can contribute additional features, as described in the table above, back to those functions.
In principle, I agree with that. However, I fear that it will take some time to implement this properly for all calendar combinations and frequencies (in addition, there is still no perfect solution for bridging iris and xarray; see https://github.com/SciTools/iris/issues/4994), and I am on a deadline here (I need this for our EGU abstract).
Thus, I would propose to expand the existing `regrid_time` so that it's able to convert calendars for monthly and yearly data. As mentioned, this is mostly straightforward to implement and we are already doing the exact same in our multi-model statistics code. Of course, we should mention the caveats there (e.g., extensive vs. intensive variables). In the future, we can add a new preprocessor `change_calendar` that generalizes this for all frequencies and (maybe) deprecate `regrid_time`. What do you think about this?
Improving regrid_time a bit sounds fine to me. We can deprecate it as soon as someone has time to actually work on this issue and implement things properly.
> there is still no perfect solution for bridging iris and xarray

This may not be a huge problem in this case; it should be possible to just convert the time coordinate and the time-dependent data/coordinates etc. separately to an Xarray Dataset and use the resulting values to make a new cube.
A PR with improvements to `regrid_time` and the proposed addition of a `calendar` argument (that only works for decadal, yearly, and monthly data) is open here: https://github.com/ESMValGroup/ESMValCore/pull/2311
Is your feature request related to a problem? Please describe.
There are many use cases where model data and observational datasets are combined for some analysis. When the datasets have daily resolution, non-`standard` model calendars cause problems. In addition, for certain analyses (e.g. of climate indicators related to spell length or day-of-year when something happens) the leap day of the `standard` calendar is a complication. Hence a preprocessor to convert calendars would be very useful, and would allow for a common approach within a wide community for solving a problem that otherwise, and traditionally, "everyone" is solving in some ad hoc way by a quick-and-dirty fix (in the worst case again and again).
Hence, we propose the following conversion table (numbers are explained below)
(Conversion table with the calendars `360_day`, `365_day`, `standard`/`gregorian`, `proleptic_gregorian`, `366_day`, `julian`, and `none` as both row and column labels.)
`standard`/`gregorian`: the `gregorian` calendar is deprecated.

Fill-in: For transformation from the `365_day` calendar to the `standard` or `proleptic_gregorian` calendar it is suggested to add a day after February 28th (day-in-year 59). For transformation from the `360_day` calendar several days have to be added:

For conversion to a non-leap year the following days should be inserted (day-in-year in parentheses):

For conversion to a leap year the following days should be inserted (day-in-year in parentheses):

This follows what has been implemented in xarray (`xarray.Dataset.convert_calendar` using `align_on="year"`). However, as is indicated in the table above, we suggest not to implement transformations to the `360_day` calendar, or the xarray alternative `align_on="date"`, because it removes several days.

We also suggest the following alternatives for "creating" fill-in data:
- `NaN`, or similar, like `_FillValue`
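
A minimal sketch of the `365_day` to `standard` fill-in for a single leap year, with the fill method as a switch. The function name `insert_leap_day` and the method labels are assumptions for this example; the `"mean"` option is not from the list above but follows the suggestion in the discussion to average the two days straddling the gap.

```python
import math


def insert_leap_day(values, method="nan", fill_value=1e20):
    """Insert one fill-in day after Feb 28th into 365 daily values.

    Feb 28th is day-in-year 59, i.e. index 58, so the new day is placed
    at index 59 and Mar 1st moves to index 60.
    """
    if len(values) != 365:
        raise ValueError("expected 365 daily values")
    feb28, mar01 = values[58], values[59]
    if method == "nan":
        new_day = math.nan
    elif method == "fill_value":
        new_day = fill_value
    elif method == "mean":
        new_day = (feb28 + mar01) / 2
    else:
        raise ValueError(f"unknown fill method: {method!r}")
    return list(values[:59]) + [new_day] + list(values[59:])
```

Keeping the method as an argument leaves the choice to the recipe, so a marker value and an interpolated value can both be supported without changing the insertion logic.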
Would you be able to help out? Would you have the time and skills to implement the solution yourself?