ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

Move temporal 'regridding' from multi_model_statistics preprocessor to regrid_time preprocessor #744

Open bouweandela opened 4 years ago

bouweandela commented 4 years ago

At the moment, the preprocessor multi_model_statistics will also attempt to automatically 'regrid' the time coordinate of the incoming datasets. Other coordinates, such as horizontal and vertical coordinates, need to be explicitly regridded by defining a preprocesor in the recipe. It would be good to be consistent and make the temporal regridding more visible and move the temporal regridding to the regrid_time preprocessor. If the datasets have a time coordinate, users would then need to specify the regrid_time preprocessor in the recipe as well as as the regrid and extract_levels.

This would also fix the problem described in #738, that multi_model_statistics does not work for variables that do not have a time coordinate.

Peter9192 commented 4 years ago

It would be really useful if regrid_time could then also check the calendar (as does mmstats, also see #685). I keep running into incompatible time coordinates because the calendars differ.

schlunma commented 4 years ago

It would be really useful if regrid_time could then also check the calendar (as does mmstats, also see #685). I keep running into incompatible time coordinates because the calendars differ.

Yes please!

I (and @hb326) encountered problems with iris.quickplot.plot when the time coordinates of the different cubes have different calendars. As a fix we use the following code:

def _unify_time_coord(cube):
    """Unify time coordinate of cube."""
    if not cube.coords('time', dim_coords=True):
        return
    time_coord = cube.coord('time')
    dates_points = time_coord.units.num2date(time_coord.points)
    dates_bounds = time_coord.units.num2date(time_coord.bounds)
    new_units = Unit('days since 1850-01-01 00:00:00')
    new_time_coord = iris.coords.DimCoord(
        new_units.date2num(dates_points),
        bounds=new_units.date2num(dates_bounds),
        var_name='time',
        standard_name='time',
        long_name='time',
        units=new_units,
    )
    coord_dims = cube.coord_dims('time')
    cube.remove_coord('time')
    cube.add_dim_coord(new_time_coord, coord_dims)

It think it would make sense to include this in regrid_time.

zklaus commented 3 years ago

I support the idea, but careful with the suggested fix above. That may be appropriate in a visualization script where you know the input, but I see two problems with it:

bouweandela commented 3 years ago

It seems to ignore the calendar on the input data. If the models differ in the calendar, say you have one "normal" Gregorian calendar, one calendar without leap years, and one 360 day calendar, it's not clear that the resulting time coordinate is correct.

The idea would be to only do this for time frequencies where it does not matter, e.g. for monthly averages or lower frequencies it does not matter whether or not your calendar includes leap days, right?

zklaus commented 3 years ago

So you are saying this would be only applicable if both source and target frequency are monthly, seasonal, annual? There may still be a (small) problem of weights, but also this could be quite annoying since going from daily to monthly is probably an important use case.

bouweandela commented 3 years ago

So you are saying this would be only applicable if both source and target frequency are monthly, seasonal, annual?

I think so. Or maybe it would still work for daily data if you use some sort of interpolation scheme? I think there is just no way you can meaningfully regrid data with a frequency lower than daily from a 360 day calendar to a 365 day calendar.

going from daily to monthly is probably an important use case.

At the moment we have the preprocessor function monthly_statistics for that. Maybe that could somehow be integrated with regrid_time, but you'd have to supply the kind of statistic (min, max, mean) as a function argument to regrid_time in the recipe because it depends on the variable what you need.

schlunma commented 2 years ago

Moving this to v2.6 since there is not open PR yet.

sloosvel commented 2 years ago

Any chance this can make it for 2.6? Or should it be moved?