Open bouweandela opened 4 years ago
good call!! I reckon this is a good first serious feature for v2.1 :beer:
To get an idea of the priority of the preprocessor functions, here is a rough count of the number of recipes in the ESMValTool that they are used in:
regrid 60
extract_region 30
derive 26
extract_levels 24
mask_landsea 21
area_statistics 20
climate_statistics 18
multi_model_statistics 16
mask_fillvalues 15
annual_statistics 8
convert_units 8
weighting_landsea_fraction 7
anomalies 7
extract_point 6
zonal_statistics 5
extract_time 4
extract_shape 4
amplitude 4
extract_season 3
detrend 3
extract_month 2
extract_transect 2
depth_integration 2
volume_statistics 2
mask_landseaice 1
extract_volume 1
extract_trajectory 1
extract_named_regions 1
meridional_statistics 1
daily_statistics 1
decadal_statistics 1
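A rough count like the one above could be produced along these lines. This is only a sketch, not the script that was actually used: it assumes each recipe has already been parsed into a dictionary with a top-level `preprocessors` section mapping preprocessor names to their functions, and the function name `count_recipes_per_function` and the example recipes are hypothetical.

```python
from collections import Counter


def count_recipes_per_function(recipes):
    """Count, for each preprocessor function, how many recipes use it.

    `recipes` is an iterable of already-parsed recipe dictionaries
    (assumption: the preprocessor definitions live under "preprocessors").
    """
    counts = Counter()
    for recipe in recipes:
        used = set()  # each function is counted at most once per recipe
        for steps in recipe.get("preprocessors", {}).values():
            # `steps` maps function names to their settings; it may be None
            # for an empty preprocessor definition.
            used.update(steps or {})
        counts.update(used)
    return counts


# Hypothetical, hand-written recipes for illustration only.
example_recipes = [
    {
        "preprocessors": {
            "prep_map": {
                "regrid": {"target_grid": "1x1", "scheme": "linear"},
                "climate_statistics": {"operator": "mean"},
            },
            "prep_timeseries": {
                "regrid": {"target_grid": "1x1", "scheme": "linear"},
                "area_statistics": {"operator": "mean"},
            },
        }
    },
    {
        "preprocessors": {
            "prep": {
                "regrid": {"target_grid": "2x2", "scheme": "linear"},
                "extract_region": {
                    "start_longitude": 0,
                    "end_longitude": 40,
                    "start_latitude": 30,
                    "end_latitude": 70,
                },
            }
        }
    },
    {"diagnostics": {}},  # a recipe without any custom preprocessor
]

for name, n_recipes in count_recipes_per_function(example_recipes).most_common():
    print(name, n_recipes)
```

Note that non-function keys inside a preprocessor definition (for example an ordering flag) would be counted as well, so the result is only as rough as the list above.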
Two points related to this:
Now I'm inclined to do ad-hoc rechunking whenever I need it
I think it makes sense to do that, because in most cases there is no need to rechunk. I would expect this is needed only for preprocessor functions that dramatically increase chunk size, and even then it might be best to try and leave that to iris.
Some things we ran into:
The recipe_collins13ipcc is one of the most memory-demanding recipes that we have in ESMValTool, see log file for a test run for the v2.5 release. Would it be a case that would benefit from preprocessor laziness?
Yes, it would benefit. Some progress has been made recently by @zklaus on the lazy regridding for ocean data (though I think this is not yet enabled by default: you need to change the recipe to use the new functionality and install an extra package manually). Lazy vertical interpolation would be a good next candidate to tackle.
Overview issue with laziness status of preprocessor functions:
Checked means lazy; unchecked means not lazy or only partially lazy; a question mark after the preprocessor name means that it is unknown whether this preprocessor function is lazy or not.
iris.analysis.trajectory.interpolate function is not lazy
Note that *_statistics preprocessor functions are lazy except for median, workaround is to use operator: percentile; percent: 50.
It would be great if we could make more preprocessor functions lazy. The laziness status should also be indicated in the docstrings.
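The median workaround mentioned above relies on the 50th percentile being numerically the same as the median. The sketch below only illustrates that equivalence with the Python standard library; it says nothing about laziness, and it assumes the percentile is computed with linear interpolation between data points. The helper name `median_via_percentile` is hypothetical.

```python
import statistics


def median_via_percentile(values):
    """Return the 50th percentile using linear interpolation.

    With n=2 there is a single cut point, which is the 50th percentile.
    """
    return statistics.quantiles(values, n=2, method="inclusive")[0]


odd_sample = [3.0, 1.0, 4.0, 1.0, 5.0]   # odd number of values
even_sample = [2.0, 8.0, 4.0, 6.0]       # even number of values

for sample in (odd_sample, even_sample):
    print(statistics.median(sample), median_via_percentile(sample))
```

In a recipe, the same idea corresponds to replacing the median operator with the settings quoted above (operator: percentile; percent: 50).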
Related to #51