ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0

Make preprocessor lazy #674

Open bouweandela opened 4 years ago

bouweandela commented 4 years ago

Overview issue with laziness status of preprocessor functions:

Checked means lazy; unchecked means not lazy or only partially lazy. A question mark after the preprocessor name means it is unknown whether that preprocessor function is lazy.

Note that the *_statistics preprocessor functions are lazy except for the median operator. A workaround is to use operator: percentile; percent: 50.
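
To illustrate why the percentile workaround gives the same answer as the median, here is a minimal pure-Python sketch. It assumes the percentile is computed with linear interpolation between sorted values; the `percentile` helper below is hypothetical and written only for this illustration, it is not the ESMValCore or iris implementation. Other percentile definitions may give a different result for an even number of values.

```python
import statistics


def percentile(values, percent):
    """Percentile using linear interpolation between sorted values.

    Hypothetical helper for illustration only.
    """
    ordered = sorted(values)
    if not ordered:
        raise ValueError("need at least one value")
    # Fractional position of the requested percentile in the sorted list.
    rank = (len(ordered) - 1) * percent / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    # Interpolate between the two neighbouring sorted values.
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


even_samples = [2.0, 7.5, 1.0, 4.0]
odd_samples = [3.0, 1.0, 2.0]

# Under this interpolation rule the 50th percentile matches the median.
print(percentile(even_samples, 50), statistics.median(even_samples))
print(percentile(odd_samples, 50), statistics.median(odd_samples))
```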

It would be great if we could make more preprocessor functions lazy. The laziness status should also be indicated in the docstrings.

Related to #51

valeriupredoi commented 4 years ago

good call!! I reckon this is a good first serious feature for v2.1 :beer:

bouweandela commented 3 years ago

To get an idea of the priority of the preprocessor functions, here is a rough count of the number of recipes in the ESMValTool that they are used in:

regrid 60
extract_region 30
derive 26
extract_levels 24
mask_landsea 21
area_statistics 20
climate_statistics 18
multi_model_statistics 16
mask_fillvalues 15
annual_statistics 8
convert_units 8
weighting_landsea_fraction 7
anomalies 7
extract_point 6
zonal_statistics 5
extract_time 4
extract_shape 4
amplitude 4
extract_season 3
detrend 3
extract_month 2
extract_transect 2
depth_integration 2
volume_statistics 2
mask_landseaice 1
extract_volume 1
extract_trajectory 1
extract_named_regions 1
meridional_statistics 1
daily_statistics 1
decadal_statistics 1

Peter9192 commented 3 years ago

Two points related to this:

bouweandela commented 3 years ago

> Now I'm inclined to do ad-hoc rechunking whenever I need it

I think it makes sense to do that, because in most cases there is no need to rechunk. I would expect this is needed only for preprocessor functions that dramatically increase chunk size, and even then it might be best to try and leave that to iris.
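
As a rough sketch of what ad-hoc rechunking could look like, the example below uses dask directly rather than any ESMValCore function. The array shape, the chunk sizes, and the variable names are all made up for illustration. The idea is to keep the original chunks for operations that work well with them and to rechunk only right before an operation that needs a different layout.

```python
import dask.array as da

# Hypothetical data: 120 time steps on a 180 x 360 grid,
# stored with one chunk per time step.
data = da.ones((120, 180, 360), chunks=(1, 180, 360))

# A time mean works with the existing chunks, so leave them alone.
time_mean = data.mean(axis=0)

# An operation that needs the full time series at each grid point
# prefers the whole time axis in a single chunk, so rechunk only here.
# -1 means "one chunk spanning the whole dimension".
by_point = data.rechunk({0: -1, 1: 60, 2: 60})

print(data.chunksize)
print(by_point.chunksize)

# Nothing has been computed so far; compute() triggers the work.
print(float(time_mean.sum().compute()))
```

Both `time_mean` and `by_point` stay lazy until `compute()` is called, so the rechunking step only costs anything for the code path that actually needs it.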

Peter9192 commented 3 years ago

Some things we ran into:

remi-kazeroni commented 2 years ago

The recipe_collins13ipcc is one of the most memory-demanding recipes that we have in ESMValTool, see log file for a test run for the v2.5 release. Would it be a case that would benefit from preprocessor laziness?

bouweandela commented 2 years ago

Yes, it would benefit. Some progress has been made recently by @zklaus on lazy regridding for ocean data, though I think this is not yet enabled by default: you need to make changes to the recipe and manually install an extra package in order to use the new functionality. Lazy vertical interpolation would be a good next candidate to tackle.