Expansion of the `resume_from` function

malininae commented 2 days ago

At the November 2024 workshop, the topic of expanding the `resume_from` function came up.

Here are my proposals:

1. Loosen the requirement that the recipes be absolutely identical. For example, allow resuming if only a few keywords in a diagnostic have changed. Ideally, it would also be possible to resume after adding a new diagnostic and/or variable group.
2. If processing of a variable group did not finish, still use the files that were processed. There have been a few instances where part of a variable group was processed, but the whole group had to be restarted, even though, say, 169 out of 170 files had been processed successfully.
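To make the second point concrete, here is a rough sketch of the behaviour I have in mind. It is not how ESMValCore works today; the function name `split_done_and_pending`, the flat directory layout, and the "exists and is non-empty" check are all assumptions for illustration only.

```python
from pathlib import Path


def split_done_and_pending(expected_files, previous_dir):
    """Partition expected output files by whether a previous run wrote them.

    Hypothetical helper: `expected_files` is an iterable of file names that
    the variable group should produce, `previous_dir` is the directory where
    the interrupted run stored its preprocessed files.
    """
    previous_dir = Path(previous_dir)
    done, pending = [], []
    for name in expected_files:
        path = previous_dir / name
        # Assumption: a file is reusable if it exists and is non-empty.
        # A real check would also have to rule out partially written files.
        if path.is_file() and path.stat().st_size > 0:
            done.append(path)
        else:
            pending.append(name)
    return done, pending
```

With something like this, only the names in `pending` would need to be preprocessed again, and the paths in `done` would be picked up from the earlier run.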

Happy to elaborate, other suggestions welcome, I know @k-a-webb had some suggestions.

k-a-webb commented 2 days ago

My use case mainly concerns model benchmarking, where I re-run the same recipe with only the datasets list edited. A very convenient feature of the current resume_from tool is that it reuses the already preprocessed data. Currently, resumable recipes need to be exactly (or nearly exactly) the same as the one run previously, which excludes editing the datasets list.

I would propose relaxing the requirement of having an identical dataset list, which is to say I support @malininae's proposals!
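As a minimal sketch of what a relaxed check could look like, assuming the recipes are already loaded as plain dictionaries and only the top-level `datasets` entry is allowed to differ (the function name `same_apart_from_datasets` is hypothetical, and per-diagnostic dataset lists are not handled here):

```python
def same_apart_from_datasets(old_recipe, new_recipe):
    """Return True if two recipe dicts match once `datasets` is ignored.

    Hypothetical helper, not part of ESMValCore. Only the top-level
    `datasets` key is ignored; everything else must be identical.
    """

    def without_datasets(recipe):
        return {key: value for key, value in recipe.items() if key != "datasets"}

    return without_datasets(old_recipe) == without_datasets(new_recipe)
```

A recipe that passes this check would be a candidate for resuming, with preprocessing run only for the datasets that are new.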

bouweandela commented 1 day ago

Editing the dataset list will only be possible if the recipe uses no preprocessor functions that take all datasets as input, i.e.: https://github.com/ESMValGroup/ESMValCore/blob/9b9a12526d9afdc87a5dd9e6904efe37acb629ac/esmvalcore/preprocessor/__init__.py#L222-L229
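A check along these lines could gate the relaxed behaviour. This is only a sketch: the function name `all_dataset_steps` is hypothetical, the preprocessor profile is assumed to be a plain dictionary of step name to settings, and the set of function names is passed in by the caller rather than taken from the code linked above.

```python
def all_dataset_steps(profile, all_dataset_functions):
    """List the steps in a preprocessor profile that need every dataset.

    Hypothetical helper: `profile` maps step names to their settings and
    `all_dataset_functions` is the set of step names that take all datasets
    as input (supplied by the caller; see the linked source for the real one).
    """
    return sorted(
        step
        for step, settings in profile.items()
        # Assumption of this sketch: a step set to False counts as disabled.
        if step in all_dataset_functions and settings is not False
    )
```

If the returned list is empty for every preprocessor in the recipe, a change to the dataset list would not invalidate the files that were already preprocessed.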

ESMValGroup / ESMValCore

Expansion of the `resume_from` function #2582