ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Recipe testing and output comparison for release 2.8.0 - Final Core release candidate rc2 #3127

Closed remi-kazeroni closed 1 year ago

remi-kazeroni commented 1 year ago

This issue documents the round of recipe testing performed using the Core release candidate v2.8.0rc2.

Release process

System and settings

conda/mamba

(base) mamba --version
mamba 1.3.1
conda 23.1.0

Git branches and state

Tue 21 Mar 13:02:47 CET 2023

(base) :~/ESMValTool 
$ git status
On branch main
Your branch is up to date with 'origin/main'.

nothing to commit, working tree clean

(base) :~/ESMValCore 
$ git status
On branch v2.8.x
Your branch is up to date with 'origin/v2.8.x'.

nothing to commit, working tree clean

Installation and environment

$ cd ~/ESMValTool
$ mamba env create -n tool_280rc2 -f environment.yml
$ conda activate tool_280rc2
$ pip install --editable '.[develop]'
$ cd ~/ESMValCore
$ pip install --editable '.[develop]'

Config user file

Main options: all default except search_esgf: when_missing

```yaml
output_dir: ./esmvaltool_output
max_parallel_tasks: 8
log_level: debug
exit_on_warning: false
output_file_type: png
remove_preproc_dir: true
compress_netcdf: false
save_intermediary_cubes: false
config_developer_file: null
profile_diagnostic: false

# Site-specific entries: DKRZ-Levante
search_esgf: when_missing
download_dir: /work/bd0854/DATA/ESMValTool2/download
auxiliary_data_dir: /work/bd0854/DATA/ESMValTool2/AUX
rootpath:
  CMIP6: /work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ
  CMIP5: /work/bd0854/DATA/ESMValTool2/CMIP5_DKRZ
  CMIP3: /work/bd0854/DATA/ESMValTool2/CMIP3
  CORDEX: /work/ik1017/C3SCORDEX/data/c3s-cordex/output
  OBS: /work/bd0854/DATA/ESMValTool2/OBS
  OBS6: /work/bd0854/DATA/ESMValTool2/OBS
  obs4MIPs: /work/bd0854/DATA/ESMValTool2/OBS
  ana4mips: /work/bd0854/DATA/ESMValTool2/OBS
  native6: /work/bd0854/DATA/ESMValTool2/RAWOBS
  RAWOBS: /work/bd0854/DATA/ESMValTool2/RAWOBS
drs:
  CMIP6: DKRZ
  CMIP5: DKRZ
  CMIP3: DKRZ
  CORDEX: BADC
  obs4MIPs: default
  ana4mips: default
  OBS: default
  OBS6: default
  native6: default
```

ESMValTool version

$ esmvaltool version
ESMValCore: 2.8.0rc2
ESMValTool: 2.8.0.dev111+g6faf263f6
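
For anyone who wants to confirm that their own test environment pairs a Core release candidate with a development build of the Tool, here is a minimal Python sketch that parses the text printed by `esmvaltool version`. The helper names (`parse_versions`, `is_release_candidate`, `is_dev_build`) are made up for this example and are not part of ESMValTool; the sample text is the output shown above.

```python
import re


def parse_versions(text):
    """Turn 'Name: version' lines into a {name: version} dictionary."""
    versions = {}
    for line in text.splitlines():
        name, sep, version = line.partition(":")
        if sep and version.strip():
            versions[name.strip()] = version.strip()
    return versions


def is_release_candidate(version):
    """Return True for versions of the form X.Y.ZrcN, e.g. '2.8.0rc2'."""
    return re.fullmatch(r"\d+\.\d+\.\d+rc\d+", version) is not None


def is_dev_build(version):
    """Return True for development builds, e.g. '2.8.0.dev111+g6faf263f6'."""
    return ".dev" in version


# Output copied from the `esmvaltool version` call above.
output = """ESMValCore: 2.8.0rc2
ESMValTool: 2.8.0.dev111+g6faf263f6"""

versions = parse_versions(output)
print(versions)
print("Core is a release candidate:", is_release_candidate(versions["ESMValCore"]))
print("Tool is a dev build:", is_dev_build(versions["ESMValTool"]))
```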

Environment file

tool_280rc2.txt

Compute resources used

I used the newly added generate.py script. I modified it so that the release manager can run all 150 recipes in one go with python generate.py, and I adjusted the SLURM settings for all "complicated" recipes. I will open a PR shortly to provide more details on that.
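
To give an idea of what "one job per recipe with adjusted SLURM settings" can look like, here is a rough Python sketch. It is not the actual generate.py: the default settings, the partition name, the override table and the recipe name `recipe_heavy_example` are all assumptions made for illustration. It only renders the text of a batch script and submits nothing.

```python
from pathlib import Path

# Assumed default resources; the real values depend on the site.
DEFAULT_SETTINGS = {"partition": "compute", "time": "04:00:00", "memory": "64G"}

# Hypothetical per-recipe overrides for "complicated" recipes.
OVERRIDES = {
    "recipe_heavy_example": {"time": "08:00:00", "memory": "256G"},
}


def render_job(recipe_path, config_file, overrides=OVERRIDES):
    """Return the text of a SLURM batch script that runs one recipe."""
    recipe = Path(recipe_path).stem
    # Start from the defaults and apply the recipe-specific overrides, if any.
    settings = {**DEFAULT_SETTINGS, **overrides.get(recipe, {})}
    lines = [
        "#!/bin/bash -l",
        f"#SBATCH --job-name={recipe}",
        f"#SBATCH --partition={settings['partition']}",
        f"#SBATCH --time={settings['time']}",
        f"#SBATCH --mem={settings['memory']}",
        f"#SBATCH --output={recipe}.%j.out",
        "",
        f"esmvaltool run --config_file {config_file} {recipe_path}",
        "",
    ]
    return "\n".join(lines)


heavy_job = render_job("recipes/recipe_heavy_example.yml", "config-user.yml")
default_job = render_job("recipes/recipe_python.yml", "config-user.yml")
print(heavy_job)
```

Keeping the overrides in a single table means that only the recipes needing more time or memory have to be listed; everything else falls back to the defaults.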

On DKRZ-Levante

Note: this is the second and final round of testing for v2.8.0. I will publish the overview website and output of the comparison tool in this issue very soon. And then I will tag the community to check the output. Stay tuned!

remi-kazeroni commented 1 year ago

Overview of the results

Numbers of successes and failures

The first round of recipe testing produced:

Recipe failures

| Recipe | Problem | Related issue | PR |
|---|---|---|---|
| recipe_autoassess_landsurface_soilmoisture | known missing climatology files (non-public) | marked as broken in https://github.com/ESMValGroup/ESMValTool/issues/3103 | |
| recipe_check_obs | known derivation issue for ERA5 | https://github.com/ESMValGroup/ESMValCore/issues/1388 | |

For comparison, we released ESMValTool 2.7.0 with 4 non-working recipes (this could have been 5 if we had applied a stricter policy on missing data, as done for this round of testing).

Overview webpage and path to data

Note: I will soon make a new post with a markdown list so that contributors can tick boxes after checking the output of their favourite recipes. After that, I'll tag the community.

And thanks very much to everyone who helped testing, fixing, maintaining recipes in the previous round of testing! It is very enjoyable to get results like this with v2.8.0rc2.

remi-kazeroni commented 1 year ago

Hi @ESMValGroup/esmvaltool-developmentteam and @ESMValGroup/esmvaltool-recipe-maintainers, the results from the second and last round of recipe testing for the release of ESMValTool and ESMValCore v2.8 are now available. I would be very grateful if you could take a look at the output of your favourite recipes (see list below) and tick the boxes if the output looks good to you. If that is not the case, please report the issue by editing the list below or posting in this issue.

Deadline: Tuesday, March 28, noon (GMT). Release of ESMValTool v2.8 is scheduled for that day.

Some guidelines on how to inspect runs:

Output comparison between recipes run with Core v2.8.0rc2 and the previous stable released version v2.7.0

Below is the list of 150 recipes currently available in the main branch. The comparison tool returns:

Action required: 120 out of 147 recipe runs need to be inspected by a human.

See complete output in: compare_v280_output.txt
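
If you want to tally the comparison output yourself, a small Python sketch like the one below could do it. It assumes one `recipe_name.yml: STATUS` line per recipe run, which matches the `OK` lines quoted later in this thread. The status label `DIFFERENT` and the recipe name `recipe_example.yml` in the sample are made up, and any status other than `OK` is simply treated as "needs inspection".

```python
def summarize(lines):
    """Split comparison-tool lines into unchanged and to-be-inspected recipes."""
    unchanged, to_inspect = [], []
    for line in lines:
        recipe, sep, status = line.strip().rpartition(": ")
        # Skip anything that is not a '<recipe>.yml: <status>' line.
        if not sep or not recipe.endswith(".yml"):
            continue
        if status == "OK":
            unchanged.append(recipe)
        else:
            to_inspect.append(recipe)
    return unchanged, to_inspect


# Two lines from this thread plus one hypothetical non-OK line.
sample = [
    "..",
    "recipe_combined_indices.yml: OK",
    "recipe_consecdrydays.yml: OK",
    "recipe_example.yml: DIFFERENT",
]

unchanged, to_inspect = summarize(sample)
total = len(unchanged) + len(to_inspect)
print(f"Action required: {len(to_inspect)} out of {total} recipe runs need to be inspected.")
```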

List of recipes to be checked:

bouweandela commented 1 year ago

Note that the tool can now find many more files providing supplementary variables (ancillary variables and cell measures), provided that fx_variables is not used in the recipe. This means that calculations done by the preprocessor functions area_statistics, mask_landsea, mask_landseaice, volume_statistics, and weighting_landsea_fraction are more accurate. Numerical differences with previous versions are therefore expected. See Supplementary variables (ancillary variables and cell measures) in the preprocessor documentation for more information.
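
As a toy illustration of why small numerical differences are expected, the Python sketch below computes an area-weighted mean twice: once with weights approximated from latitude alone, and once with cell areas as they might be read from a supplementary file. This is not ESMValCore code, and all numbers, including the "file" areas, are invented for the example.

```python
import math


def weighted_mean(values, weights):
    """Weighted average of values, normalised by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


latitudes = [-60.0, 0.0, 60.0]    # degrees north
values = [270.0, 300.0, 275.0]    # e.g. a temperature in K

# Relative cell areas approximated from the grid (proportional to cos(latitude)).
grid_areas = [math.cos(math.radians(lat)) for lat in latitudes]

# Hypothetical relative cell areas as provided by a cell-measure file.
file_areas = [0.52, 1.0, 0.49]

mean_from_grid = weighted_mean(values, grid_areas)
mean_from_file = weighted_mean(values, file_areas)
print(mean_from_grid, mean_from_file, mean_from_grid - mean_from_file)
```

The two means differ only slightly, which is the kind of change the comparison tool will flag even though nothing is wrong with the recipe.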

katjaweigel commented 1 year ago
valeriupredoi commented 1 year ago

very many thanks @katjaweigel :beer: Anything you'd reckon can't be fixed with a short (in time) PR?

katjaweigel commented 1 year ago

@valeriupredoi I think the frame for recipe_cmug_h2o and I hope the canvas for recipe_deangelis15nat, but for the second I first have to find out how to change it. (Both should be fixed in the ESMValTool diagnostics.)

valeriupredoi commented 1 year ago

godspeed with that @katjaweigel :racehorse:

katjaweigel commented 1 year ago

Unfortunately I cannot reproduce the issue with the figures from recipe_deangelis15nat:

Figure from test run (https://esmvaltool.dkrz.de/shared/esmvaltool/v2.8.0rc2/): [image: ACCESS1-0]

Figure from my own test with the new Core, reduced version of the recipe (/work/bd1083/b380216/output/recipe_deangelis15nat_20230323_172923/): [image: ACCESS1-0]

valeriupredoi commented 1 year ago

@katjaweigel have you recreated the environment to pull in all the dependencies the testing environment used?

katjaweigel commented 1 year ago

@valeriupredoi Thanks, you are right: I installed the new environment, but I forgot to activate it, sorry!

katjaweigel commented 1 year ago

I have now opened an issue (#3132) and a PR (#3133) to fix the plot issues in recipe_deangelis15nat and recipe_cmug_h2o (both are really small changes).

valeriupredoi commented 1 year ago

@katjaweigel that's brilliant, very many thanks, I'll have a look in a jiffy 🍺

remi-kazeroni commented 1 year ago

> I have now opened an issue (#3132) and a PR (#3133) to fix the plot issues in recipe_deangelis15nat and recipe_cmug_h2o (both are really small changes).

Thanks for that @katjaweigel. The new runs (and new plots) are available on the same website: https://esmvaltool.dkrz.de/shared/esmvaltool/v2.8.0rc2/

katjaweigel commented 1 year ago

Thanks a lot @remi-kazeroni and @valeriupredoi!

remi-kazeroni commented 1 year ago

Thanks everyone for checking the recipe results, that was very helpful for the release management team 👍 I see that about 2/3 of the recipes were checked and approved which is good enough to proceed with the release of ESMValTool v2.8.0. I'm closing this issue now. Nevertheless, feel free to continue checking recipe output later on and mark those that were checked. If needed, a new issue can be opened to document potential problems noticed later on.

bouweandela commented 1 year ago

Hi @remi-kazeroni, thanks for the nice overview. I noticed that several recipes that are not checked in the list above are listed as OK in the comparison tool output that you posted. Is this on purpose? For example:

..
recipe_combined_indices.yml: OK
..
recipe_consecdrydays.yml: OK
..
remi-kazeroni commented 1 year ago

Hi @bouweandela, I overlooked that and did not put any [x] for the 27 recipes that were reported as unchanged by the comparison tool. I can still do that if you like. My experience is that it would still be better that someone quickly checks the output manually. We have seen problems that went unnoticed from release to release (like masking of 0s) and the comparison tool would report that results have not changed since the past release...

bouweandela commented 1 year ago

> We have seen problems that went unnoticed from release to release (like masking of 0s)

That sounds like a serious issue with the comparison tool. Is it reported somewhere? The whole point of having a comparison tool is that you can rely on things being OK if it says they are OK.

remi-kazeroni commented 1 year ago

> > We have seen problems that went unnoticed from release to release (like masking of 0s)
>
> That sounds like a serious issue with the comparison tool. Is it reported somewhere? The whole point of having a comparison tool is that you can rely on things being OK if it says they are OK.

This was fixed in https://github.com/ESMValGroup/ESMValCore/pull/1823 and is in the v2.8.0 release. I think the point I'm trying to make is: we do not have a robust mechanism in place to record "known good output" for recipes merged into main. If we compare recipe output affected by unnoticed bugs (like masking of 0s) or if outputs change because of some improvements (e.g. 1609), we would somehow need to record the "known good output" again. As long as this is not in place (maybe one day as part of a recipe test workflow), I would not fully rely on the OK from the comparison tool because there could be some uncertainty in the "known good output". That is why I personally feel it is safer to take a look at the final recipe results for a release. Nevertheless, the comparison tool has been very useful for me in various cases: comparing output between rcs, review of some PRs, ...
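
To make the point about "known good output" concrete, here is a toy Python sketch. It is not the comparison tool and not the actual bug: `mask_zeros` stands in for any processing step that wrongly treats valid zeros as missing, and `same_output` is a simplified element-wise comparison. Two runs that share the bug compare as identical, and the problem only shows up against an independently validated reference.

```python
import math


def mask_zeros(values):
    """Illustrative buggy step: valid 0.0 values are turned into missing (None)."""
    return [None if v == 0.0 else v for v in values]


def same_output(first, second, rel_tol=1e-6):
    """Element-wise comparison; missing values must be in the same positions."""
    if len(first) != len(second):
        return False
    for x, y in zip(first, second):
        if x is None or y is None:
            if x is not y:
                return False
        elif not math.isclose(x, y, rel_tol=rel_tol):
            return False
    return True


raw = [0.0, 1.5, 0.0, 2.0]

previous_release = mask_zeros(raw)    # output of the old release, bug included
release_candidate = mask_zeros(raw)   # output of the new run, same bug
validated_reference = list(raw)       # output a human has checked

print("unchanged since last release:", same_output(previous_release, release_candidate))
print("matches validated reference:", same_output(release_candidate, validated_reference))
```

The first comparison passes and the second fails, so an "OK" from a release-to-release comparison only says that nothing changed, not that the output is correct.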

bouweandela commented 1 year ago

Would you say that the recipes with a checkmark above are known good output then? It would be good to take this to the tech lead meeting.

remi-kazeroni commented 1 year ago

> Would you say that the recipes with a checkmark above are known good output then? It would be good to take this to the tech lead meeting.

After a release with quite a few important enhancements and bugfixes, I think yes. Known good output would be those with a checkmark. Maybe it is not necessary for all recipe outputs to be checked after each release, but just once in a while (once per year?) or when the Tech Lead Team sees good reasons (major Core changes) to justify it.