ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Prepare release 2.4.0 #2354

Closed zklaus closed 2 years ago

zklaus commented 2 years ago

Dear @ESMValGroup/esmvaltool-developmentteam,

version 2.4.0 of ESMValTool is approaching fast.

On Monday, we released ESMValCore 2.4.0rc1, the release candidate of the core that this ESMValTool release will be based on. Today we are entering the feature freeze for ESMValTool 2.4.0, which means that no new features will be merged anymore, restricting further changes to bugfixes.

Please help us test the new version by running your own recipes with the latest development version.

Special attention

One particular change in ESMValCore requires attention from all recipe maintainers. In ESMValGroup/ESMValCore#1332 a new required field was added to recipes. Please check #2324 and make sure that the recipes that you are maintaining get a title.
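For a quick local check before opening a PR, something along these lines could list recipes that still lack a title. This is only a sketch and not part of ESMValTool: `has_title` and `recipes_without_title` are hypothetical helper names, the file loader assumes PyYAML is installed, and an empty title is treated as missing, which may be stricter than the actual schema validation done by ESMValCore.

```python
def has_title(recipe):
    """Return True if a parsed recipe has a non-empty documentation.title."""
    documentation = recipe.get("documentation") if isinstance(recipe, dict) else None
    if not isinstance(documentation, dict):
        return False
    title = documentation.get("title")
    # Assumption: a blank title counts as missing.
    return isinstance(title, str) and bool(title.strip())


def recipes_without_title(paths):
    """Yield the paths of recipe files that have no title (needs PyYAML)."""
    import yaml  # third-party dependency, imported lazily

    for path in paths:
        with open(path, encoding="utf-8") as file:
            recipe = yaml.safe_load(file) or {}
        if not has_title(recipe):
            yield path
```

For example, `list(recipes_without_title(Path("esmvaltool/recipes").rglob("recipe_*.yml")))` would give the files that still need attention, assuming that directory layout.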

The release is planned for Monday next week. A few days of delay are possible.

valeriupredoi commented 2 years ago

Cheers a-much Klaus! For adding titles in recipes, the PRs should be assigned to @zklaus and @valeriupredoi for technical review and merge asap (if you are unsure about the choice of title, add someone that might help with it, but let's not dilly-dally too much and if a title is good enough that'll do) :beer:

hb326 commented 2 years ago

I encountered a problem with the not-available-anymore "write_netcdf" switch in version v2.4. I pulled the latest version of the Core branch "main", and also the latest Tool branch "main". I installed everything in development mode. The switch "write_netcdf" is for me still available in "ESMValTool-public/esmvaltool/diag_scripts/shared/_base.py", in lines 130 and 520. Did I install something wrong again, or is this a bug that needs to be fixed for the next release?

hb326 commented 2 years ago

Ok, seems like I was wrong and the instances of "write_netcdf" are indeed not present anymore in the "main" branch of the ESMValTool. I will have to check what I did...

bouweandela commented 2 years ago

I can have a go at running those recipes that have a title (currently 64 out of 113) on Mistral using our cylc suite. I'll report back here when I have some results.

remi-kazeroni commented 2 years ago

What would be the timeline to finish including the titles to recipes? There would be many recipes failing at the moment. We can make a list in #2324 to see where we stand.

valeriupredoi commented 2 years ago

cheers muchly @bouweandela :beer: @hb326 is everything working now? It would be good to have PRs with titles soon (I don't want to use "ASAP" coz that's a bit corporate, but ASAP is a good term for how soon we need those titles for testing) :beer:

valeriupredoi commented 2 years ago

What would be the timeline to finish including the titles to recipes? There would be many recipes failing at the moment. We can make a list in #2324 to see where we stand.

I'll make one now, Remi!

zklaus commented 2 years ago

Fyi, I am already in the process of producing a list of those per maintainer.

bouweandela commented 2 years ago

Here's a small tool for estimating how much data needs to be downloaded to run a bunch of recipes:

"""Show how much data needs to be downloaded for running a list of recipes."""
import argparse
import logging
from pathlib import Path

import esmvalcore._recipe
from esmvalcore._config import read_config_user_file
from esmvalcore._recipe import read_recipe_file
from esmvalcore.esgf import download
from humanfriendly import format_size

def main():
    """Run the program."""
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('recipes', nargs='+', help='A list of recipes.')
    parser.add_argument('--config-user',
                        default=Path.home() / '.esmvaltool' /
                        'config-user.yml',
                        type=Path,
                        help='Path to config-user.yml.')
    parser.add_argument('--download',
                        action='store_true',
                        help='Download data.')
    args = parser.parse_args()

    # Monkeypatch the private helper with a stub that returns no attributes.
    def _read_attributes(filename):
        return {}
    esmvalcore._recipe._read_attributes = _read_attributes

    files = set()
    recipes = []
    errors = []
    for filename in args.recipes:
        print("Reading recipe", filename)
        cfg = read_config_user_file(args.config_user, Path(filename).stem)
        try:
            recipe = read_recipe_file(filename, cfg)
        except Exception as exc:
            print(f"Unable to run {filename} because of {exc}")
            errors.append([filename, exc])
        else:
            files.update(recipe._download_files)
            size = sum(f.size for f in recipe._download_files)
            recipes.append([filename, size])

    print("List of recipes that cannot be run:")
    # Sort on the filename only; exception objects cannot be compared.
    for recipe, error in sorted(errors, key=lambda item: item[0]):
        print(f"{recipe}:")
        print(f"{type(error).__name__}: {error}" + "\n")

    print("List of working recipes:")
    print("\n".join(f"{format_size(size)}\t{recipe}"
                    for (recipe, size) in sorted(recipes)))
    total_size = sum(f.size for f in files)
    print("Total amount of data that needs to be downloaded from ESGF:",
          format_size(total_size))
    if args.download:
        logging.getLogger().setLevel('INFO')
        download(files, dest_folder=cfg['download_dir'])

if __name__ == '__main__':
    main()

hb326 commented 2 years ago

cheers muchly @bouweandela 🍺 @hb326 is everything working now? It would be good to have PRs with titles soon (I don't want to use "ASAP" coz that's a bit corporate, but ASAP is a good term for how soon we need those titles for testing) 🍺

Yes, all working. I obviously did not merge the latest main branch well enough in my working branch...

valeriupredoi commented 2 years ago

yeah, the need to merge main bit even me, a seasoned veteran, this week (and falsely alarmed poor @zklaus that the pin on iris in Core doesn't work) :laughing:

bouweandela commented 2 years ago

The results of running all recipes that could be successfully read are available here: https://esmvaltool.cloud.dkrz.de/shared/esmvaltool/v2.4.0-test/.

This run was done on the Mistral prepost nodes with max_parallel_tasks = 8, a maximum runtime of 4 hours, and a memory limit of 128 GB. There were two recipes that were cancelled because they took longer to run than the 4-hour time limit (recipe_bock20jgr_fig_6-7.yml and recipe_wenzel16jclim.yml) and two recipes that were cancelled because they exceeded the memory limit of 128 GB (recipe_smpi_4cds.yml and recipe_eyring13jgr_12.yml). If anyone can share what reasonable runtimes/memory uses are for these recipes that would be great.

Below is a list of recipes that couldn't be run and the problem was already found when reading the recipe:

./bock20jgr/recipe_bock20jgr_fig_1-4.yml: RecipeError: Could not create all tasks
./cmorizers/recipe_daily_era5.yml: YamaleError: Error validating data './cmorizers/recipe_daily_era5.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./cmorizers/recipe_era5-land.yml: YamaleError: Error validating data './cmorizers/recipe_era5-land.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./examples/recipe_check_obs.yml: RecipeError: Could not create all tasks
./hydrology/recipe_globwat.yml: RecipeError: Could not create all tasks
./hydrology/recipe_hydro_forcing.yml: RecipeError: Could not create all tasks
./hydrology/recipe_hype.yml: RecipeError: Could not create all tasks
./hydrology/recipe_marrmot.yml: RecipeError: Could not create all tasks
./hydrology/recipe_pcrglobwb.yml: RecipeError: Could not create all tasks
./hydrology/recipe_wflow.yml: RecipeError: Could not create all tasks
./recipe_albedolandcover.yml: YamaleError: Error validating data './recipe_albedolandcover.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_anav13jclim.yml: RecipeError: Could not create all tasks
./recipe_arctic_ocean.yml: YamaleError: Error validating data './recipe_arctic_ocean.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_carvalhais14nat.yml: YamaleError: Error validating data './recipe_carvalhais14nat.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_climwip_brunner2019_med.yml: RecipeError: Could not create all tasks
./recipe_climwip_brunner20esd.yml: RecipeError: Could not create all tasks
./recipe_collins13ipcc.yml: YamaleError: Error validating data './recipe_collins13ipcc.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_consecdrydays.yml: YamaleError: Error validating data './recipe_consecdrydays.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_cox18nature.yml: YamaleError: Error validating data './recipe_cox18nature.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_cvdp.yml: YamaleError: Error validating data './recipe_cvdp.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_deangelis15nat.yml: YamaleError: Error validating data './recipe_deangelis15nat.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_ecs.yml: YamaleError: Error validating data './recipe_ecs.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_ecs_constraints.yml: YamaleError: Error validating data './recipe_ecs_constraints.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_ensclus.yml: YamaleError: Error validating data './recipe_ensclus.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_extreme_events.yml: YamaleError: Error validating data './recipe_extreme_events.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_flato13ipcc.yml: RecipeError: Could not create all tasks
./recipe_gier2020bg.yml: YamaleError: Error validating data './recipe_gier2020bg.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_hyint.yml: YamaleError: Error validating data './recipe_hyint.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_hyint_extreme_events.yml: YamaleError: Error validating data './recipe_hyint_extreme_events.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_impact.yml: RecipeError: Could not create all tasks
./recipe_landcover.yml: RecipeError: Could not create all tasks
./recipe_li17natcc.yml: YamaleError: Error validating data './recipe_li17natcc.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_martin18grl.yml: YamaleError: Error validating data './recipe_martin18grl.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_meehl20sciadv.yml: YamaleError: Error validating data './recipe_meehl20sciadv.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_miles_block.yml: YamaleError: Error validating data './recipe_miles_block.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_miles_eof.yml: YamaleError: Error validating data './recipe_miles_eof.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_miles_regimes.yml: YamaleError: Error validating data './recipe_miles_regimes.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_ocean_Landschuetzer2016.yml: YamaleError: Error validating data './recipe_ocean_Landschuetzer2016.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_ocean_multimap.yml: YamaleError: Error validating data './recipe_ocean_multimap.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_perfmetrics_CMIP5.yml: RecipeError: Could not create all tasks
./recipe_perfmetrics_CMIP5_4cds.yml: RecipeError: Could not create all tasks
./recipe_perfmetrics_land_CMIP5.yml: RecipeError: Could not create all tasks
./recipe_pv_capacity_factor.yml: YamaleError: Error validating data './recipe_pv_capacity_factor.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_quantilebias.yml: YamaleError: Error validating data './recipe_quantilebias.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_rainfarm.yml: YamaleError: Error validating data './recipe_rainfarm.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_runoff_et.yml: YamaleError: Error validating data './recipe_runoff_et.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_russell18jgr.yml: YamaleError: Error validating data './recipe_russell18jgr.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_schlund20esd.yml: YamaleError: Error validating data './recipe_schlund20esd.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_seaice.yml: RecipeError: Could not create all tasks
./recipe_shapeselect.yml: YamaleError: Error validating data './recipe_shapeselect.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_spei.yml: YamaleError: Error validating data './recipe_spei.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_tcr.yml: YamaleError: Error validating data './recipe_tcr.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_toymodel.yml: YamaleError: Error validating data './recipe_toymodel.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./recipe_wenzel14jgr.yml: RecipeError: Could not create all tasks
./recipe_wenzel16nat.yml: RecipeError: Could not create all tasks
./recipe_williams09climdyn_CREM.yml: RecipeError: Could not create all tasks
./schlund20jgr/recipe_schlund20jgr_gpp_abs_rcp85.yml: YamaleError: Error validating data './schlund20jgr/recipe_schlund20jgr_gpp_abs_rcp85.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./schlund20jgr/recipe_schlund20jgr_gpp_change_1pct.yml: YamaleError: Error validating data './schlund20jgr/recipe_schlund20jgr_gpp_change_1pct.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing
./schlund20jgr/recipe_schlund20jgr_gpp_change_rcp85.yml: YamaleError: Error validating data './schlund20jgr/recipe_schlund20jgr_gpp_change_rcp85.yml' with schema '/mnt/lustre01/pf/b/b381141/src/esmvalgroup/esmvalcore/esmvalcore/recipe_schema.yml' documentation.title: Required field missing

The list was created using this script: https://github.com/ESMValGroup/ESMValTool/issues/2354#issuecomment-949493009. See recipe_read_log.txt for the full error messages.

@remi-kazeroni It appears that some problems are caused by missing auxiliary data on Mistral, could you have a look please?

remi-kazeroni commented 2 years ago

The results of running all recipes that could be successfully read are available here: https://esmvaltool.cloud.dkrz.de/shared/esmvaltool/v2.4.0-test/.

Thanks a lot for the tests @bouweandela! Among the runs listed in the website, I noticed 2 recurring errors:

fserva commented 2 years ago

Thanks for the notification @remi-kazeroni, I removed the flags earlier this year with another PR for zmnam https://github.com/ESMValGroup/ESMValTool/pull/2230#issuecomment-879788573, still under review. Should I open a new dedicated PR for this issue only? Thanks

zklaus commented 2 years ago

@fserva, if I read the other PR correctly, the write_plots issue is only a single line change, whereas the full PR is still under review and rather substantial. I would recommend you open a new PR to address the write_plots only.

Thanks!

bettina-gier commented 2 years ago

I've got the same issues with the Argument type mismatch as @remi-kazeroni when trying to run some recipes with ncl diagnostics. When investigating I found that the time coordinate for multi-model means from the preprocessor is set to int64, and the cd_calendar function built into ncl then gives the Argument type mismatch error. Regular models' time coordinates are all set to double. Is there any reason why the mmm time coordinate is set to int64? Otherwise we should change it to double as this would break nearly every ncl diagnostic using mmms.
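One way to spot affected files without opening them in ncl is to look at the declared type of the time coordinate in the `ncdump -h` header. The following is a rough sketch using only the standard library; `coordinate_dtype` is a hypothetical helper name, and it assumes the header uses the usual `type name(name) ;` layout that ncdump prints for coordinate variables.

```python
import re


def coordinate_dtype(header, name="time"):
    """Return the declared type of coordinate `name` in `ncdump -h` output.

    Returns None if no declaration such as `int64 time(time) ;` is found.
    """
    pattern = re.compile(
        rf"^\s*(\w+)\s+{re.escape(name)}\({re.escape(name)}\)\s*;",
        re.MULTILINE,
    )
    match = pattern.search(header)
    return match.group(1) if match else None


# Header fragment in the layout printed by `ncdump -h`.
example = """
variables:
        float nbp(time, lat, lon) ;
        int64 time(time) ;
        double time_bnds(time, bnds) ;
"""
print(coordinate_dtype(example))
```

A file whose time coordinate is reported as anything other than `double` would be a candidate for the cd_calendar problem described above.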

valeriupredoi commented 2 years ago

the CMOR standard is coordinate points should be float64 and variable's data points should be float32 (to save space)

zklaus commented 2 years ago

@valeriupredoi are you sure? I see mostly floats, not ints.

valeriupredoi commented 2 years ago

crap! I wrote int - I meant float

zklaus commented 2 years ago

I've got the same issues with the Argument type mismatch as @remi-kazeroni when trying to run some recipes with ncl diagnostics. When investigating I found that the time coordinate for multi-model means from the preprocessor is set to int64, and the cd_calendar function built into ncl then gives the Argument type mismatch error. Regular models' time coordinates are all set to double. Is there any reason why the mmm time coordinate is set to int64? Otherwise we should change it to double as this would break nearly every ncl diagnostic using mmms.

Thanks for the report, @bettina-gier. I don't think they should be int64. I'll have a look and open an issue as necessary.

remi-kazeroni commented 2 years ago

@remi-kazeroni It appears that some problems are caused by missing auxiliary data on Mistral, could you have a look please?

Apart from the autoassess recipes which are a special case (see https://github.com/ESMValGroup/ESMValTool/issues/2309), I only found one case with missing auxiliary data: recipe_climwip_brunner2019_med.yml for which I added the missing files (but couldn't run the recipe successfully due to missing CMIP5 data). Did you have other cases in mind that I may have missed?

Below is a list of recipes that couldn't be run and the problem was already found when reading the recipe

Most of the failed runs (apart from the missing titles) are due to missing CMIP5 data and then to missing ERA5 data (mostly for the hydrology recipes). Are you planning on retrying to run these recipes with the automatic download on? I'm not sure the amount of data downloaded would be that large but you could download these on the scratch disk of Mistral. That could help the maintainers of recipes to see which data are no longer available on ESGF nodes and should be removed from the recipes.

bettina-gier commented 2 years ago

@remi-kazeroni It appears that some problems are caused by missing auxiliary data on Mistral, could you have a look please?

Apart from the autoassess recipes which are a special case (see #2309), I only found one case with missing auxiliary data: recipe_climwip_brunner2019_med.yml for which I added the missing files (but couldn't run the recipe successfully due to missing CMIP5 data). Did you have other cases in mind that I may have missed?

Missing the aux for my recipe: https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/recipes/recipe_gier2020bg.yml https://thredds.daac.ornl.gov/thredds/catalog/ornldaac/968/catalog.html?dataset=968/Land_Cover_Class_1degree.nc4

Bouwe's script couldn't find that yet because my recipe is still missing a title - I'll add that once the other issue is solved because I want to test the whole thing.

remi-kazeroni commented 2 years ago

Missing the aux for my recipe: https://github.com/ESMValGroup/ESMValTool/blob/main/esmvaltool/recipes/recipe_gier2020bg.yml https://thredds.daac.ornl.gov/thredds/catalog/ornldaac/968/catalog.html?dataset=968/Land_Cover_Class_1degree.nc4

Bouwe's script couldn't find that yet because my recipe is still missing a title - I'll add that once the other issue is solved because I want to test the whole thing.

The file can now be found in: /mnt/lustre02/work/bd0854/DATA/ESMValTool2/AUX/Land_Cover_Class_1degree.nc4. We may need to think of a more efficient way to centrally store the auxiliary data every time a new recipe is added to the Tool.

bouweandela commented 2 years ago

Most of the failed runs (apart from the missing titles) are due to missing CMIP5 data and then to missing ERA5 data (mostly for the hydrology recipes). Are you planning on retrying to run these recipes with the automatic download on?

@remi-kazeroni These runs were already done with automatic download. So if there is still missing CMIP5 data, I see 3 possible causes: 1) I made a mistake, 2) the data has been retracted, 3) there is a bug in esmvalcore. Could you have a look? I will only be back at work on Thursday.

I'm not sure the amount of data downloaded would be that large

Indeed, I only needed to download about 8GB (though I already downloaded more data earlier for recipe_impact.yml).

remi-kazeroni commented 2 years ago

This run was done on the Mistral prepost nodes with max_parallel_tasks = 8, a maximum runtime of 4 hours, and a memory limit of 128 GB. There were two recipes that were cancelled because they took longer to run than the 4-hour time limit (recipe_bock20jgr_fig_6-7.yml and recipe_wenzel16jclim.yml) and two recipes that were cancelled because they exceeded the memory limit of 128 GB (recipe_smpi_4cds.yml and recipe_eyring13jgr_12.yml). If anyone can share what reasonable runtimes/memory uses are for these recipes that would be great.

For the recipe_eyring13jgr_12.yml, it was successfully tested by @LisaBock using a full prepost node on Mistral:

Time for running the recipe was: 2:56:37.235106
Maximum memory used (estimate): 174.8 GB

valeriupredoi commented 2 years ago

Maximum memory used (estimate): 174.8 GB

Surely you're joking Mr Feynman :rofl: Man, that preprocessor set needs optimization like there's no tomorrow, where does one even find 200GB of RAM?

remi-kazeroni commented 2 years ago

@remi-kazeroni These runs were already done with automatic download. So if there is still missing CMIP5 data, I see 3 possible causes: 1) I made a mistake, 2) the data has been retracted, 3) there is a bug in esmvalcore. Could you have a look? I will only be back at work on Thursday.

@bouweandela, is there any way I could check the log files for these runs? I think one problem may be that the default ESGF node to download data (esgf-node.llnl.gov) is down, as mentioned in #1370. I made a test by running the examples/recipe_python.yml using the automatic download feature on a machine with no data. It failed with the default esgf-pyclient settings but the recipe ran fine after selecting another node, e.g.:

search_connection:
  url: "http://esgf-data.dkrz.de/esg-search"

Nevertheless, I think that missing data are downloaded only if everything is available on ESGF nodes which may not be the case for many of the failing recipes. Could you perhaps rerun the failing recipes using another search_connection to be sure?

LisaBock commented 2 years ago

This run was done on the Mistral prepost nodes with max_parallel_tasks = 8, a maximum runtime of 4 hours, and a memory limit of 128 GB. There were two recipes that were cancelled because they took longer to run than the 4-hour time limit (recipe_bock20jgr_fig_6-7.yml and recipe_wenzel16jclim.yml) and two recipes that were cancelled because they exceeded the memory limit of 128 GB (recipe_smpi_4cds.yml and recipe_eyring13jgr_12.yml). If anyone can share what reasonable runtimes/memory uses are for these recipes that would be great.

I also just successfully tested recipe_bock20jgr_fig_6-7.yml on the prepost partition on Mistral with:

Time for running the recipe was: 7:38:04.693573
Maximum memory used (estimate): 36.2 GB

(As it needed CMIP3 data you could not use parallel computing.)

valeriupredoi commented 2 years ago

I am trying to run @bettina-gier 's _gierxxx recipe - no chance to have it done on JASMIN because I am missing all the OBS's and - and am quite cross about this - huge chunks of CMIP6 data and some CMIP5 too - so far I have accumulated 17G of ESGF downloads that cover missing CMIP data on CEDA's ESGF node. This is a joke :angry:

valeriupredoi commented 2 years ago

what's that prepost node you guys are talking about? Data gets shipped there before the actual ESGF posting?

remi-kazeroni commented 2 years ago

Maximum memory used (estimate): 174.8 GB

Surely you're joking Mr Feynman 🤣 Man, that preprocessor set needs optimization like there's no tomorrow, where does one even find 200GB of RAM?

That recipe does a multi model mean on a 3D variable over about 50 datasets. On Mistral, the largest nodes (prepost) have 256 GB of memory and can be used exclusively by users.

bettina-gier commented 2 years ago

I am trying to run @bettina-gier 's _gierxxx recipe - no chance to have it done on JASMIN because I am missing all the OBS's and - and am quite cross about this - huge chunks of CMIP6 data and some CMIP5 too - so far I have accumulated 17G of ESGF downloads that cover missing CMIP data on CEDA's ESGF node. This is a joke :angry:

ah yeah don't try to run that, there was a bunch of data that I had to get manually, I was planning to comment the ones out that aren't on ESGF but not completely remove to resemble what was used in the paper. Though on DKRZ I wasn't missing CMIP5 data, interesting. but half the CMIP6 wasn't on the DKRZ folders.. fun.

If you wanna see the same issue, you can also try running the recipe_perfmetrics_land_CMIP5.yml . Minimal way to recreate this error in ncl code (this example from the nbp diag from perfmetrics_land):

data = addfile("MultiModelMean_Lmon_nbp_1980-1999.nc", "r")
test = cd_calendar(data&time, 0)

-> fatal:Argument type mismatch on argument (0) of (cd_calendar) can not coerce

Whereas this works:

data2 = addfile("CMIP5_MRI-ESM1_Lmon_historical_r1i1p1_nbp_1980-1999.nc", "r")
test2 = cd_calendar(data2&time, 0)

ncdump -h MultiModelMean_Lmon_nbp_1980-1999.nc
netcdf MultiModelMean_Lmon_nbp_1980-1999 {
dimensions:
        time = 240 ;
        lat = 90 ;
        lon = 180 ;
        bnds = 2 ;
variables:
        float nbp(time, lat, lon) ;
                nbp:_FillValue = 1.e+20f ;
                nbp:standard_name = "surface_net_downward_mass_flux_of_carbon_dioxide_expressed_as_carbon_due_to_all_land_processes" ;
                nbp:long_name = "Carbon Mass Flux out of Atmosphere due to Net Biospheric Production on Land" ;
                nbp:units = "kg m-2 s-1" ;
                nbp:cell_methods = "multi-model: mean" ;
        int64 time(time) ;
                time:axis = "T" ;
                time:bounds = "time_bnds" ;
                time:units = "days since 1850-01-01" ;
                time:standard_name = "time" ;
                time:calendar = "gregorian" ;
        double time_bnds(time, bnds) ;
...

ncdump -h CMIP5_MRI-ESM1_Lmon_historical_r1i1p1_nbp_1980-1999.nc
netcdf CMIP5_MRI-ESM1_Lmon_historical_r1i1p1_nbp_1980-1999 {
dimensions:
        time = 240 ;
        lat = 90 ;
        lon = 180 ;
        bnds = 2 ;
variables:
        float nbp(time, lat, lon) ;
                nbp:_FillValue = 1.e+20f ;
                nbp:standard_name = "surface_net_downward_mass_flux_of_carbon_dioxide_expressed_as_carbon_due_to_all_land_processes" ;
                nbp:long_name = "Carbon Mass Flux out of Atmosphere due to Net Biospheric Production on Land" ;
                nbp:units = "kg m-2 s-1" ;
                nbp:cell_methods = "time: mean (interval: 1 day) area: mean where land" ;
        double time(time) ;
                time:axis = "T" ;
                time:bounds = "time_bnds" ;
                time:units = "days since 1850-1-1 00:00:00" ;
                time:standard_name = "time" ;
                time:long_name = "time" ;
                time:calendar = "gregorian" ;
...

jvegreg commented 2 years ago

Maximum memory used (estimate): 174.8 GB

Surely you're joking Mr Feynman 🤣 Man, that preprocessor set needs optimization like there's no tomorrow, where does one even find 200GB of RAM?

That recipe does a multi model mean on a 3D variable over about 50 datasets. On Mistral, the largest nodes (prepost) have 256 GB of memory and can be used exclusively by users.

I will just leave this here:

2021-10-22 01:28:01,828 UTC [11220] INFO    Ending the Earth System Model Evaluation Tool v2.3.1 at time: 2021-10-22 01:28:01 UTC
2021-10-22 01:28:01,828 UTC [11220] INFO    Time for running the recipe was: 15:38:15.525013
2021-10-22 01:28:02,357 UTC [11220] INFO    Maximum memory used (estimate): 1129.3 GB
2021-10-22 01:28:02,358 UTC [11220] INFO    Sampled every second. It may be inaccurate if short but high spikes in memory consumption occur.
2021-10-22 01:28:02,364 UTC [11220] INFO    Run was successful

This is a multimodel of around five models on the ERA5 3D grid.
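To compare numbers like these against the limits used in the test run (4 hours and 128 GB), the two summary lines can be pulled out of a log with a few lines of Python. This is only a sketch: `parse_run_stats` and `exceeds_limits` are hypothetical helper names, the regular expressions assume the exact wording of the log lines quoted in this thread, and runtimes of a day or longer (which Python would print with a "days" prefix) are not handled.

```python
import re
from datetime import timedelta

TIME_RE = re.compile(
    r"Time for running the recipe was: (\d+):(\d{2}):(\d{2}(?:\.\d+)?)")
MEMORY_RE = re.compile(r"Maximum memory used \(estimate\): ([\d.]+) GB")


def parse_run_stats(log_text):
    """Return (runtime, memory_gb); an item is None if its line is absent."""
    runtime = memory_gb = None
    match = TIME_RE.search(log_text)
    if match:
        hours, minutes, seconds = match.groups()
        runtime = timedelta(hours=int(hours), minutes=int(minutes),
                            seconds=float(seconds))
    match = MEMORY_RE.search(log_text)
    if match:
        memory_gb = float(match.group(1))
    return runtime, memory_gb


def exceeds_limits(log_text, max_hours=4, max_memory_gb=128):
    """Return (too_slow, too_much_memory) for the given limits."""
    runtime, memory_gb = parse_run_stats(log_text)
    too_slow = runtime is not None and runtime > timedelta(hours=max_hours)
    too_much = memory_gb is not None and memory_gb > max_memory_gb
    return too_slow, too_much
```

The default limits mirror the ones mentioned for the Mistral test run; other sites would pass their own values.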

zklaus commented 2 years ago

what's that prepost node you guys are talking about? Data gets shipped there before the actual ESGF posting?

@valeriupredoi, you did read the fine manual, right?

zklaus commented 2 years ago

@bettina-gier, I tracked the time int64 issue to a change in cftime. Further updates in the discussion at https://github.com/ESMValGroup/ESMValTool/discussions/2380#discussioncomment-1532530

valeriupredoi commented 2 years ago

@jvegasbsc how was Policia Nacional not notified that you have attempted to steal as much RAM as the guys in Casa de Papel tried with Euros :rofl:

Anyways, heads up I checked Github Actions and all is fine with the new rc2 for Core that @zklaus released last night - all dependencies seem to be in order and tests run biutifully! Note, however, that we are still testing against 2.3.1 from pip on the Core side of tests in Github Actions, but that's OK, pip is never grabbing the RC when asked to install a package (unless that package is required as dependency, as is the case for Tool)

bouweandela commented 2 years ago

I'm doing another run of all the recipes, will post the results here once they are in.

bouweandela commented 2 years ago

I think one problem may be that the default ESGF node to download data (esgf-node.llnl.gov) is down

@remi-kazeroni If the search node is down, the esmvaltool run will fail with an error message saying that. It looks like the missing data problems are diverse and need to be investigated per recipe. My first impression is that 1) in several cases, people may have manually downloaded and/or copied the fx files to a place where the tool was able to find them, to work around the issue described in https://github.com/ESMValGroup/ESMValCore/issues/1138#issuecomment-844283263. 2) In other cases, the facets may be incorrectly set on ESGF. For example

from esmvalcore.esgf import find_files
find_files(**{'mip': 'OImon', 'project': 'CMIP5', 'exp': 'rcp85', 'short_name': 'sic', 'dataset': 'NorESM1-ME'})

gives

[ESGFFile:cmip5/output1/NCC/NorESM1-ME/rcp85/mon/seaIce/OImon/r1i1p1/v20130926/sic_OImon_NorESM1-ME_rcp85_r1i1p1_200601-204412.nc on hosts ['esgf.nci.org.au'],
 ESGFFile:cmip5/output1/NCC/NorESM1-ME/rcp85/mon/seaIce/OImon/r1i1p1/v20130926/sic_OImon_NorESM1-ME_rcp85_r1i1p1_204501-210012.nc on hosts ['esgf.nci.org.au']]

while no results are found if you add the facet product='output1'. 3) Data that was once available is no longer available on ESGF, for example CMIP5 model CESM1-CAM5-1-FV2 appears to have no monthly historical tas data anymore. Question: is there a central place where you can find out if some CMIP5 data was retracted?
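
The check in point 2 can be repeated for several facets at once. A minimal sketch, assuming a search function that takes facets as keyword arguments and returns a list of files (as `find_files` does in the example above); `find_blocking_facets` and `fake_search` are hypothetical names, and `fake_search` only stands in for ESGF so that the snippet runs offline:

```python
def find_blocking_facets(search, base_facets, candidate_facets):
    """Return the candidate facets that turn a non-empty search into an empty one.

    `search` is any callable that accepts facets as keyword arguments and
    returns a list of files.
    """
    if not search(**base_facets):
        raise ValueError("base facets already return no files")
    blocking = {}
    for name, value in candidate_facets.items():
        # Add one extra facet at a time on top of the working base search.
        if not search(**base_facets, **{name: value}):
            blocking[name] = value
    return blocking


def fake_search(**facets):
    """Offline stand-in for an ESGF search (not a real API).

    It mimics the case above: files are found, but not when
    the `product` facet is part of the query.
    """
    if "product" in facets:
        return []
    return ["sic_OImon_NorESM1-ME_rcp85_r1i1p1_200601-204412.nc"]


base = {"mip": "OImon", "project": "CMIP5", "exp": "rcp85",
        "short_name": "sic", "dataset": "NorESM1-ME"}
candidates = {"product": "output1", "ensemble": "r1i1p1"}
blocking = find_blocking_facets(fake_search, base, candidates)
print(blocking)
```

Passing the real search function instead of `fake_search` would run one query per candidate facet, so it is probably best kept to a handful of facets.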

bouweandela commented 2 years ago

The results of the new run are available here: https://esmvaltool.cloud.dkrz.de/shared/esmvaltool/v2.4.0-test2/

53 recipes ran successfully, 56 recipes failed.

@ESMValGroup/esmvaltool-recipe-maintainers Please check that your recipes ran successfully now and see if you need to correct anything before the release. Note that if the recipe ran successfully, but you are missing plots or files on the resulting webpage, provenance has not been implemented in the diagnostic script for those files.

The run was done with the same settings as last time https://github.com/ESMValGroup/ESMValTool/issues/2354#issuecomment-950282432. This time I ran all recipes, except the four (recipe_bock20jgr_fig_6-7.yml, recipe_wenzel16jclim.yml, recipe_smpi_4cds.yml and recipe_eyring13jgr_12.yml) that didn't run with these settings.

katjaweigel commented 2 years ago

@bouweandela thanks for the test! I am trying to get recipe_flato13ipcc.yml running, see my new comment in the (closed) PR #2390. I think the issue in the test here is missing data (I removed several models to make it run to the point it reaches now, because I had a similar error as in the test), but as soon as this is solved it would run into the issue mentioned in PR #2390. For esmvaltool/recipes/recipe_pv_capacity_factor.yml: can you give read access for the log file main_log.debug points to: /pf/b/b381141/esmvaltool_output/recipe_pv_capacity_factor_20211028_142540/run/capacity_factor/main/log.txt ? I tested it some days ago with a near-final version of the new core and it worked well, so the issue here could be related to missing data.

katjaweigel commented 2 years ago

The results for recipe_deangelis15nat, recipe_li17natcc, recipe_martin18grl look as expected. (I actually need to do a bug fix for one of the plot types in recipe_martin18grl but this is not related to the release and I just didn't find the time, yet.)

remi-kazeroni commented 2 years ago

Thanks for testing the recipes @bouweandela! It seems that data downloading worked better this time. Out of curiosity, have you checked the amount of data downloaded?

I went through all failed recipes to see the problems. Investigations are summarized in the tables below. Please feel free to link to open issues/PRs for known issues.

Wontfix Issues

These issues will not be fixed in this release. They are generally due to data that has become unavailable or is not yet fully integrated into ESMValTool.

#### Missing data

| Recipe | Reason of failure | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_anav13jclim | missing CMIP5 data | retracted data | won't fix |
| recipe_schlund20jgr_gpp_abs_rcp85 | missing CMIP5 data | retracted data | won't fix |
| recipe_schlund20jgr_gpp_change_1pct | missing CMIP5 data | retracted data | won't fix |
| recipe_schlund20jgr_gpp_change_rcp85 | missing CMIP5 data | retracted data | won't fix |

#### Other data problems

| Recipe | Reason of failure | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_autoassess_radiation_rms_cfMon_all | problem with the variable clisccp | obs4MIPs table obsolete | ESMValGroup/ESMValCore#1238 this will be bumped to 2.5 |

#### Other problems

| Recipe | Reason of failure | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_autoassess_landsurface_soilmoisture | hardcoded path to auxiliary data on Jasmin | | #2309 needs resolution |

Outstanding Issues

Missing data

ERA5

| Recipe | Missing variables | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_check_obs | rlns, rlus, rsns, rsus, uas, vas | | #2396 |
| recipe_climwip_brunner2019_med | rsns | | |
| recipe_globwat | all ERA5 data needed not available | | |
| recipe_hydro_forcing | all ERA5 data needed not available | | |
| recipe_hype | missing ERA5 data (wrong version label?) | | |
| recipe_marrmot | all ERA5 data needed not available | | |
| recipe_pcrglobwb | all ERA5 data needed not available | | |
| recipe_wflow | all ERA5 data needed not available | | |

CMIP5

| Recipe | Reason of failure | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_collins13ipcc | missing CMIP5 data | | |
| recipe_flato13ipcc | missing CMIP5 data | Data is probably not the only issue see comment in #2390, testing a fix | |
| recipe_gier2020bg | missing CMIP5, CMIP6, OBS data | | |
| recipe_impact | missing CMIP5 data | | |
| recipe_landcover | missing CMIP5 data (CMIP5_inmcm4) | | |
| recipe_perfmetrics_* | missing CMIP5 data | | |
| recipe_seaice | missing CMIP5 data | | |
| recipe_wenzel14jgr | missing CMIP5 data | | |
| recipe_wenzel16nat | missing CMIP5 data | | |
| recipe_williams09climdyn_CREM | missing CMIP5 data | | |

Other

| Recipe | Reason of failure | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_climwip_brunner20esd | missing CMIP6 ssp585 data | | |
| recipe_gier2020bg | missing CMIP5, CMIP6, OBS data | | |

Provenance problems

| Recipe | Reason of failure | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_thermodyn_diagtool | diagnostic error (attempt to record provenance twice?) | | |

Resolved Issues

#### `write_plots`, `write_netcdf`

| Recipe | Reason of failure | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_carvalhais14nat | diagnostic using `write_plots` | | #2394 |
| recipe_ensclus | diagnostic using `write_plots` | | #2394 |
| recipe_extreme_events | R diagnostic using `write_plots` | | #2395 |
| recipe_hyint_extreme_events | R diagnostic using `write_plots` | | #2395 |
| recipe_kcs | diagnostic using `write_plots` | | #2394 |
| recipe_miles_* | R diagnostic using `write_plots` | | #2395 |
| recipe_ocean_* | diagnostic using `write_plots` | | #2393 |
| recipe_runoff_et | diagnostic using `write_netcdf` | | #2394 |
| recipe_seaice_feedback | diagnostic using `write_plots` | | #2394 |
| recipe_zmnam | diagnostic using `write_plots` | | #2394 |

#### Julia problems

| Recipe | Reason of failure | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_julia | Julia error in the diagnostic | | fix in #2335 |

#### Provenance problems

| Recipe | Reason of failure | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_ecs_constraints | log_provenance issue in a diagnostic | | fixed in #2391 |
| recipe_schlund20esd | log_provenance issue in a diagnostic (same diag as recipe_ecs_constraints) | | fixed in #2391 |

#### Missing data

| Recipe | Reason of failure | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_bock20jgr_fig_1-4 | missing ERA5 data (wrong version label?) | | one ERA5 datafile added |
| recipe_snowalbedo | data download failure | missing CMIP5 data | |

#### Other problems

| Recipe | Reason of failure | Known problem? | Current Status |
|--------|-------------------|----------------|----------------|
| recipe_arctic_ocean | diagnostic error | | #2397 |
| recipe_pv_capacity_factor | diagnostic error (run dir created twice?) | bug | #2392 |
| recipe_smpi | memory issue? | | ran fine with `max_parallel_tasks=1` |
| recipe_seaice_drift | diagnostic error | | #2404 |
| recipe_ocean_multimap | diagnostic error | more details in #2398 | |

remi-kazeroni commented 2 years ago

3) Data that was once available is no longer available on ESGF, for example CMIP5 model CESM1-CAM5-1-FV2 appears to have no monthly historical tas data anymore. Question: is there a central place where you can find out if some CMIP5 data was retracted?

I'd be curious to know as well. Many of the failures for "missing CMIP5 data" occur because of a handful of missing datasets (CESM1-CAM5-1-FV2, CMIP5_inmcm4, ...). It'd be great to have a list of those few problematic datasets and see if we can remove them from the recipes.
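
To see what removing those datasets would look like, here is a minimal sketch. It assumes the recipe has already been loaded into nested dicts and lists (for example with a YAML parser), and that datasets are listed under `datasets` or `additional_datasets` keys; `drop_unavailable` and the `UNAVAILABLE` set are hypothetical, and the set only contains the two datasets named in this thread:

```python
# Datasets named in this thread as missing on ESGF (assumption: the dataset
# facet for "CMIP5_inmcm4" is "inmcm4"). Extend as needed.
UNAVAILABLE = {"CESM1-CAM5-1-FV2", "inmcm4"}
DATASET_KEYS = ("datasets", "additional_datasets")


def drop_unavailable(node, unavailable=UNAVAILABLE, removed=None):
    """Return (cleaned copy of node, list of removed dataset entries).

    Walks nested dicts and lists and filters every list found under one of
    DATASET_KEYS. The input is not modified.
    """
    removed = [] if removed is None else removed
    if isinstance(node, dict):
        cleaned = {}
        for key, value in node.items():
            if key in DATASET_KEYS and isinstance(value, list):
                kept = []
                for entry in value:
                    if isinstance(entry, dict) and entry.get("dataset") in unavailable:
                        removed.append(entry)
                    else:
                        kept.append(entry)
                cleaned[key] = kept
            else:
                cleaned[key] = drop_unavailable(value, unavailable, removed)[0]
        return cleaned, removed
    if isinstance(node, list):
        return [drop_unavailable(item, unavailable, removed)[0] for item in node], removed
    return node, removed


# Toy recipe fragment for illustration only, not taken from a real recipe.
recipe = {
    "datasets": [
        {"dataset": "CESM1-CAM5-1-FV2", "project": "CMIP5"},
        {"dataset": "NorESM1-ME", "project": "CMIP5"},
    ],
    "diagnostics": {
        "diag": {
            "variables": {
                "tas": {"additional_datasets": [{"dataset": "inmcm4", "project": "CMIP5"}]},
            },
        },
    },
}
cleaned, removed = drop_unavailable(recipe)
print([entry["dataset"] for entry in removed])
```

Whether a dataset should really be dropped is a scientific decision for the recipe maintainer; the sketch only shows the mechanical part.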

But I will now have a look at the missing ERA5 issues because I'm not sure whether they are related to the Mistral data pool or to something wrong in the Core.

schlunma commented 2 years ago

Thanks for testing the recipes, @bouweandela! I opened a PR that fixes recipe_ecs_constraints and recipe_schlund20esd: #2391

Other recipes where I am maintainer fail due to missing data which has been retracted. I won't change these recipes; they run fine when the data is present (just tested this myself).

katjaweigel commented 2 years ago

I think I found the problem in recipe_pv_capacity_factor. I don't know why it didn't surface when I tested after I added the title. I'll do a PR soon.

zklaus commented 2 years ago

Thanks, @schlunma, for the quick fix on the provenance. How to deal with retracted data is less clear. I had a bit of a chat about that with @bouweandela earlier, but I suggest accepting your solution for the release and addressing the question at large at the upcoming workshop.

bouweandela commented 2 years ago

Out of curiosity, have you checked the amount of data downloaded?

Thanks for the extensive error analysis! About 70 GB

For esmvaltool/recipes/recipe_pv_capacity_factor.yml: can you give read access for the log file main_log.debug points to: /pf/b/b381141/esmvaltool_output/recipe_pv_capacity_factor_20211028_142540/run/capacity_factor/main/log.txt ?

@remi-kazeroni or @zklaus could you please do this? I will be back at work on Thursday.

valeriupredoi commented 2 years ago

@bouweandela cheers for running those and @remi-kazeroni many thanks for creating that table, very handy! The two autoassess recipes that are currently conking up are due to: clisccp needing a fresher obs4MIPs table (cheers @zklaus ) and the other needing those darn files uploaded to Zenodo, but for that I need the OK from @alistairsellar so I don't break any DPR - I think we can safely bump those to 2.5 :grin:

katjaweigel commented 2 years ago

For esmvaltool/recipes/recipe_pv_capacity_factor.yml: can you give read access for the log file main_log.debug points to: /pf/b/b381141/esmvaltool_output/recipe_pv_capacity_factor_20211028_142540/run/capacity_factor/main/log.txt ?

@remi-kazeroni or @zklaus could you please do this? I will be back at work on Thursday.

Not necessary any more, it is actually possible to access it through: https://esmvaltool.cloud.dkrz.de/shared/esmvaltool/v2.4.0-test2/recipe_pv_capacity_factor_20211028_142540/run/capacity_factor/main/log.txt (but not when I tried to read it on my own Mistral account, sorry!) I also ran it myself and reproduced the issue, see #2392.

remi-kazeroni commented 2 years ago

Thanks @zklaus and @valeriupredoi for fixing the write_plots in the remaining diagnostics 👍 I have started to rerun the corresponding recipes and will post the outcome on Tuesday.

Original issue in https://esmvaltool.cloud.dkrz.de/shared/esmvaltool/v2.4.0-test2/

Missing data

Using write_plots, write_netcdf

Provenance issue

Lack of computational resources (see comment) - reran on a full prepost node with max_parallel_tasks=1