bouweandela closed this issue 6 months ago.
Unfortunately, the results were written to a scratch disk and left there for too long, resulting in part of them being deleted. We will need to do a new run, so this will take a bit longer.
Great stuff, folks! Here's me looking at the 17-odd ones that died of unnatural causes, for various reasons (not missing data or diagnostic failures):
/home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_ipccwg1ar6ch3_atmosphere_20231212_212329/preproc/IAV_calc_tas_cmip5/tas/CMIP5_CCSM4_Amon_piControl_r1i1p1_tas_0250-1300.nc
is perfectly fine, so this is probably the same issue as above for collins13.
recipe_psyplot fails due to an issue in GEOS, which will apparently be fixed in its new version 3.13 (we currently have 3.11).
Good find @schlunma :beer: Our environment is getting rather old; we need that Python 3.12 support sooner rather than later (working on it for Core, stuck at prospector, the last hurdle).
lauer22_Fig5_lifrac fails to realize data from a (240, 91, 360, 720) array with dtype('float64'); it needs to be run on a node with plenty of memory.
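For a sense of scale, the dense size of that array is simply the product of its dimensions times the 8 bytes of a float64. A minimal Python sketch (the helper name dense_nbytes is hypothetical, and it only counts one fully realized copy; the computation producing it probably needs extra temporary memory on top):

```python
import math

import numpy as np


def dense_nbytes(shape, dtype):
    """Bytes needed to hold one fully realized array of this shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize


# Shape and dtype taken from the lauer22_Fig5_lifrac error message.
nbytes = dense_nbytes((240, 91, 360, 720), "float64")
print(f"{nbytes:,} bytes = {nbytes / 1e9:.1f} GB = {nbytes / 2**30:.1f} GiB")
```

For that shape this comes to 45,287,424,000 bytes, roughly 45 GB (about 42 GiB) for a single copy, which is why a regular node is unlikely to be enough.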
A lot of those 17 recipes that failed for other reasons simply die out: all goes fine until they just stop in their tracks. Any info from the SLURM logs? It looks to me like they were run on an interactive node and the user session timed out, with the system killing the session and, implicitly, the running process.
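Assuming the runs went through SLURM (interactive sessions started with salloc/srun are SLURM jobs too), sacct records how each job ended. Below is a minimal, hypothetical Python helper that scans the pipe-separated output of sacct --parsable2 --format=JobID,State,Elapsed,Timelimit for jobs that were killed rather than finishing; the state names are standard SLURM job states, but the helper itself is only a sketch.

```python
# SLURM job states meaning the job did not end on its own.
KILLED_STATES = {"TIMEOUT", "OUT_OF_MEMORY", "CANCELLED", "NODE_FAIL", "PREEMPTED"}


def killed_jobs(sacct_text):
    """Return {JobID: State} for rows whose state shows the job was killed.

    Expects the output of
        sacct --parsable2 --format=JobID,State,Elapsed,Timelimit
    i.e. a header line followed by pipe-separated rows. Job steps such as
    '1234.batch' appear as separate rows and are reported separately.
    """
    lines = [line for line in sacct_text.strip().splitlines() if line.strip()]
    if not lines:
        return {}
    header = lines[0].split("|")
    killed = {}
    for line in lines[1:]:
        row = dict(zip(header, line.split("|")))
        # CANCELLED is printed as e.g. "CANCELLED by 12345"; keep the first word.
        words = row.get("State", "").split()
        if words and words[0] in KILLED_STATES:
            killed[row["JobID"]] = words[0]
    return killed
```

A TIMEOUT (or CANCELLED) state for the jobs in question would support the timed-out-session theory; OUT_OF_MEMORY would point to the memory group instead.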
I just successfully ran my 3 schlundjgr recipes, updated them in the table.
Regarding the NCL failures: Could you please cherry-pick https://github.com/ESMValGroup/ESMValTool/commit/1647d46ee3c5deb084fa4ac59024a46581d618c3 into the release branch? This needs to be in there (see https://github.com/ESMValGroup/ESMValTool/issues/3420). I just ran wenzel16jclim successfully with the current main branch.
@schlunma I pulled this in, but it is giving me these issues:
$ cat /home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_tebaldi21esd_20231213_193814/run/fig1b/plot_ts_line_mean_spread_pr/log.txt
Copyright (C) 1995-2019 - All Rights Reserved
University Corporation for Atmospheric Research
NCAR Command Language Version 6.6.2
The use of this software is governed by a License Agreement.
See http://www.ncl.ucar.edu/ for more details.
INFO Loading settings from /home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_tebaldi21esd_20231213_193814/run/fig1b/plot_ts_line_mean_spread_pr/settings.ncl
INFO Loading input data description from /home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_tebaldi21esd_20231213_193814/preproc/fig1b/pr/pr_info.ncl
INFO Wrote /home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_tebaldi21esd_20231213_193814/plots/fig1b/plot_ts_line_mean_spread_pr//pr_ts_line_1850_2100.pdf
INFO fatal: in log_provenance (interface_scripts/logging.ncl), outfile (path to figure) '/home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_tebaldi21esd_20231213_193814/plots/fig1b/plot_ts_line_mean_spread_pr//pr_ts_line_1850_2100.pdf' does not exist (for PNGs, this function also searches for 'FILE.000001.png', 'FILE.000002.png', etc.); if no plot file is available use 'n/a'
and
$ cat /home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_collins13ipcc_20231213_193815/run/ts_line_tas/ch12_plot_ts_line_mean_spread_tas/log.txt
Copyright (C) 1995-2019 - All Rights Reserved
University Corporation for Atmospheric Research
NCAR Command Language Version 6.6.2
The use of this software is governed by a License Agreement.
See http://www.ncl.ucar.edu/ for more details.
INFO Loading settings from /home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_collins13ipcc_20231213_193815/run/ts_line_tas/ch12_plot_ts_line_mean_spread_tas/settings.ncl
INFO Loading input data description from /home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_collins13ipcc_20231213_193815/preproc/ts_line_tas/tas/tas_info.ncl
INFO Wrote /home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_collins13ipcc_20231213_193815/plots/ts_line_tas/ch12_plot_ts_line_mean_spread_tas//tas_ts_line_1850_2300.pdf
INFO fatal: in log_provenance (interface_scripts/logging.ncl), outfile (path to figure) '/home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_collins13ipcc_20231213_193815/plots/ts_line_tas/ch12_plot_ts_line_mean_spread_tas//tas_ts_line_1850_2300.pdf' does not exist (for PNGs, this function also searches for 'FILE.000001.png', 'FILE.000002.png', etc.); if no plot file is available use 'n/a'
Both runs used output_file_type: png in config-user.yml. The files with the .png extension do exist, but somehow the code appears to be looking for .pdf files?
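To make the expected behaviour concrete, here is a minimal Python sketch of the existence check that the fatal message describes: look for the figure with the extension configured via output_file_type and, for PNGs, also accept numbered files like FILE.000001.png. It is a hypothetical illustration, not the NCL code in interface_scripts/logging.ncl, and the function name find_plot_files is made up.

```python
import tempfile
from pathlib import Path


def find_plot_files(outfile, output_file_type):
    """Hypothetical plot-file check based on the NCL log message.

    The extension comes from output_file_type (as set in config-user.yml)
    rather than from whatever extension the diagnostic guessed.
    """
    path = Path(outfile).with_suffix("." + output_file_type)
    if path.exists():
        return [path]
    if output_file_type == "png":
        # Per the log message, PNGs may also appear as FILE.000001.png, FILE.000002.png, ...
        pattern = path.stem + ".[0-9][0-9][0-9][0-9][0-9][0-9].png"
        return sorted(path.parent.glob(pattern))
    return []


# Demo in a throwaway directory with hypothetical file names.
with tempfile.TemporaryDirectory() as tmpdir:
    tmp = Path(tmpdir)
    (tmp / "tas_ts_line.000001.png").touch()
    print(find_plot_files(tmp / "tas_ts_line.pdf", "png"))
```

Deriving the extension in one place from the configuration avoids the situation above, where the .png files exist but a .pdf name is checked.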
@axel-lauer Could you please copy the file /work/bd0854/DATA/ESMValTool2/download/obs4MIPs/MODIS-1-0/v20180305/clt_mon_MODIS-1-0_BE_gn_200003-201109.nc to /work/bd0854/DATA/ESMValTool2/OBS/Tier1/MODIS-1-0/ on Levante? That will make it possible to run recipe_clouds_bias.yml and recipe_lauer13jclim again. Unfortunately, fully automatic download from ESGF does not work because the file has outdated facets on ESGF.
The remaining NCL problems are due to a double guessing of the output filename. I think I fixed it locally, will commit soon.
The NCL fix is in #3474.
Can one of you please have a look at Julia? I want to close (and will close) https://github.com/ESMValGroup/ESMValTool/issues/3287, since the Julia recipe looks like the only thing outstanding there (and even that has its own issue).
The issue with recipe_julia.yml is still open, I suspect it doesn't correctly handle fill values since the scale is at 1e20.
lemme have a look at Julia then :grin:
Ahaa! Figured out Julia! The NetCDF package loads missing values as 1e20s, whereas NCDatasets loads them as missing. Will open a draft PR (I don't speak Julia, so I couldn't make it work 100%).
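The symptom (a scale reaching 1e20) is what happens whenever fill values are read as ordinary numbers. The effect is illustrated below in Python/NumPy rather than Julia, with made-up values; it is not the fix from the draft PR, which changes the Julia code.

```python
import numpy as np

FILL_VALUE = 1e20  # fill value that ends up unmasked in the recipe_julia.yml output

# Made-up values as returned by a reader that leaves fill values as plain numbers.
raw = np.array([271.3, FILL_VALUE, 268.9, FILL_VALUE])

# Unmasked, the fill value dominates any statistic or colour scale.
print(raw.max())

# Masking values close to the fill value restores a physical range.
masked = np.ma.masked_values(raw, FILL_VALUE)
print(masked.max(), masked.mean())
```

A reader that returns missing values as missing (as NCDatasets reportedly does) gives the second behaviour without any extra masking step.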
I did a re-run of all recipes affected by late changes (i.e. all NCL recipes and a few other recipes with bug/data fixes).
@axel-lauer Could you please copy the file /work/bd0854/DATA/ESMValTool2/download/obs4MIPs/MODIS-1-0/v20180305/clt_mon_MODIS-1-0_BE_gn_200003-201109.nc to /work/bd0854/DATA/ESMValTool2/OBS/Tier1/MODIS-1-0/ on Levante?
Done.
Some conclusions based on the above:
Is the output of these tests available somewhere?
Hm. Yesterday I successfully ran two of the recipes with HDF5 problems (collins and tebaldi). I also successfully ran wenzel16jclim, though wenzel14jgr continues to fail for me too.
I have not uploaded it yet, but you can access it on Levante:
SLURM logs: /home/b/b381141/esmvaltool-v2.10.x-2023-12-14-logs
ESMValTool output: /work/bd0854/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12
Here is the HDF error if anyone is interested:
stdout:
OSError: [Errno -101] NetCDF: HDF error: '/home/b/b381141/esmvaltool_output/esmvaltool-v2.10.x-2023-12-12/recipe_tebaldi21esd_20231214_174448/preproc/fig6c_IAV/tas/CMIP6_MRI-ESM2-0_Amon_piControl_r1i1p1f1_tas_gn_1850-2150.nc'
stderr:
#000: H5F.c line 836 in H5Fopen(): unable to synchronously open file
major: File accessibility
minor: Unable to open file
#001: H5F.c line 796 in H5F__open_api_common(): unable to open file
major: File accessibility
minor: Unable to open file
#002: H5VLcallback.c line 3863 in H5VL_file_open(): open failed
major: Virtual Object Layer
minor: Can't open object
#003: H5VLcallback.c line 3675 in H5VL__file_open(): open failed
major: Virtual Object Layer
minor: Can't open object
#004: H5VLnative_file.c line 128 in H5VL__native_file_open(): unable to open file
major: File accessibility
minor: Unable to open file
#005: H5Fint.c line 1873 in H5F_open(): unable to lock the file
major: File accessibility
minor: Unable to lock file
#006: H5FD.c line 2034 in H5FD_lock(): driver lock request failed
major: Virtual File Layer
minor: Unable to lock file
#007: H5FDsec2.c line 988 in H5FD__sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
major: Virtual File Layer
minor: Unable to lock file
I messed up the check for file existence in https://github.com/ESMValGroup/ESMValTool/pull/3422, which causes these spurious NCL errors about non-existent plot files. This should be fixed by #3477 (I tested it successfully with wenzel14jgr, and since the other NCL errors are similar, I expect those recipes to run too :crossed_fingers:).
Thanks, running them again now.
H5FDsec2.c line 988 in H5FD__sec2_lock(): unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'
There is I/O toe-stepping going on: this is a fairly common HDF5 barf when a file is already open in one process while another process tries to open it for reading or writing. See e.g. https://github.com/h5py/h5py/issues/1066 and a bunch of other people who have been complaining about it for years. We need to understand which process opens the file and which other process tries to do the same, but I think that probably only happens under SLURM, and it's not gonna be a straightforward task. I'd be up for it, but not today, and not before Xmas :christmas_tree:
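The errno 11 in the trace is EAGAIN ("Resource temporarily unavailable"), which is what a non-blocking lock request returns when another open handle already holds a conflicting lock. The sketch below reproduces that error with Python's fcntl.flock on a temporary file (Unix only); it is an analogy for the HDF5 trace, assuming HDF5's sec2 driver makes a similar non-blocking lock request, not a reproduction of the ESMValTool failure.

```python
import errno
import fcntl
import os
import tempfile


def second_lock_errno(path):
    """Open `path` twice; hold an exclusive lock through the first handle and
    request a non-blocking shared lock through the second.
    Returns the errno of the refused request, or 0 if the lock was granted."""
    with open(path, "rb") as holder, open(path, "rb") as contender:
        # "Process A" already has the file open and locked.
        fcntl.flock(holder, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            # "Process B" tries to lock it for reading without waiting.
            fcntl.flock(contender, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except OSError as exc:
            return exc.errno
        return 0


with tempfile.NamedTemporaryFile() as tmp:
    code = second_lock_errno(tmp.name)
    print(code == errno.EAGAIN, os.strerror(code))
```

One commonly cited workaround, not tested in this thread, is setting HDF5_USE_FILE_LOCKING=FALSE in the environment before the process starts, which makes HDF5 1.10+ skip these locks; that is only safe if nothing writes to the file concurrently, so finding the competing process remains the real fix.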
Good news: all NCL recipes with diagnostic failures except for recipe_russell18jgr run now :tada:
recipe_russell18jgr fails because some of the diagnostics don't write plots (though judging from the code, they are supposed to). This was already the case for the v2.9.0 release, but it only became evident now due to the changes in the NCL provenance code. I opened an issue here: https://github.com/ESMValGroup/ESMValTool/issues/3478
Since neither maintainer of this recipe is really active anymore, I suggest we flag this recipe as broken. @ESMValGroup/esmvaltool-coreteam opinions?
It would be good to get it running again, but I guess it's not worth it (or possible) for the current release, since the missing figures have gone unnoticed/unreported for quite some time now.
Here is a PR that fixes some issues in the russell recipe: https://github.com/ESMValGroup/ESMValTool/pull/3479. With this, all diagnostics except for the ones listed in #3478 work again.
Here are the results from the comparison with v2.9.
Here is a summary of the comparison results (full comparison is here). @ESMValGroup/esmvaltool-recipe-maintainers and @ESMValGroup/esmvaltool-coreteam If you have a bit of time, please check if the output of these recipes is still correct. Tick the box and add your name next to a recipe once you've checked.
Runs with v2.10: https://esmvaltool.dkrz.de/shared/esmvaltool/v2.10.0/
Runs with v2.9: https://esmvaltool.dkrz.de/shared/esmvaltool/v2.9.0/
Runs with v2.8: https://esmvaltool.dkrz.de/shared/esmvaltool/v2.8.0/
If the plots or data files are not shown on the recipe output webpage (this happens when provenance has not been implemented in the diagnostic script), you can still download them by clicking the 'figures' or 'data' links at the bottom of the page.
The recipes where plots are different are probably the most important to check, because if the data are different but the plots still look the same, the changes are probably not significant. Maybe we can refine the thresholds for when data is reported as different in a future version of the comparison tool.
Comparison is done using numpy.allclose with the default tolerances for floating point numbers and numpy.array_equal for other data types.
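A minimal Python sketch of that comparison rule, assuming the data have already been loaded into NumPy arrays (the helper name data_match is hypothetical and the real tool may handle masks, shapes, and metadata differently). The rtol and atol arguments of numpy.allclose are the thresholds one would tune when refining the comparison.

```python
import numpy as np


def data_match(reference, current):
    """Hypothetical helper following the rule above: floating-point data must
    agree within numpy.allclose's default tolerances (rtol=1e-05, atol=1e-08,
    i.e. |a - b| <= atol + rtol * |b|); other data must be exactly equal."""
    ref = np.asarray(reference)
    cur = np.asarray(current)
    if ref.shape != cur.shape:
        return False
    if np.issubdtype(ref.dtype, np.floating) and np.issubdtype(cur.dtype, np.floating):
        # With the defaults, NaNs never compare equal (equal_nan=False).
        return bool(np.allclose(ref, cur))
    return bool(np.array_equal(ref, cur))


print(data_match([1.0, 2.0], [1.0, 2.0 + 1e-9]))  # within tolerance
print(data_match([1.0, 2.0], [1.0, 2.001]))       # outside tolerance
print(data_match(["tas", "pr"], ["tas", "pr"]))   # non-float: exact comparison
```

With these defaults, a relative change of 0.05% in a value of order 1 is already reported as different, which is consistent with data differences whose plots still look identical.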
Is it possible to see the result/log of the comparison tool? I can't see any visual difference in the figures of recipe_martin18grl.yml, but it has a lot of figures, so I might have missed something.
Yes, they are posted in https://github.com/ESMValGroup/ESMValTool/issues/3463#issuecomment-1859257090.
recipe_martin18grl.yml: results differ from reference run
Reference run: /shared/esmvaltool/v2.9.0/recipe_martin18grl_20230704_162537
Current run: /shared/esmvaltool/v2.10.0/recipe_martin18grl_20231212_223414
Differing files:
- plots/spi_collect/spi_collect/SPI_mapHistoric_Dur_of_Events_ACCESS1-0.png
- plots/spi_collect/spi_collect/SPI_mapHistoric_Dur_of_Events_IPSL-CM5A-MR.png
- plots/spi_collect/spi_collect/SPI_mapHistoric_Dur_of_Events_MPI-ESM-MR.png
- plots/spi_collect/spi_collect/SPI_mapHistoric_No_of_Events_per_year_Observations.png
- plots/spi_collect/spi_collect/SPI_mapHistoric_Sev_index_of_Events_GFDL-ESM2G.png
- plots/spi_collect/spi_collect/SPI_mapObservations_Average_SPI_of_Events_Mean.png
- plots/spi_collect2/spi_collect2/SPI_mapFuture_Avr_SPI_of_Events_GISS-E2-H.png
- plots/spi_collect2/spi_collect2/SPI_mapFuture_Avr_SPI_of_Events_IPSL-CM5B-LR.png
- plots/spi_collect2/spi_collect2/SPI_mapFuture_Dur_of_Events_CNRM-CM5.png
- plots/spi_collect2/spi_collect2/SPI_mapFuture_Dur_of_Events_GFDL-ESM2G.png
- plots/spi_collect2/spi_collect2/SPI_mapFuture_Dur_of_Events_HadGEM2-CC.png
- plots/spi_collect2/spi_collect2/SPI_mapFuture_Dur_of_Events_IPSL-CM5A-LR.png
- plots/spi_collect2/spi_collect2/SPI_mapFuture_Dur_of_Events_MPI-ESM-MR.png
- plots/spi_collect2/spi_collect2/SPI_mapFuture_Dur_of_Events_MRI-ESM1.png
- plots/spi_collect2/spi_collect2/SPI_mapFuture_Sev_index_of_Events_GFDL-ESM2G.png
- plots/spi_collect2/spi_collect2/SPI_mapFuture_Sev_index_of_Events_IPSL-CM5B-LR.png
- plots/spi_collect2/spi_collect2/SPI_mapFuture_Sev_index_of_Events_MRI-ESM1.png
- plots/spi_collect2/spi_collect2/SPI_mapHistoric_Dur_of_Events_GFDL-ESM2G.png
- plots/spi_collect2/spi_collect2/SPI_mapHistoric_Dur_of_Events_GISS-E2-H.png
- plots/spi_collect2/spi_collect2/SPI_mapHistoric_Sev_index_of_Events_GFDL-CM3.png
- plots/spi_collect2/spi_collect2/SPI_mapHistoric_Sev_index_of_Events_MRI-ESM1.png
- plots/spi_collect2/spi_collect2/SPI_mapHistoric_Sev_index_of_Events_NorESM1-M.png
Thanks for checking!
Thanks! (And sorry that I didn't think of looking at that post before.) The differences are tiny deviations in the way missing data are masked on the plots (really only visible if you switch between the two versions of these figures).
autoassess, validation, and all of @ledm's oceans eleven look fine! Stellar work @bouweandela :beer:
Thanks, everyone! The release has now been published!
It's a Christmas miracle 🎄 🎅
Recipe test results for v2.10
Here is an overview of the tests done for releasing v2.10. The results are available in https://esmvaltool.dkrz.de/shared/esmvaltool/v2.10.0/debug.html.
Here is the conda environment.yml.
Recipe running session 2023-12-12
Recipes that ran successfully (133 out of 155)
Recipes that failed because the diagnostic script failed (4 out of 155)
Recipes that failed because of missing data (4 out of 155)
Recipes that failed because the run took too long (8 out of 155)
Recipes that failed because they used too much memory (4 out of 155)
Recipes that failed because of an HDF5 error (3 out of 155)