Hi @ESMValGroup/technical-lead-development-team @bouweandela @valeriupredoi
Any comments on the following evaluation please? (The original output from running the recipes for the first time is above).
The following are R recipes with various errors. Would anyone with R knowledge please take a look?
The errors were one of the following:
Error in (models_dataset == reference_dataset) && (models_exp == reference_exp) :
'length = 2' in coercion to 'logical(1)'
^ Operator >remapcon2< not found!
We have the capacity to address these errors; should we? Or does anyone already know how to solve them?
KeyError: 'Provenance record for /scratch/b/b382148/esmvaltool_output/recipe_martin18grl_20240515_142625/plots/spi_collect/spi_collect/SPI_time_series_Bremen_Observations.png already exists.'
iris.exceptions.ConcatenateError: failed to concatenate into a single cube.
Cube metadata differs for phenomenon: precipitation_flux
TypeError: unhashable type: 'CubeAttrsDict'
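For the KeyError above, the message suggests that the diagnostic tried to record provenance twice for the same output file. Assuming the cause is two plots being written to the same path (for example, two inputs that share the label "Observations"), a minimal sketch of the failure and of one possible fix could look like this. ProvenanceTable and unique_plot_path are hypothetical stand-ins written for illustration; they are not the ESMValTool diagnostic API.

```python
from pathlib import Path


class ProvenanceTable:
    """Hypothetical stand-in for a provenance logger keyed by output file.

    Like the error message above, it refuses a second record for a file
    that already has one.
    """

    def __init__(self):
        self.table = {}

    def log(self, filename, record):
        key = str(filename)
        if key in self.table:
            raise KeyError(f"Provenance record for {key} already exists.")
        self.table[key] = record


def unique_plot_path(plot_dir, station, label, used):
    """Return a plot path not yet used in this run (hypothetical helper).

    If two inputs share the same label, a counter is appended so that each
    plot, and therefore each provenance record, gets its own file name.
    """
    stem = f"SPI_time_series_{station}_{label}"
    path = Path(plot_dir) / f"{stem}.png"
    counter = 1
    while str(path) in used:
        counter += 1
        path = Path(plot_dir) / f"{stem}_{counter}.png"
    used.add(str(path))
    return path


# Usage: two inputs both labelled "Observations" no longer collide.
provenance = ProvenanceTable()
used_paths = set()
for label in ["Observations", "Observations"]:
    plot_path = unique_plot_path("plots", "Bremen", label, used_paths)
    provenance.log(plot_path, {"caption": f"SPI time series for {label}"})
```

Making the file name unique keeps both plots; if the second plot turns out to be an unintended duplicate, skipping it would be the better fix.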
There is one NCL recipe with an error. Would anyone with NCL knowledge please take a look?
INFO fatal: in uajet_sh850, cannot read plev and latrange
We recognise that recipe_check_obs.yml is a known broken recipe, but should we open a new issue with ESMValGroup/obs-maintainers to resolve the missing data issues?
We've increased the time on all of these except for recipe_ipccwg1ar6ch3_fig_3_42_a.yml, which was already at the maximum time. Is there anything we can do about this?
We also had to increase time on these from the "Recipes that failed of other reasons or are still running" section.
These three are all the same as in the v2.10 recipe test results.
This is a new entry.
ValueError: Chunks and shape must be of the same length/dimension. Got chunks=(), shape=(1,)
Great summary and work, @chrisbillowsMO and @ehogan :beer:
Here is the issue with those three HDF5-related failures, as posted by @bouweandela back in December last year, when they were working on the 2.10 release: https://github.com/ESMValGroup/ESMValTool/issues/3463#issuecomment-1857587917
This is an HDF5 thread-safety issue and it is flaky, but it appears to be mostly reproducible (positive flakiness, or was it negative? doesn't matter). This has to be fixed, most probably by adding a file lock() statement somewhere; I'll have a look myself, but IMO don't make it a roadblock for the release.
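A minimal sketch of the locking idea, assuming the HDF5 calls happen in threads of a single Python process (as with Dask's default threaded scheduler). HDF5_LOCK, locked_call and the fake_read stand-in are hypothetical names; the thread does not say where such a lock would go. A threading.Lock does not help across separate processes (e.g. Dask distributed workers); that case would need an inter-process file lock, which is not shown here.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# One process-wide lock; every call into the (assumed non-thread-safe)
# HDF5/netCDF layer goes through locked_call so only one thread is inside.
HDF5_LOCK = threading.Lock()


def locked_call(func, *args, **kwargs):
    """Run func(*args, **kwargs) while holding HDF5_LOCK."""
    with HDF5_LOCK:
        return func(*args, **kwargs)


def max_concurrency(n_tasks=8, use_lock=True):
    """Demo: return the peak number of fake reads running at the same time.

    fake_read is a hypothetical stand-in for an HDF5 read; it only counts
    how many threads are inside it simultaneously.
    """
    state = {"active": 0, "peak": 0}
    counter_lock = threading.Lock()  # protects the counters, not HDF5

    def fake_read(i):
        with counter_lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)  # simulated I/O, so unlocked threads overlap
        with counter_lock:
            state["active"] -= 1
        return i

    def call(i):
        return locked_call(fake_read, i) if use_lock else fake_read(i)

    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        list(pool.map(call, range(n_tasks)))
    return state["peak"]


print(max_concurrency(use_lock=True))  # 1: reads are serialised
```

Serialising every read trades some throughput for safety; whether that cost is acceptable would depend on how I/O-bound the affected recipes are.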
This Julia recipe has the following error:
recipe_rainfarm.yml
ERROR: LoadError: ArgumentError: Package YAML [ddb6d928-2868-570f-bddf-ab3f9cf99eb6] is required but does not seem to be installed:
Did you install the Julia dependencies?
Fairly sure the answer to that q is no, bud :grin:
No, I had missed the esmvaltool install Julia step. Both Julia recipes now succeed, so I will update the first and second comments to reflect this 👍
10. Recipes that never ran
- recipe_schlund20jgr_gpp_abs_rcp85.yml
- recipe_schlund20jgr_gpp_change_1pct.yml
- recipe_schlund20jgr_gpp_change_rcp85.yml
These have been excluded from the generate.py script. @schlunma, might you need to run these?
Successfully tested them 👍 I'll update the comment above to reflect this.
5. Recipes that failed because the run took too long
- recipe_climate_change_hotspot.yml
- recipe_eyring06jgr.yml
- recipe_eyring13jgr_12.yml
- recipe_ipccwg1ar6ch3_fig_3_19.yml
- recipe_ipccwg1ar6ch3_fig_3_42_a.yml
- recipe_ipccwg1ar6ch3_fig_3_42_b.yml
- recipe_lauer22jclim_fig5_lifrac.yml
We've increased the time on all of these except for recipe_ipccwg1ar6ch3_fig_3_42_a.yml, which was already at the maximum time. Is there anything we can do about this?
- recipe_carvalhais14nat.yml
- recipe_lauer22jclim_fig9-11ab_scatter.yml
We also had to increase time on these from the "Recipes that failed of other reasons or are still running" section.
The following recipes are now running successfully, so I will update the comments above:
2024-05-16 13:42:09,525 UTC [170675] INFO Time for running the recipe was: 4:20:19.772793
2024-05-16 13:42:10,337 UTC [170675] INFO Maximum memory used (estimate): 50.4 GB
[...]
2024-05-16 13:42:12,725 UTC [170675] INFO Run was successful
2024-05-16 14:24:00,524 UTC [88405] INFO Time for running the recipe was: 4:58:26.498892
2024-05-16 14:24:01,288 UTC [88405] INFO Maximum memory used (estimate): 97.0 GB
[...]
2024-05-16 14:24:01,415 UTC [88405] INFO Run was successful
2024-05-16 13:57:25,039 UTC [76122] INFO Time for running the recipe was: 4:32:29.955802
2024-05-16 13:57:25,700 UTC [76122] INFO Maximum memory used (estimate): 225.9 GB
[...]
2024-05-16 13:57:27,644 UTC [76122] INFO Run was successful
Should I update the time for these recipes in SPECIAL_RECIPES in generate.py?
What should we do with the recipes that don't run within 8 hours?
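To turn the logged run times into a time limit, something like the following sketch could be used. It parses the "Time for running the recipe was" line from the main log, adds 25% headroom, rounds up to whole hours and caps the result at the 8-hour limit discussed here. The function names, the headroom factor and the exact SBATCH line are assumptions for illustration; how SPECIAL_RECIPES in generate.py stores the value is not shown in this thread.

```python
import math
import re
from datetime import timedelta

# Matches e.g. "Time for running the recipe was: 4:20:19.772793"
TIME_PATTERN = re.compile(
    r"Time for running the recipe was: (\d+):(\d{2}):(\d{2}(?:\.\d+)?)"
)


def parse_run_time(log_text):
    """Return the recipe run time reported in the main log as a timedelta."""
    match = TIME_PATTERN.search(log_text)
    if match is None:
        raise ValueError("no 'Time for running the recipe' line found")
    hours, minutes, seconds = match.groups()
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


def sbatch_time_line(run_time, headroom=1.25, max_hours=8):
    """Suggest an SBATCH time limit with headroom, capped at max_hours.

    The run time is scaled by `headroom` and rounded up to whole hours.
    If that exceeds `max_hours`, the limit is capped, so such a run may
    still time out.
    """
    hours = math.ceil(run_time.total_seconds() * headroom / 3600)
    hours = min(max(hours, 1), max_hours)
    return f"#SBATCH --time={hours:02d}:00:00"


log_line = ("2024-05-16 13:42:09,525 UTC [170675] INFO Time for running "
            "the recipe was: 4:20:19.772793")
print(sbatch_time_line(parse_run_time(log_line)))  # #SBATCH --time=06:00:00
```

Applied to the three runs logged above (about 4:20, 4:58 and 4:32), this suggests 6, 7 and 6 hours respectively.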
6. Recipes that failed because they used too much memory
- recipe_model_evaluation_basics.yml
We've increased the memory on this one.
The following recipe is now running successfully, so I will update the comments above:
2024-05-16 09:28:34,122 UTC [86954] INFO Time for running the recipe was: 0:01:42.672771
2024-05-16 09:28:34,977 UTC [86954] INFO Maximum memory used (estimate): 73.2 GB
[...]
2024-05-16 09:28:35,092 UTC [86954] INFO Run was successful
This recipe is new since ESMValTool v2.10.0, so it will need to be added to SPECIAL_RECIPES in generate.py.
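A similar sketch for memory, assuming the "Maximum memory used (estimate)" log line is a reasonable guide (the log itself calls it an estimate). It adds 25% headroom and falls back to SLURM's --mem=0, which requests all of the memory on the node, when the padded value would exceed the 256 GB per node quoted later in this thread. The names and the headroom factor are illustrative assumptions, not part of generate.py.

```python
import math
import re

# Matches e.g. "Maximum memory used (estimate): 73.2 GB"
MEMORY_PATTERN = re.compile(r"Maximum memory used \(estimate\): ([\d.]+) GB")


def parse_peak_memory_gb(log_text):
    """Return the peak memory estimate (in GB) reported in the main log."""
    match = MEMORY_PATTERN.search(log_text)
    if match is None:
        raise ValueError("no 'Maximum memory used' line found")
    return float(match.group(1))


def sbatch_mem_line(peak_gb, headroom=1.25, node_memory_gb=256):
    """Suggest an SBATCH memory request with headroom.

    If the padded estimate exceeds one node, ask for all of the node's
    memory instead (--mem=0 in SLURM).
    """
    needed_gb = math.ceil(peak_gb * headroom)
    if needed_gb > node_memory_gb:
        return "#SBATCH --mem=0"
    return f"#SBATCH --mem={needed_gb}G"


log_line = ("2024-05-16 09:28:34,977 UTC [86954] INFO Maximum memory used "
            "(estimate): 73.2 GB")
print(sbatch_mem_line(parse_peak_memory_gb(log_line)))  # #SBATCH --mem=92G
```

With the same rule, the 225.9 GB run logged earlier would get --mem=0.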
@bouweandela, @valeriupredoi, would it be possible to get some guidance on what to do now, please? How many of the failures above must we fix before moving onto the ESMValTool freeze and testing stages? Can all the diagnostic and data issues wait until ESMValTool testing? 🤔
Super work, guys! Here's me 3 cents (2 cents adjusted for inflation):
A possible reason for some of these failures could be iris' new attribute handling: since version 3.8, iris distinguishes between local and global attributes. We adopted this new behavior in https://github.com/ESMValGroup/ESMValCore/pull/2398.
This was the reason for the errors in recipe_schlund20esd.yml (fixed in https://github.com/ESMValGroup/ESMValTool/pull/3605) and recipe_wenzel16jclim.yml (fixed in https://github.com/ESMValGroup/ESMValTool/pull/3603).
- The Julia example recipe is in the broken recipes list because the plot it produces is rubbish; see [Julia] Use NCDatasets instead of netCDF - masked values are treated as masked only in NCDatasets ESMValTool#3476
Apologies @valeriupredoi, you did say this previously, and I promptly forgot! I will update the comment above appropriately 👍
Not a worry, Emma, release time is a very busy one 🙂
@bouweandela, @valeriupredoi, would it be possible to get some guidance on what to do now, please? How many of the failures above must we fix before moving onto the ESMValTool freeze and testing stages? Can all the diagnostic and data issues wait until ESMValTool testing? 🤔
If you suspect it is an ESMValCore issue, I would recommend fixing it before moving on to testing ESMValTool, but otherwise you should be fine to move on.
Should I update the time for these recipes in SPECIAL_RECIPES in generate.py?
Yes, that would be helpful for the next release manager.
What should we do with the recipes that don't run within 8 hours?
Are these recipes still running after 8 hours? In my experience, processes sometimes get killed without SLURM telling you. If there are no more log messages in the debug log or diagnostic script logs long before the 8 hours are over, it seems likely that the process has silently crashed. If this is the case, you could try reducing the number of workers used by Dask. This can be done by configuring the distributed scheduler or, if there are non-lazy preprocessor functions (#674) in the recipe, by using the default scheduler and creating a file called ~/.config/dask/dask.yml containing
num_workers: 16
That will use just 16 threads instead of the default 128 on a default Levante compute node, leaving 256 GB / 16 = 16 GB of RAM per thread instead of just 2 GB.
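The same arithmetic can be scripted: pick the memory wanted per thread, derive num_workers, and write it to ~/.config/dask/dask.yml. This is a hedged sketch; the 256 GB node memory and 128 default threads are the figures quoted above, the function names are hypothetical, and the file is only written if it does not already exist, so an existing Dask configuration is not silently overwritten. num_workers only affects Dask's local (threaded or process-based) schedulers, not a distributed cluster.

```python
import math
from pathlib import Path


def num_workers_for(node_memory_gb, memory_per_worker_gb, max_workers):
    """Largest thread count that still leaves memory_per_worker_gb each."""
    workers = math.floor(node_memory_gb / memory_per_worker_gb)
    return max(1, min(workers, max_workers))


def dask_yaml(num_workers):
    """Text for ~/.config/dask/dask.yml limiting Dask's local schedulers."""
    return f"num_workers: {num_workers}\n"


def write_dask_config(num_workers, config_dir=None, overwrite=False):
    """Write dask.yml; refuse to replace an existing file unless asked to."""
    if config_dir is None:
        config_dir = Path.home() / ".config" / "dask"
    config_dir = Path(config_dir)
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "dask.yml"
    if path.exists() and not overwrite:
        raise FileExistsError(f"{path} exists; add num_workers to it by hand")
    path.write_text(dask_yaml(num_workers))
    return path


# 256 GB node, 16 GB wanted per thread, 128 threads available by default.
workers = num_workers_for(256, 16, 128)
print(dask_yaml(workers), end="")  # num_workers: 16
```

Calling write_dask_config(workers) would then create the file; it is left out of the example so that running it has no side effects.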
Closing this issue in favour of #2468 😊
Recipe test results for v2.11.0rc1
This is the initial output from testing done for releasing ESMValCore v2.11.0rc1. Please see the following comment for our evaluation of the failures.
Recipe running session 2024-05-15
Setup
mamba version | ESMValTool version
Recipes that ran successfully (132 out of 160)
- recipe_albedolandcover.yml
- recipe_anav13jclim.yml
- recipe_arctic_ocean.yml
- recipe_autoassess_landsurface_permafrost.yml
- recipe_autoassess_landsurface_soilmoisture.yml
- recipe_autoassess_landsurface_surfrad.yml
- recipe_autoassess_stratosphere.yml
- recipe_bock20jgr_fig_1-4.yml
- recipe_bock20jgr_fig_6-7.yml
- recipe_capacity_factor.yml
- recipe_climate_change_hotspot.yml
- recipe_climwip_brunner2019_med.yml
- recipe_climwip_brunner20esd.yml
- recipe_climwip_test_basic.yml
- recipe_climwip_test_performance_sigma.yml
- recipe_clouds_bias.yml
- recipe_clouds_ipcc.yml
- recipe_cmug_h2o.yml
- recipe_concatenate_exps.yml
- recipe_consecdrydays.yml
- recipe_correlation.yml
- recipe_cox18nature.yml
- recipe_cvdp.yml
- recipe_daily_era5.yml
- recipe_deangelis15nat.yml
- recipe_deangelis15nat_fig1_fast.yml
- recipe_decadal.yml
- recipe_diurnal_temperature_index.yml
- recipe_eady_growth_rate.yml
- recipe_ecs.yml
- recipe_ecs_constraints.yml
- recipe_ecs_scatter.yml
- recipe_ensclus.yml
- recipe_era5-land.yml
- recipe_esacci_lst.yml
- recipe_esacci_oc.yml
- recipe_extract_shape.yml
- recipe_extreme_index.yml
- recipe_eyring06jgr.yml
- recipe_flato13ipcc_figure_914.yml
- recipe_flato13ipcc_figure_924.yml
- recipe_flato13ipcc_figure_942.yml
- recipe_flato13ipcc_figure_945a.yml
- recipe_flato13ipcc_figure_96.yml
- recipe_flato13ipcc_figure_98.yml
- recipe_flato13ipcc_figures_926_927.yml
- recipe_flato13ipcc_figures_92_95.yml
- recipe_flato13ipcc_figures_938_941_cmip3.yml
- recipe_flato13ipcc_figures_938_941_cmip6.yml
- recipe_galytska23jgr.yml
- recipe_gier2020bg.yml
- recipe_globwat.yml
- recipe_heatwaves_coldwaves.yml
- recipe_hydro_forcing.yml
- recipe_hype.yml
- recipe_iht_toa.yml
- recipe_impact.yml
- recipe_ipccwg1ar6ch3_fig_3_42_b.yml
- recipe_ipccwg1ar6ch3_fig_3_43.yml
- recipe_ipccwg1ar6ch3_fig_3_9.yml
- recipe_kcs.yml
- recipe_landcover.yml
- recipe_lauer13jclim.yml
- recipe_lauer22jclim_fig1_clim.yml
- recipe_lauer22jclim_fig1_clim_amip.yml
- recipe_lauer22jclim_fig2_taylor.yml
- recipe_lauer22jclim_fig2_taylor_amip.yml
- recipe_lauer22jclim_fig6_interannual.yml
- recipe_lauer22jclim_fig7_seas.yml
- recipe_lauer22jclim_fig8_dyn.yml
- recipe_lauer22jclim_fig9-11c_pdf.yml
- recipe_li17natcc.yml
- recipe_lisflood.yml
- recipe_marrmot.yml
- recipe_meehl20sciadv.yml
- recipe_model_evaluation_basics.yml
- recipe_model_evaluation_clouds_clim.yml
- recipe_model_evaluation_clouds_cycles.yml
- recipe_model_evaluation_precip_zonal.yml
- recipe_modes_of_variability.yml
- recipe_monitor.yml
- recipe_monitor_with_refs.yml
- recipe_mpqb_xch4.yml
- recipe_multimodel_products.yml
- recipe_my_personal_diagnostic.yml
- recipe_ncl.yml
- recipe_ocean_Landschuetzer2016.yml
- recipe_ocean_amoc.yml
- recipe_ocean_bgc.yml
- recipe_ocean_example.yml
- recipe_ocean_ice_extent.yml
- recipe_ocean_multimap.yml
- recipe_ocean_scalar_fields.yml
- recipe_perfmetrics_CMIP5.yml
- recipe_perfmetrics_CMIP5_4cds.yml
- recipe_perfmetrics_land_CMIP5.yml
- recipe_preprocessor_test.yml
- recipe_psyplot.yml
- recipe_pv_capacity_factor.yml
- recipe_python.yml
- recipe_python_for_CI.yml
- recipe_quantilebias.yml
- recipe_r.yml
- recipe_radiation_budget.yml
- recipe_rainfarm.yml
- recipe_runoff_et.yml
- recipe_russell18jgr.yml
- recipe_schlund20jgr_gpp_abs_rcp85.yml
- recipe_schlund20jgr_gpp_change_1pct.yml
- recipe_schlund20jgr_gpp_change_rcp85.yml
- recipe_sea_surface_salinity.yml
- recipe_seaborn.yml
- recipe_seaice.yml
- recipe_seaice_drift.yml
- recipe_seaice_feedback.yml
- recipe_shapeselect.yml
- recipe_smpi.yml
- recipe_smpi_4cds.yml
- recipe_snowalbedo.yml
- recipe_spei.yml
- recipe_tcr.yml
- recipe_thermodyn_diagtool.yml
- recipe_toymodel.yml
- recipe_validation.yml
- recipe_validation_CMIP6.yml
- recipe_variable_groups.yml
- recipe_weigel21gmd_figures_13_16.yml
- recipe_wenzel14jgr.yml
- recipe_wenzel16nat.yml
- recipe_wflow.yml
- recipe_williams09climdyn_CREM.yml
- recipe_zmnam.yml
Recipes that failed because the diagnostic script failed (11 out of 160)
Recipes that failed because of missing data (3 out of 160)
Recipes that failed because the run took too long (6 out of 160)
Recipes that failed of other reasons or are still running (7 out of 160)
Recipes that are known to be broken (1 out of 160)