valeriupredoi commented 1 year ago

Sister and logical evolution of #2852 - I am commencing testing and comparison of recipes and recipe results in order to release 2.7.0 at the end of this week (hopefully). System parameters are below; work is done on DKRZ/Levante, with submit files in /home/b/b382109/submit and output in /scratch/b/b382109/esmvaltool_output

System and settings

`conda`/`mamba`

(base) mamba --version
mamba 0.27.0
conda 22.9.0

Git branch and state

Date: 25 October 2022 14:22 BST

(base) git status
On branch release_270stable
Your branch is up to date with 'origin/release_270stable'.

nothing to commit, working tree clean

Environment

On Levante:

mamba env create -n tool270Test -f environment.yml
conda activate tool270Test

Environment file

ToolEnv270Test.yml

Extraneous file movements

I moved the autoassess-specific files to /home/b/b382109/autoassess_files - the run was successful for the AA recipes after that :+1:

Ad-hoc hacks (code changes)

In /home/b/b382109/ESMValTool/esmvaltool/diag_scripts/land_carbon_cycle/diag_global_turnover.py, l.278: replaced .outline_patch with .spines["geo"], as suggested by @zklaus in https://github.com/ESMValGroup/ESMValTool/issues/2886#issuecomment-1292135500 (cheers, dude!) - this will have to be PR-ed
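A minimal sketch of how that swap could be made tolerant of both old and new Cartopy installations, assuming the line in question only needs a handle on the artist that draws the map frame. The helper name `get_map_outline` is hypothetical and is not part of ESMValTool or Cartopy; the only attributes relied on are the two named above.

```python
def get_map_outline(ax):
    """Return the artist that draws the map frame of a map axes.

    Assumption: newer Cartopy exposes the frame as ``ax.spines["geo"]``,
    while older releases only offer ``ax.outline_patch``.
    """
    spines = getattr(ax, "spines", None)
    if spines is not None and "geo" in spines:
        return spines["geo"]
    # Fall back to the legacy attribute on old installations.
    return ax.outline_patch
```

Whatever call was previously made on `.outline_patch` could then be made on the returned object instead.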

Mods to config user file

Added DKRZ downloaded data pool as:

  CMIP6:
    - /work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ
    - /work/bd0854/DATA/ESMValTool2/download/CMIP6
  CMIP5:
    - /work/bd0854/DATA/ESMValTool2/CMIP5_DKRZ
    - /work/bd0854/b309141/additional_CMIP5
    - /work/bd0854/DATA/ESMValTool2/download/cmip5/output1
    - /work/bd0854/DATA/ESMValTool2/download/cmip5

as @schlunma and @remi-kazeroni have suggested :beer:
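Since several of the problems in this thread come down to which data directories are configured, here is a small sketch for checking that every configured directory actually exists before submitting jobs. The function name `missing_rootpaths` is hypothetical; the dictionary simply mirrors the paths listed above.

```python
from pathlib import Path


def missing_rootpaths(rootpath):
    """Map each project to the configured directories that do not exist."""
    missing = {}
    for project, dirs in rootpath.items():
        if isinstance(dirs, str):
            # A single path may be given instead of a list.
            dirs = [dirs]
        absent = [d for d in dirs if not Path(d).is_dir()]
        if absent:
            missing[project] = absent
    return missing


# Paths copied from the config snippet above.
ROOTPATH = {
    "CMIP6": [
        "/work/bd0854/DATA/ESMValTool2/CMIP6_DKRZ",
        "/work/bd0854/DATA/ESMValTool2/download/CMIP6",
    ],
    "CMIP5": [
        "/work/bd0854/DATA/ESMValTool2/CMIP5_DKRZ",
        "/work/bd0854/b309141/additional_CMIP5",
        "/work/bd0854/DATA/ESMValTool2/download/cmip5/output1",
        "/work/bd0854/DATA/ESMValTool2/download/cmip5",
    ],
}
```

Calling `missing_rootpaths(ROOTPATH)` on Levante should return an empty dict if all directories are reachable from the current account.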

Recipe runs

Recipe runs results (as of final on 27 October 2022) are listed in https://github.com/ESMValGroup/ESMValTool/issues/2881#issuecomment-1291878142 (with very many thanks to @remi-kazeroni for running the impossible to run ones!) and are as follows:

122(121)*/127 successfully run recipes
0(1)*/127 failed with a DiagnosticError (the one affected recipe was fixed and rerun, but the fix is not yet PR-ed)
2/127 that are missing data (for reals)
3/127 that have various issues (not missing data and not DiagnosticError)

(*) the first number counts the recipe that hit a DiagnosticError (fixed and rerun, fix not yet PR-ed) as a success; the number in parentheses counts it as a failure
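A quick arithmetic check that both ways of counting add up to 127. The numbers are the ones listed above; the function name `tally` is just for illustration.

```python
TOTAL_RECIPES = 127


def tally(count_fixed_as_success=True):
    """Return the per-category counts for either way of counting."""
    fixed = 1  # the recipe that hit a DiagnosticError, was fixed and rerun
    return {
        "successful": 121 + (fixed if count_fixed_as_success else 0),
        "diagnostic_error": 0 if count_fixed_as_success else fixed,
        "missing_data": 2,
        "other_issues": 3,
    }
```

Both `sum(tally(True).values())` and `sum(tally(False).values())` equal `TOTAL_RECIPES`.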

Running the comparison

Login and access to the DKRZ esmvaltool VM

Results from recipe runs are stored on the VM; login with:

ssh youraccount@esmvaltool.dkrz.de

Get and install miniconda on VM

E.g. scp Miniconda3-py39_4.12.0-Linux-x86_64.sh b382109@esmvaltool.dkrz.de:~ from a file already on Levante.

Setting up the input files

If you wrote the recipe run output to the Levante /scratch partition, be aware that the data will be removed after two weeks, so you will have to copy the output data to the /work partition, e.g. via a nohup job:

nohup cp -r /scratch/b/b382109/esmvaltool_output/* /work/bd0854/b382109/v270

/work is visible from the VM, so you can run the compare tool straight on the VM.

NOTE: do not store final release results that include the /preproc/ dirs on the VM. The total size of all recipe output, including the /preproc/ dirs, is in the 4.5TB ballpark, far too much for the VM storage capacity.
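If a copy without the preproc directories is needed, a Python alternative to `cp -r` could look like the sketch below. It assumes the directories to skip are literally named `preproc`, as in the paths quoted in this thread; the function name `copy_without_preproc` is hypothetical.

```python
import shutil


def copy_without_preproc(src, dst):
    """Copy the tree at src to dst, skipping every entry named 'preproc'."""
    return shutil.copytree(
        src,
        dst,
        ignore=shutil.ignore_patterns("preproc"),
        dirs_exist_ok=True,  # requires Python 3.8 or newer
    )
```

This only illustrates the exclusion; for multi-terabyte trees a resumable tool may be the more practical choice.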

Running compare tool at VM

run date: 28 October 2022 (1st run)
conda env: tool270Compare
ESMValTool branch: release_270stable
prerequisite: pip install imagehash

Input/output/run

current: /work/bd0854/b382109/v270 (contains preproc/ dirs too, 122 recipes)
reference: /mnt/esmvaltool_disk2/shared/esmvaltool/v2.6.0rc4 (does not contain preproc/ dirs)
cmd: nohup python ESMValTool/esmvaltool/utils/testing/regression/compare.py /mnt/esmvaltool_disk2/shared/esmvaltool/v2.6.0rc4 /work/bd0854/b382109/v270 > compare270output.txt

Sanity check, as outputted by compare.py

Comparing recipe run(s) in:
/work/bd0854/b382109/v270
to reference in /mnt/esmvaltool_disk2/shared/esmvaltool/v2.6.0rc4
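Before reading the results it can help to know which recipes exist in both trees. The sketch below pairs run directories by recipe name; it is an illustration only and not how compare.py itself is implemented. It assumes run directories are named `<recipe>_<YYYYMMDD>_<HHMMSS>`, as in the directory names quoted in this thread, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Trailing "_YYYYMMDD_HHMMSS" timestamp of a run directory.
_TIMESTAMP = re.compile(r"_\d{8}_\d{6}$")


def recipe_name(run_dir):
    """Strip the trailing timestamp from a run directory name."""
    return _TIMESTAMP.sub("", Path(run_dir).name)


def pair_runs(reference_dirs, current_dirs):
    """Pair runs by recipe name; also report recipes present on one side only.

    If a recipe was run several times, the last directory listed wins.
    """
    reference = {recipe_name(d): d for d in reference_dirs}
    current = {recipe_name(d): d for d in current_dirs}
    paired = {
        name: (reference[name], current[name])
        for name in sorted(reference.keys() & current.keys())
    }
    only_reference = sorted(reference.keys() - current.keys())
    only_current = sorted(current.keys() - reference.keys())
    return paired, only_reference, only_current
```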

First pass result

Running compare.py flags a few recipes as not-OK (NOK) because their plots differ from the previous release v2.6.0; summary in https://github.com/ESMValGroup/ESMValTool/issues/2881#issuecomment-1294735465

Detailed plots inspection

Inspection of the differing plots for the 34 affected recipes is happening in https://github.com/ESMValGroup/ESMValTool/issues/2881#issuecomment-1295001054

valeriupredoi commented 1 year ago

@sloosvel I am in dire pain after realizing blithering DKRZ's SLURM emails me for every recipe :face_with_spiral_eyes:

valeriupredoi commented 1 year ago

@sloosvel what's these jobs up to?

(tool270Test) squeue -u b382109
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           2378977   compute recipe_z  b382109 PD       0:00      1 (AssocMaxJobsLimit)
           2378976   compute recipe_w  b382109 PD       0:00      1 (AssocMaxJobsLimit)
           2378975   compute recipe_w  b382109 PD       0:00      1 (AssocMaxJobsLimit)
           2378974   compute recipe_w  b382109 PD       0:00      1 (AssocMaxJobsLimit)

sloosvel commented 1 year ago

> @sloosvel I am in dire pain after realizing blithering DKRZ's SLURM emails me for every recipe :face_with_spiral_eyes:

You can comment that if it's not useful to you, to me it was!

> @sloosvel what's these jobs up to?

I think there is a limit on the number of jobs an account can run simultaneously on Levante. They will be pending until other jobs finish I guess

remi-kazeroni commented 1 year ago

> @sloosvel what's these jobs up to?

On Levante, a user can't have more than 20 Slurm jobs running at a time. As soon as a job is finished, the next one should start
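To see how many jobs are still waiting behind that limit, the `squeue` output can be tallied by state. A minimal sketch, assuming the default column layout shown in the output above and job names without spaces; the function name `count_job_states` is hypothetical.

```python
from collections import Counter


def count_job_states(squeue_text):
    """Count jobs per state code (ST column) in `squeue` output."""
    lines = [line.split() for line in squeue_text.strip().splitlines()]
    header, rows = lines[0], lines[1:]
    state_col = header.index("ST")
    return Counter(row[state_col] for row in rows)


# Output copied from the squeue call earlier in this thread.
SAMPLE = """
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           2378977   compute recipe_z  b382109 PD       0:00      1 (AssocMaxJobsLimit)
           2378976   compute recipe_w  b382109 PD       0:00      1 (AssocMaxJobsLimit)
           2378975   compute recipe_w  b382109 PD       0:00      1 (AssocMaxJobsLimit)
           2378974   compute recipe_w  b382109 PD       0:00      1 (AssocMaxJobsLimit)
"""
```

`count_job_states(SAMPLE)["PD"]` gives the number of pending jobs in the sample.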

valeriupredoi commented 1 year ago

> They will be pending until other jobs finish I guess

Cheers! More emails then :man_facepalming: :rofl:

valeriupredoi commented 1 year ago

OK guys - first (and only) sbatch session over on Levante (I have one stray recipe still running, it's a zombie though) and this is how it looks:

Recipe running session 2022-10-26 13:13:41.568698

Successfully run recipes

122 out of 127 final

recipe_anav13jclim.yml by @remi-kazeroni
recipe_albedolandcover.yml
recipe_arctic_ocean.yml
recipe_autoassess_landsurface_permafrost.yml
recipe_autoassess_landsurface_soilmoisture.yml
recipe_autoassess_landsurface_surfrad.yml
recipe_autoassess_radiation_rms_Amon_all.yml
recipe_autoassess_radiation_rms_Amon_obs.yml
recipe_autoassess_stratosphere.yml
recipe_bock20jgr_fig_1-4.yml by @remi-kazeroni
recipe_bock20jgr_fig_6-7.yml
recipe_bock20jgr_fig_8-10.yml
recipe_capacity_factor.yml
recipe_carvalhais14nat.yml
recipe_climwip_brunner2019_med.yml by @remi-kazeroni
recipe_climwip_brunner20esd.yml
recipe_climwip_test_basic.yml
recipe_climwip_test_performance_sigma.yml
recipe_clouds_bias.yml
recipe_clouds_ipcc.yml
recipe_cmug_h2o.yml
recipe_collins13ipcc.yml by @remi-kazeroni
recipe_combined_indices.yml
recipe_concatenate_exps.yml
recipe_consecdrydays.yml
recipe_correlation.yml
recipe_cox18nature.yml
recipe_cvdp.yml
recipe_daily_era5.yml
recipe_deangelis15nat.yml
recipe_deangelis15nat_fig1_fast.yml
recipe_decadal.yml
recipe_diurnal_temperature_index.yml
recipe_eady_growth_rate.yml
recipe_ecs.yml
recipe_ecs_constraints.yml
recipe_ecs_scatter.yml
recipe_ensclus.yml
recipe_era5-land.yml
recipe_esacci_lst.yml
recipe_esacci_oc.yml
recipe_extract_shape.yml
recipe_extreme_events.yml
recipe_extreme_index.yml
recipe_eyring06jgr.yml
recipe_eyring13jgr_12.yml
recipe_gier2020bg.yml
recipe_globwat.yml
recipe_heatwaves_coldwaves.yml
recipe_hydro_forcing.yml
recipe_hyint.yml
recipe_hyint_extreme_events.yml
recipe_hype.yml
recipe_impact.yml by @remi-kazeroni
recipe_ipccwg1ar6ch3_atmosphere.yml
recipe_julia.yml
recipe_kcs.yml
recipe_landcover.yml
recipe_lauer13jclim.yml
recipe_li17natcc.yml
recipe_lisflood.yml
recipe_marrmot.yml
recipe_martin18grl.yml
recipe_meehl20sciadv.yml
recipe_miles_block.yml
recipe_miles_eof.yml
recipe_miles_regimes.yml
recipe_modes_of_variability.yml
recipe_monitor.yml
recipe_monitor_with_refs.yml
recipe_mpqb_xch4.yml
recipe_multimodel_products.yml
recipe_my_personal_diagnostic.yml
recipe_ncl.yml
recipe_ocean_Landschuetzer2016.yml
recipe_ocean_amoc.yml
recipe_ocean_bgc.yml
recipe_ocean_example.yml
recipe_ocean_ice_extent.yml
recipe_ocean_multimap.yml
recipe_ocean_quadmap.yml
recipe_ocean_scalar_fields.yml
recipe_pcrglobwb.yml
recipe_preprocessor_derive_test.yml
recipe_preprocessor_test.yml
recipe_psyplot.yml
recipe_pv_capacity_factor.yml
recipe_python.yml
recipe_quantilebias.yml
recipe_perfmetrics_CMIP5.yml by @remi-kazeroni
recipe_perfmetrics_CMIP5_4cds.yml by @remi-kazeroni
recipe_r.yml
recipe_radiation_budget.yml
recipe_rainfarm.yml
recipe_runoff_et.yml
recipe_russell18jgr.yml
recipe_schlund20esd.yml
recipe_schlund20jgr_gpp_abs_rcp85.yml
recipe_schlund20jgr_gpp_change_1pct.yml
recipe_schlund20jgr_gpp_change_rcp85.yml
recipe_sea_surface_salinity.yml
recipe_seaice.yml by @remi-kazeroni
recipe_seaice_drift.yml
recipe_seaice_feedback.yml
recipe_shapeselect.yml
recipe_smpi.yml
recipe_smpi_4cds.yml
recipe_snowalbedo.yml
recipe_spei.yml
recipe_tcr.yml
recipe_tebaldi21esd.yml
recipe_thermodyn_diagtool.yml
recipe_toymodel.yml
recipe_validation.yml
recipe_validation_CMIP6.yml
recipe_variable_groups.yml
recipe_wenzel14jgr.yml
recipe_wenzel16jclim.yml
recipe_wenzel16nat.yml
recipe_wflow.yml
recipe_williams09climdyn_CREM.yml
recipe_zmnam.yml

Recipes that failed with DiagnosticError

0 out of 127 (1 fixed, not PR-ed yet)

recipe_carvalhais14nat.yml - https://github.com/ESMValGroup/ESMValTool/issues/2886 - @zklaus suggestion from https://github.com/ESMValGroup/ESMValTool/issues/2886#issuecomment-1292135500 fixes the problem!

Recipes that failed of Missing Data

2 out of 127 final

recipe_check_obs.yml - comment by @remi-kazeroni - It was missing one variable for MERRA2. I don't know why but I fixed that. If you rerun it, you will encounter some missing derived ERA5 data. See https://github.com/ESMValGroup/ESMValCore/issues/1388, we never took the time to fix that
recipe_climate_change_hotspot.yml

Recipes that failed of other reasons

3 out of 127 final

recipe_autoassess_radiation_rms_cfMon_all.yml - clisccp bugger handled by @alistairsellar https://github.com/ESMValGroup/ESMValCore/issues/1238
recipe_perfmetrics_land_CMIP5.yml (run by @remi-kazeroni ) known issue https://github.com/ESMValGroup/ESMValTool/issues/2594
recipe_flato13ipcc.yml - comment by @remi-kazeroni and confirmed by @katjaweigel - I think this is not runnable at the moment but should be fixed by @katjaweigel in #2156 (good luck, Katja!)

Obsolete/resolved issues comment:

The Julia ones are totally my bad - forgot to install Julia after installing esmvaltool, the autoassess ones are either of the old bug that @alistairsellar is fixing now, or they need aux data that is only on JASMIN, the ones of Missing Data are bothering me badly - since I have turned on auto downloads but they are still missing data, what do you guys recommend doing about those? @sloosvel @remi-kazeroni @bouweandela ? I will post detailed postmortems for the ones that have failed for odd reasons below :+1:

valeriupredoi commented 1 year ago

Postmortem of failed recipes OTHER THAN Missing Data

Recipes that failed with DiagnosticError

0 out of 127 (1 fixed, not yet PR-ed)

recipe_carvalhais14nat.yml - https://github.com/ESMValGroup/ESMValTool/issues/2886 - @zklaus suggestion from https://github.com/ESMValGroup/ESMValTool/issues/2886#issuecomment-1292135500 fixes the problem!

Recipes that failed of other reasons or are still running

1 out of 127

recipe_autoassess_radiation_rms_cfMon_all.yml - clisccp bugger handled by @alistairsellar https://github.com/ESMValGroup/ESMValCore/issues/1238

remi-kazeroni commented 1 year ago

Hi @valeriupredoi, great job with the testing! I forgot to mention but we have a central pool of downloaded data on Levante at /work/bd0854/DATA/ESMValTool2/download/CMIP6 and /work/bd0854/DATA/ESMValTool2/download/cmip5/output1. Maybe you could add those to your path on top of your download directory? This should help solve the time limit issues (lots of fx files searched on ESGF and/or downloaded I guess).

remi-kazeroni commented 1 year ago

> recipe_smpi.yml - too slow Elapsed time : 04:00:19 (Timelimit=04:00:00)

For this one, I would recommend using:

#SBATCH --partition=compute
#SBATCH --time=08:00:00
#SBATCH --constraint=512G
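A minimal sketch of generating a submit script with these settings. The function name `sbatch_script` is hypothetical, and the real submit files used for this release very likely contain more (for example an account and the environment activation), which is deliberately left out here because it is not given in this thread.

```python
def sbatch_script(recipe, partition="compute", time="08:00:00", constraint="512G"):
    """Render a bare-bones Slurm submit script for one recipe."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={recipe}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --time={time}",
        f"#SBATCH --constraint={constraint}",
        "",
        # Assumption: the recipe file is findable under this name.
        f"esmvaltool run {recipe}.yml",
    ]
    return "\n".join(lines) + "\n"
```

For example, `print(sbatch_script("recipe_smpi"))` renders the three directives recommended above followed by the run command.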

valeriupredoi commented 1 year ago

Indeed, cheers @remi-kazeroni - smpi is a memory gobbler - I restarted it on SLURM and promptly got kicked out coz mem limit (this time around I think all data has been downloaded, hence it went to intensive processing). I'll resubmit with mem reqs. What do you recommend about those that really-really are missing data?

valeriupredoi commented 1 year ago

> > recipe_smpi.yml - too slow Elapsed time : 04:00:19 (Timelimit=04:00:00)
>
> For this one, I would recommend using:
> #SBATCH --partition=compute
> #SBATCH --time=08:00:00
> #SBATCH --constraint=512G

even with 512G still fails out of MEM :open_mouth:

valeriupredoi commented 1 year ago

oh crap, forgot to change the partition :face_in_clouds:

remi-kazeroni commented 1 year ago

> > > recipe_smpi.yml - too slow Elapsed time : 04:00:19 (Timelimit=04:00:00)
> >
> > For this one, I would recommend using:
> > #SBATCH --partition=compute
> > #SBATCH --time=08:00:00
> > #SBATCH --constraint=512G
>
> even with 512G still fails out of MEM 😮

You can try with 1024G then! But that's the highest available

valeriupredoi commented 1 year ago

> > > > recipe_smpi.yml - too slow Elapsed time : 04:00:19 (Timelimit=04:00:00)
> > >
> > > For this one, I would recommend using:
> > > #SBATCH --partition=compute
> > > #SBATCH --time=08:00:00
> > > #SBATCH --constraint=512G
> >
> > even with 512G still fails out of MEM :open_mouth:
>
> You can try with 1024G then! But that's the highest available

totally user-side - forgot to change the partition to compute - cheers, dude! :beer:

sloosvel commented 1 year ago

I never managed to run the smpi recipes, @remi-kazeroni did it for me in the last release. Maybe the batch script settings for this recipe can be changed in #2883

valeriupredoi commented 1 year ago

With the correct SLURM settings as recommended by @remi-kazeroni (:beer:) those smpi monsters are happily plodding along now - yes, we should change the settings for sure. @sloosvel how did you fix the runs for those recipes that really-really don't have data, like the ones I found in https://github.com/ESMValGroup/ESMValTool/issues/2881#issuecomment-1291878142?

remi-kazeroni commented 1 year ago

I don't have a definitive answer for the really-really missing data cases. As said in this comment, you could try to rerun the recipes after adding these paths to your config file. But that data pool is 2 releases old. One could argue that we should delete it and re-download everything, as /work/bd0854/DATA/ESMValTool2/download/ may contain data retracted from ESGF...

Taking a closer look at some of these (currently) 13 cases:

recipe_check_obs.yml (my favourite one!)-> It was missing one variable for MERRA2. I don't know why but I fixed that. If you rerun it, you will encounter some missing derived ERA5 data. See https://github.com/ESMValGroup/ESMValCore/issues/1388, we never took the time to fix that
recipe_anav13jclim.yml -> I think this is a special case that needs cmip5/output2 data. You could retry with /work/bd0854/DATA/ESMValTool2/download/cmip5/output2
recipe_climate_change_hotspot.yml -> Maybe @sloosvel and @pepcos could say more. I thought this issue was recently fixed...
recipe_flato13ipcc.yml -> I think this is not runnable at the moment but should be fixed by @katjaweigel in #2156.
recipe_meehl20sciadv.yml and recipe_schlund20esd.yml -> Maybe the recipe maintainer @schlunma could take a look 🍺
recipe_perfmetrics_*.yml -> looks like the same datasets are missing...

sloosvel commented 1 year ago

I think for recipe_climate_change_hotspot.yml, I ended up running it on JASMIN

valeriupredoi commented 1 year ago

Hi @remi-kazeroni @sloosvel awesome, thanks a lot! Here's the thing(s):

recipe_anav13jclim.yml - this is not optimal if "special" cmip5 data is needed, that is not available on ESGF - I would add this recipe to the list of those we have to see what to do about it wrt obsolete data
recipe_climate_change_hotspot.yml - same as above, unless there is a serious reason why it's not working, having to have preferred sites where recipes run is against our core principle of reproducibility of results

I'll have a closer look at the meehl and schlund ones, and will ping @schlunma asap

katjaweigel commented 1 year ago

Yes, the version of recipe_flato13ipcc.yml currently in #2156 is running. The cost is to remove/comment out data sets, which do not work on Levante (and to fix a wrong time period for one model). There was already some discussion on how to deal with such cases, and if I remember right @axel-lauer , who is maintainer of the original recipe_flato13ipcc.yml did not agree on removing data sets? It should also be noted, that the option --skip_nonexistent does not work for all diagnostics in recipe_flato13ipcc.yml, because in several data sets from e.g. two different experiments are needed and it does not work, if only one is there. Therefore I was going to ask, which version of recipe_flato13ipcc.yml should be in the end in #2156 in this issue. (Unfortunately I'm also not completely ready with some issues in recipe_flato13ipcc_figures_938_941.yml I hope to finish them soon).

schlunma commented 1 year ago

V, can you adapt the permissions on /scratch/b/b382109/esmvaltool_output so I can have a look at the logs?

valeriupredoi commented 1 year ago

> /scratch/b/b382109/esmvaltool_output

@schlunma Manu, they are here /home/b/b382109/manu_logs

valeriupredoi commented 1 year ago

> Yes, the version of recipe_flato13ipcc.yml currently in #2156 is running. The cost is to remove/comment out data sets, which do not work on Levante (and to fix a wrong time period for one model). There was already some discussion on how to deal with such cases, and if I remember right @axel-lauer , who is maintainer of the original recipe_flato13ipcc.yml did not agree on removing data sets? It should also be noted, that the option --skip_nonexistent does not work for all diagnostics in recipe_flato13ipcc.yml, because in several data sets from e.g. two different experiments are needed and it does not work, if only one is there. Therefore I was going to ask, which version of recipe_flato13ipcc.yml should be in the end in #2156 in this issue. (Unfortunately I'm also not completely ready with some issues in recipe_flato13ipcc_figures_938_941.yml I hope to finish them soon).

@katjaweigel many thanks for your clarification! I will consider this recipe at-risk for now, and will not faff about it until you guys fix it - not the first and not the last time we include not really fully working recipes in a release :grin:

schlunma commented 1 year ago

cd: permission denied: /home/b/b382109/manu_logs :cry:

valeriupredoi commented 1 year ago

> cd: permission denied: /home/b/b382109/manu_logs :cry:

bugger! :face_exhaling: Here they are, bud

meeh_log.txt schlund20esd_log.txt

schlunma commented 1 year ago

I just manually searched for the files on DKRZ's ESGF node and found all files. Not sure what's going on there, but, like @remi-kazeroni, I would recommend adding our shared pool (/work/bd0854/DATA/ESMValTool2/download/CMIP6) to your config-user.yml file :+1:

schlunma commented 1 year ago

> * recipe_anav13jclim.yml - this is not optimal if "special" cmip5 data is needed, that is not available on ESGF - I would add this recipe to the list of those we have to see what to do about it wrt obsolete data

It's not "special" CMIP5 data, it's rather that our DRS does not take the output directory (output1 vs. output2) into account. See discussion here: https://github.com/ESMValGroup/ESMValTool/issues/2408#issuecomment-1049955903

valeriupredoi commented 1 year ago

> I just manually searched for the files on DKRZ's ESGF node and found all files. Not sure what's going on there, but as @remi-kazeroni I would recommend adding our shared pool (/work/bd0854/DATA/ESMValTool2/download/CMIP6) to your config-user.yml file :+1:

cheers, bud! Added, firing those up now :rocket:

valeriupredoi commented 1 year ago

OK I added the extra paths and some of them recipes have started plodding along, still - a few doggedly refuse to run still complaining of missing data:

- recipe_anav13jclim.yml
- recipe_check_obs.yml
- recipe_climate_change_hotspot.yml
- recipe_climwip_brunner2019_med.yml
- recipe_collins13ipcc.yml
- recipe_seaice.yml

I'll have to see about running those on JASMIN

schlunma commented 1 year ago

recipe_anav13jclim.yml needs

CMIP5: /work/bd0854/DATA/ESMValTool2/download/cmip5

in addition to the default paths.

valeriupredoi commented 1 year ago

FFS man - what's this - we're gathering data like they're sheep on a field in Wales? What are we gonna do about this - suboptimal data storage to put it politely :angry:

remi-kazeroni commented 1 year ago

I'm rerunning all recipes listed as "Recipes that failed of Missing Data" in this comment and the 2 recipe_bock20jgr_fig_* recipes listed in the section below, but using previously downloaded data in /work/bd0854/DATA/ESMValTool2/download/. Recipe runs can be found in /scratch/b/b309192/esmvaltool_output. Current status:

Running successfully

recipe_anav13jclim.yml
recipe_bock20jgr_fig_1-4.yml
recipe_bock20jgr_fig_6-7.yml
recipe_climwip_brunner2019_med.yml
recipe_collins13ipcc.yml
recipe_impact.yml
recipe_meehl20sciadv.yml
recipe_perfmetrics_CMIP5.yml
recipe_perfmetrics_CMIP5_4cds.yml
recipe_seaice.yml
recipe_smpi.yml
recipe_smpi_4cds.yml

Failed recipes

recipe_climate_change_hotspot.yml -> missing data
recipe_perfmetrics_land_CMIP5.yml -> known issue https://github.com/ESMValGroup/ESMValTool/issues/2594

@valeriupredoi, feel free to grab the successful runs and put them in your directory.

EDIT: 4 more successes

valeriupredoi commented 1 year ago

Anav dies yet again even with that extra data source :man_facepalming: - JASMIN it is for these showstoppers, only JASMIN is slow like a snail :snail:

schlunma commented 1 year ago

This is just our default download directory on Levante for data that has not been provided by DKRZ directly, Remi mentioned that here.

And as mentioned in my previous comment, anav13 is a special case since data from output2 cannot be read with the default DRS (our fault, not CMIP's!). You could also try:

CMIP5: /work/bd0854/DATA/ESMValTool2/download/cmip5/output2

valeriupredoi commented 1 year ago

@remi-kazeroni that's brilliant! How you managed to get seaice to run is a true mystery, mine failed like 4 times in the past hour :laughing: - what's your path, bud?

remi-kazeroni commented 1 year ago

> FFS man - what's this - we're gathering data like they're sheep on a field in Wales? What are we gonna do about this - suboptimal data storage to put it politely 😠

Yeah, I know this is not optimal. But this is the directory into which several developers working on Levante automatically download their data, to avoid having too many copies of the same datasets... Maybe we should revisit that for the next release and not use previously downloaded data.

remi-kazeroni commented 1 year ago

> @remi-kazeroni that's brilliant! How you managed to get seaice to run is a true mystery, mine failed like 4 times in the past hour 😆 - what's your path, bud?

For the data:

rootpath:
  CMIP6: [/work/ik1017/CMIP6/data/CMIP6, /work/bd0854/DATA/ESMValTool2/download/CMIP6]
  CMIP5: [/work/kd0956/CMIP5/data/cmip5/output1/, /work/bd0854/DATA/ESMValTool2/download/cmip5/output1, /work/bd0854/DATA/ESMValTool2/download/cmip5/output2]

For the runs: /scratch/b/b309192/esmvaltool_output. So you have the seaice in: /scratch/b/b309192/esmvaltool_output/recipe_seaice_20221026_142333

valeriupredoi commented 1 year ago

> This is just our default download directory on Levante for data that has not been provided by DKRZ directly, Remi mentioned that here.
>
> And as mentioned in my previous comment, anav13 is a special case since data from output2 cannot be read with the default DRS (our fault, not CMIPs!). You could also try:
> CMIP5: /work/bd0854/DATA/ESMValTool2/download/cmip5/output2

We need to get DKRZ on board to organize/populate their data in their ESGF node - this is a hot mess as it is right now - you guys and me are scraping for data like mad. JASMIN has it much better organized, only problem is JASMIN is abysmally slow compared to Levante and lacking memory. If Levante and Jasmin made a baby, then that'd be perfect :grin:

valeriupredoi commented 1 year ago

very many thanks, chaps! I'll let those run (both on me and Remi's partitions) and am off home, tomorrow I'll pick up the results, with all these "missing data" in (or most of them) I'll be able to run the comparison tomorrow - we're on track still :train2:

valeriupredoi commented 1 year ago

@schlunma meeh ran fine with the extra Welsh sheep data, bud! Mee-ha! Off to make dinner 🍕

valeriupredoi commented 1 year ago

OK guys final count 122/127 recipes successfully run - we are legend :beer: Now, on to the comparison dread :grin:

valeriupredoi commented 1 year ago

@sloosvel after trying to make the comparison script work for a while (not the most straightforward code, and with quite a few missed catches, @bouweandela - sorry) I have realized that your runs in /scratch/b/b381943/esmvaltool_output are all empty shells - the dir structure is there, but no files at all - can you please tell me what's going on? ASAP, please
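A quick way to confirm which run directories are such empty shells, as a sketch: it lists the directories under an output root that contain no regular files at any depth. The function name `empty_run_dirs` is hypothetical.

```python
from pathlib import Path


def empty_run_dirs(output_root):
    """Return names of run directories under output_root holding no files."""
    empty = []
    for run_dir in sorted(Path(output_root).iterdir()):
        # A directory tree made only of (possibly nested) folders counts as empty.
        if run_dir.is_dir() and not any(p.is_file() for p in run_dir.rglob("*")):
            empty.append(run_dir.name)
    return empty
```

Running it on the output root in question would show whether every run is affected or only some.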

valeriupredoi commented 1 year ago

@sloosvel also you ran a whole lot of other recipes on top of the standard ESMValTool release ones there - is there anywhere else where you moved the 2.6.0 release output? (one of the things that tripped the compare script)

sloosvel commented 1 year ago

The outputs you need are in the esmvaltool VM: https://esmvaltool.dkrz.de/shared/esmvaltool/v2.6.0rc4/ . I ran the comparison tool in there because it's where outputs from other versions are. The other outputs are personal work, no need to compare them!

valeriupredoi commented 1 year ago

OK cool! How do I get access to those files via a terminal please - I need to run the compare tool via command line, or is there any other way to do that? :beer:

sloosvel commented 1 year ago

You can log in to the machine using your Levante credentials: ssh youraccount@esmvaltool.dkrz.de

First move your outputs into /shared/esmvaltool/ (I used rsync excluding the preproc folder for all outputs) and then run the comparison tool against the output for other versions

valeriupredoi commented 1 year ago

Thanks - I will do that now. But I am confused by the lack of standardization and disk backups - I reckon this shouldn't be done this way for the next release, and the RM should keep the data on the actual Levante disk too. E.g. I am looking at the output from https://esmvaltool.dkrz.de/shared/esmvaltool/v2.6.0/debug.html and see files listed, e.g. /scratch/b/b381943/esmvaltool_output/recipe_autoassess_landsurface_permafrost_20220712_101605/preproc/aa_landsurf_permafrost/tas/CMIP6_ACCESS-CM2_Amon_historical_r1i1p1f1_tas_gn_1992-2002.nc - those don't exist, what happened to them? Are they behind a virtual OS layer?

valeriupredoi commented 1 year ago

OK this is suboptimal, to say the least - I managed to get into the VM but it's barren - I need to create a conda env, and depending on the deps (hopefully not many have changed since two days ago when I created the testing env), we may get different results based on what deps the compare script ingests - I may have to use a conda-lock file from the actual Levante env for that, let alone the very heavy data duplication (those outputs, even without the preproc/ dirs, which I didn't output anyway, are not small). Why didn't you keep the output on Levante, or run there in the first place?

sloosvel commented 1 year ago

Because I don't have the output for other versions on Levante. Other releases were run on Mistral. The outputs are in the virtual machine. And it's indicated in the documentation anyway: https://docs.esmvaltool.org/en/latest/utils.html#comparing-recipe-runs

valeriupredoi commented 1 year ago

and on top of this all I don't have write permissions to /shared/esmvaltool to move the data - I really don't want to move it in my $HOME on the VM first and then move them back or symlink them; @remi-kazeroni who is supposed to give me rwx+ to that partition please?

ESMValGroup / ESMValTool

Recipe testing and comparison for release 2.7.0 #2881

