Closed valeriupredoi closed 1 year ago
Because I don't have the output for other versions on Levante. Other releases were run on Mistral. The outputs are in the virtual machine. And it's indicated in the documentation anyway: https://docs.esmvaltool.org/en/latest/utils.html#comparing-recipe-runs
yes but you ran them on Levante and not on the VM, like I did myself - not saying you did wrong, am saying this is a mess including the instructions in the docs - moving data is bad, removing data is even worse
and on top of this all I don't have write permissions to /shared/esmvaltool to move the data
You should be able to `sudo rsync` or any other command
`sudo rsync` will not work from Levante since I am not in the list of sudoers. This fails:
rsync -avzh /scratch/b/b382109/esmvaltool_output/ b382109@esmvaltool.dkrz.de:/shared/esmvaltool/
I need write permissions to the `/shared/esmvaltool/` partition. While I'm waiting for it, I'll `rsync` all that in my `$HOME` (silly, silly stuff)
@valeriupredoi Rémi is on vacation. I don't have access to the machine altogether (`Connection closed by 136.172.60.95 port 22`), so I cannot help you here. If I remember correctly, @bouweandela also had access to it, maybe he can give you the rights?
Cheers Manu, not a worry just yet, fingers crossed I have enough disk quota so I can rsync all the results in my home on the VM, then I can run the comparison. Heads up though, to myself, should not rm the data from Levante so next RM can run/rerun comparisons, and only when all's good and set move it to the VM, keeping a copy on Levante 👍
OK it proves I really need `rwx` permissions to `/shared/esmvaltool` - the data volume for them results is in the ballpark of 30G - my rsync stopped and I was not aware of it, was happy to see it's only 6-7G but the transfer is totally incomplete. Even so, the limit on user's home on the VM looks to be 10G - bit of a tiny one given I need miniconda guff too. Unless someone is a saint and moves my output from Levante to the VM shared dir, am stuck at this step, w/o being able to run the comparison - unless, of course, I rsync the 2.6.0 results back to Levante - which, as silly as it may sound, I might do if write access proves to be a lengthy process
I would really recommend going through this process (copying the recipe output to the VM, generating the debug.html page and running the comparison tool) with the previous RM or someone available to help. I reckon this is not well documented (partly for security reasons since the VM shouldn't be publicly accessed) and it would be helpful to schedule a quick call with a previous RM who has worked with the VM. @sloosvel, would you be able to help V with these tasks? I'm quite unsure how the recipe output should be copied and the comparison tool run. (And I'm out of office these days 😬 )
I agree there is room for discussion and improvement in the way we test recipes: is it good practice to use data downloaded long ago? How do we handle the preproc files which don't fit in the disk of the VM (1TB limit...)? Where to store "official" (release) output? I believe this will be partly addressed by the RTW and this discussion could take place for the next release. Given that 122/127 recipes run successfully and no new bug is left unfixed, I think this release is on a very good track! Well done @valeriupredoi! 🍻
@remi-kazeroni you're a legend writing work emails while on holidays! Thanks a lot, dude, I have now all manners of access - write access to `/shared/esmvaltool`, am transferring all output from `/scratch` to `/work` (under `nohup`, so I guess it'll do it overnight), so then tomorrow I can compare as much as I please :grin: And my sincere apologies for barking at poor @sloosvel for what I thought was her removing the output data - turns out DKRZ is sneakily rm-ing it from `/scratch` after 2 weeks. Here's the thing though - this badly needs documenting - I mean the whole process, w/o actual login/specific machine details for privacy reasons, but the workflow is complicated enough to be written down; I'll try to do it as much as possible - @sloosvel you should have done this after 2.6.0 with help from DKRZ tenants like @remi-kazeroni or @schlunma (apols if you did and I missed the instructions, am pretty good at not reading instructions). Provided all goes well with the copy to `/work` I will run compare tomorrow, may have to bug you for more info on debug html stuffs - but many thanks you all @remi-kazeroni @sloosvel @schlunma @bouweandela for the help so far :beer:
Hi @valeriupredoi please let me know if you want to schedule a call, I have to say that I am quite confused by all your issues. I did not run into any of that.
Hi @sloosvel - many thanks, am back on track now, no need for a call just yet, maybe if you could keep an eye on this issue if I ask for some help, that'd be awesome :beer: :+1:
OK comparison tool is now plodding along nicely - I have also added the instructions in the issue description - we can use that description to hatch us a nice doc entry - the next RM should not go through the Gates of VM Purgatory like I did yesterday :+1:
`/work/bd0854/b382109/v270` (contains `preproc/` dirs too, 122 recipes)
`/mnt/esmvaltool_disk2/shared/esmvaltool/v2.6.0rc4` (does not contain `preproc/` dirs)
`nohup python ESMValTool/esmvaltool/utils/testing/regression/compare.py /mnt/esmvaltool_disk2/shared/esmvaltool/v2.6.0rc4 /work/bd0854/b382109/v270 > compare270output.txt`
`/home/b/b382109/compare270output.txt`
Legend:
122 out of 127 final
We need to look at plots for 34 recipes; we're good to go for 85 recipes; 3 have no reference in 2.6.0
And as mentioned in my previous comment, anav13 is a special case since data from output2 cannot be read with the default DRS (our fault, not CMIPs!). You could also try:
CMIP5: /work/bd0854/DATA/ESMValTool2/download/cmip5/output2
@schlunma Why are the `project` and `product` facets hardcoded in the path? This should work fine if the DRS starts with `{project.lower}/{product}/..` (i.e. the one called `ESGF` in config-developer.yml) and the rootpath is set to `/work/bd0854/DATA/ESMValTool2/download`.
Oh wait, is that because you're trying to combine data downloaded with the ESGF DRS with the DKRZ DRS and we don't support per path DRS settings? ESMValGroup/ESMValCore#129
Exactly. We found that using the full paths with `project` and `output` for the downloaded data is currently the cleanest way to include the DKRZ and ESGF rootpaths.
Actually, the cleanest way to make it work is just to set `download_dir: /work/bd0854/DATA/ESMValTool2/download/` (provided you have write access there) and run with `offline: false` and only use the 'official' rootpaths for the CMIP projects. This works because the tool will always check if it already has a file before downloading it. In this case you're lucky that the ESGF DRS is similar to the DKRZ one, so your approach works too. Anyway, I'll see if I can do something about https://github.com/ESMValGroup/ESMValCore/issues/129 for the next release.
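The "check if it already has a file before downloading it" behaviour mentioned above can be pictured with a minimal sketch. This is not ESMValCore's code: `ensure_local` and the `fetch` callback are hypothetical names made up for illustration, and the real tool's lookup rules are certainly more involved.

```python
from pathlib import Path
from typing import Callable


def ensure_local(relative_path: str, download_dir: Path,
                 fetch: Callable[[str, Path], None]) -> Path:
    """Return the local copy of a file, fetching it only if it is missing.

    `fetch` is a hypothetical callback that writes the remote file to the
    given target path; it stands in for whatever downloader is used.
    """
    target = download_dir / relative_path
    if target.exists():
        # Already on disk (e.g. from an earlier run): skip the download.
        return target
    target.parent.mkdir(parents=True, exist_ok=True)
    fetch(relative_path, target)
    return target
```

Under this assumption, pointing the download directory at an already populated pool means nothing gets fetched twice, which is why a shared `download_dir` plus `offline: false` would not re-download existing data.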
We tried that too, but if I remember correctly there was a problem with this. Could be that it was an issue because at the beginning downloading was very slow (so there was no chance that slow recipes would run), which should be fixed now. I will give it another try sometime in the future.
It would be really great if you could do something about https://github.com/ESMValGroup/ESMValCore/issues/129 :rocket:
alright folks, as we have seen in https://github.com/ESMValGroup/ESMValTool/issues/2881#issuecomment-1294735465 we need to look at some recipes that have not had the same plots as in v2.6.0, these are 34 party poopers:
`magic_bsc/weather_regime.R` diag that uses Kmeans clustering so I expect variability, to what extent, beats me! https://github.com/ESMValGroup/ESMValTool/issues/2890
To quickly identify differing plots please have a look at this log https://esmvaltool.dkrz.de/shared/esmvaltool/compare270output_trimmed.txt
We can have a look at them in the run list for v2.7.0 https://esmvaltool.dkrz.de/shared/esmvaltool/v2.7.0/debug.html vs the v2.6.0 one https://esmvaltool.dkrz.de/shared/esmvaltool/v2.6.0/debug.html - I will start having me a look but by all means, @ESMValGroup/esmvaltool-developmentteam I could really use a hand here, especially since you (as recipe maintainers/developers) know these things well, they're all beetles and bugs on coloured paper to me :grin:
Is there a log file where we can see the differences? My recipe has a lot of plots and if just one of them differs as the list says it'd be easier to just look at that one =D
logfile coming right away - I'll post it in the comment above :beer:
before I post the log (currently curating it) to not lose you, Tina, here's the only bitty plot that differs for your recipe:
recipe_gier2020bg.yml: results differ from reference run
Reference run: /mnt/esmvaltool_disk2/shared/esmvaltool/v2.6.0rc4/recipe_gier2020bg_20220712_100159
Current run: /work/bd0854/b382109/v270/recipe_gier2020bg_20221025_142445
Differing files:
You can pass that recipe, that's just a different sorting for the diff ensemble members in the histogram and looks diff cause they're not labeled. Cheers for the special extract for me ;)
@bettina-gier - legend, many thanks! :beer:
@valeriupredoi do you mind if I run recipe_climate_change_hotspot in jasmin, so that I can at least upload the results for this version?
@sloosvel go for it! Cheers :beer: - but am planning on releasing tonight, it looks promising. If you run it then upload results to the v2.7.0 that'd be awesome, and the release is not affecting that :+1: It'd be great if you were around to approve the last PR changing the version number, in an hour or so; no probs if not, I will ask @bouweandela
@valeriupredoi for `recipe_ocean_example`, the only obvious difference seems to be the transect plots, `Diag_Transect_1`, `Diag_Transect_2` and `Diag_Transect_3`. These plots are empty in v2.6.0 and non-empty in v2.7.0 (see bugfix #2858). `Diag_Transect_1` picks up the mask data 1.e20, but this is probably a separate issue.
@TomasTorsvik brilliant, many thanks for looking! And a positive difference too, thanks to your PR :beer:
Diag_Transect_1 picks up the mask data 1.e20, but this is probably a separate issue.
Would you be OK to open an issue about this, please? And tag @ledm so we can fix that in 2.8. Many thanks! :beer:
@valeriupredoi the same applies for `recipe_ocean_bgc`, v2.7.0 has plots for `Diag_Transect_No_Data` and `Diag_Transect_vs_Woa` that are empty in v2.6.0. The other plots look OK to me.
Fantastic, cheers @TomasTorsvik :beer:
@valeriupredoi I'm not sure, but it seems the difference in `recipe_ocean_ice_extent` can be connected with land/coastline borders. See e.g.
Eagle-eyed, man :eagle: - coastline contours are more pronounced in 2.7 - cartopy change most probably, but they look the same to me!
OK this concludes the release testing marathon! Good news is there are not many bad apples among the recipes, bad news is there are a couple - see https://github.com/ESMValGroup/ESMValTool/issues/2881#issuecomment-1295001054 - we found a couple MAGICs project R recipes that look dubious and opened at least one issue about https://github.com/ESMValGroup/ESMValTool/issues/2890 - but since these recipes are unmaintained, developers who wrote them have in the meantime left the institutes they're listed under etc I am not going to hold the release for some Da Vinci Code-style tracking down; we need to think what we do with such recipes.
Oh and the ocean recipes by @ledm need some TLC but he's told me this for a while now, we should get together one time and fix them, no major bugs, but old crap that needs updating.
I declare this Tool ready for release! Many thanks to all who helped during this testing process @sloosvel @remi-kazeroni @schlunma @bettina-gier @TomasTorsvik and @bouweandela of course :grin: :beers:
it's out and about! :beer: https://pypi.org/project/ESMValTool/2.7.0/
You can pass that recipe, that's just a different sorting for the diff ensemble members in the histogram and looks diff cause they're not labeled. Cheers for the special extract for me ;)
@bettina-gier Would it be possible to sort the ensemble members in a way that is stable between runs? With the upcoming more regular recipe testing that @ehogan et al are working on, as described in #2723, issues like this will keep popping up.
Sister and logical evolution of #2852 - I am commencing testing and comparison of recipes and recipes results in order to release 2.7.0 at the end of this week (hopefully). System parameters below, work done on DKRZ/Levante: submit files in `/home/b/b382109/submit`, output in `/scratch/b/b382109/esmvaltool_output`
System and settings
`conda`/`mamba`
Git branch and state
Date: 25 October 2022 14:22 BST
Environment
On Levante:
Environment file
ToolEnv270Test.yml
Extraneous file movements
I moved the autoassess-specific files to `/home/b/b382109/autoassess_files` - run was successful for AA recipes then :+1:
Ad-hoc hacks (code changes)
`/home/b/b382109/ESMValTool/esmvaltool/diag_scripts/land_carbon_cycle/diag_global_turnover.py` l.278 change `.outline_patch` with `.spines["geo"]` as suggested by @zklaus in https://github.com/ESMValGroup/ESMValTool/issues/2886#issuecomment-1292135500 (cheers, dude!) - this will have to be PR-ed
Mods to config user file
Added DKRZ downloaded data pool as:
as @schlunma and @remi-kazeroni have suggested :beer:
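For the `.outline_patch` to `.spines["geo"]` hack listed under the ad-hoc code changes, a small compatibility helper is one way to avoid breaking on either cartopy flavour. This is only a sketch: `get_map_outline` is a hypothetical name, not something in ESMValTool, and the demo uses stand-in objects instead of real cartopy axes so it runs without cartopy installed.

```python
from types import SimpleNamespace


def get_map_outline(ax):
    """Return the artist that draws the map boundary of a GeoAxes-like object.

    Newer cartopy exposes it as ax.spines["geo"]; older releases used
    ax.outline_patch. The old attribute is only touched when the new
    entry is missing.
    """
    spines = getattr(ax, "spines", {})
    if "geo" in spines:
        return spines["geo"]
    return ax.outline_patch


# Stand-ins for axes objects (assumption: only these attributes matter here).
new_style = SimpleNamespace(spines={"geo": "geo-spine"})
old_style = SimpleNamespace(spines={"left": "left-spine"},
                            outline_patch="outline")

print(get_map_outline(new_style))  # geo-spine
print(get_map_outline(old_style))  # outline
```

In the diagnostic one would then call something like `get_map_outline(ax).set_visible(False)` instead of hardcoding either attribute; whether that is worth it over a plain replacement depends on which cartopy versions still need supporting.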
Recipe runs
Recipe runs results (as of final on 27 October 2022) are listed in https://github.com/ESMValGroup/ESMValTool/issues/2881#issuecomment-1291878142 (with very many thanks to @remi-kazeroni for running the impossible to run ones!) and are as follows:
(*) means not counting/counting the one that had a DiagnosticError but was fixed but not PR-ed
Running the comparison
Login and access to the DKRZ esmvaltool VM
Results from recipe runs are stored on the VM; login with:
Get and install miniconda on VM
E.g. `scp Miniconda3-py39_4.12.0-Linux-x86_64.sh b382109@esmvaltool.dkrz.de:~` from a file already on Levante.
Setting up the input files
If you wrote recipe runs output to Levante `/scratch` partition be aware that the data will be removed after two weeks, so you will have to move the output data to the `/work` partition, via e.g. a `nohup` job:
`/work` is visible by the VM so you can run the compare tool straight on the VM.
NOTE do not store final release results on the VM including `/preproc/` dirs, the total size for all the recipes output, including `/preproc/` dirs is in the 4.5TB ballpark, much too high for the VM storage capacity
Running compare tool at VM
tool270Compare
release270stable
pip install imagehash
Input/output/run
`/work/bd0854/b382109/v270` (contains `preproc/` dirs too, 122 recipes)
`/mnt/esmvaltool_disk2/shared/esmvaltool/v2.6.0rc4` (does not contain `preproc/` dirs)
`nohup python ESMValTool/esmvaltool/utils/testing/regression/compare.py /mnt/esmvaltool_disk2/shared/esmvaltool/v2.6.0rc4 /work/bd0854/b382109/v270 > compare270output.txt`
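Since the current-run directory still carries its `preproc/` dirs while the reference on the VM does not, it can help to estimate how much would actually need copying before starting a transfer. A rough sketch, assuming the preprocessor output lives in directories literally named `preproc`; `output_size_bytes` is a hypothetical helper, not part of ESMValTool.

```python
import os


def output_size_bytes(root, skip_dirs=("preproc",)):
    """Sum file sizes under `root`, ignoring directories named in skip_dirs."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "."
    without = output_size_bytes(root)
    with_preproc = output_size_bytes(root, skip_dirs=())
    print(f"without preproc: {without / 1e9:.1f} GB")
    print(f"with preproc:    {with_preproc / 1e9:.1f} GB")
```

Running it on the output directory before an `rsync` gives both numbers side by side, which is a cheap check against the VM quota.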
Sanity check, as outputted by `compare.py`
First pass result
Running `compare.py` results in a few recipes not-OK (NOK) wrt plots differing from previous release v2.6.0, summary in https://github.com/ESMValGroup/ESMValTool/issues/2881#issuecomment-1294735465
Detailed plots inspection
Inspection of the plots that differ for the 34 recipes is happening in https://github.com/ESMValGroup/ESMValTool/issues/2881#issuecomment-1295001054
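To get a quick list of recipes to inspect out of the compare log, something like the sketch below could do. It assumes the log marks each differing recipe with a line of the form `<recipe>.yml: results differ from reference run`, which is taken from the single excerpt quoted in this thread; other line formats in the `compare.py` output are not assumed, and `recipes_that_differ` is a hypothetical helper.

```python
MARKER = ": results differ from reference run"


def recipes_that_differ(log_text):
    """Return recipe names from lines ending with the 'results differ' marker."""
    recipes = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.endswith(MARKER):
            recipes.append(line[: -len(MARKER)])
    return recipes


sample = """\
recipe_gier2020bg.yml: results differ from reference run
Reference run: /mnt/esmvaltool_disk2/shared/esmvaltool/v2.6.0rc4/recipe_gier2020bg_20220712_100159
Current run: /work/bd0854/b382109/v270/recipe_gier2020bg_20221025_142445
"""

print(recipes_that_differ(sample))  # ['recipe_gier2020bg.yml']
```

Reading `compare270output.txt` and passing its contents to the function would give the list of recipes to go through by hand.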