Closed: valeriupredoi closed this issue 2 years ago
@hahsan1 I have opened pull request #2284 with the new feature included (allowing the validation diagnostic to be used even when datasets have the same name, i.e. different facets of the same model). I would appreciate it if you had some time to test it and see whether it solves your functional need, and if you could also provide feedback on any issues. Many thanks for working with us! :beer:
Hi @valeriupredoi, I tried running the recipe again with the updated scripts from pull request #2284, but I get this error: `UnboundLocalError: local variable 'experiment' referenced before assignment`.
I have uploaded the recipe I used as well as the log files if that may be of help to diagnose the error.
I have one comment about the recipe file: I specified the same dataset name (i.e. GISS-nudge) for both the control and experiment models. Which one will be interpreted as the control and which as the experiment?
Hi @hahsan1, cheers for trying out the enhanced diagnostic! Note, however, that you are still using the old diagnostic and its components; you need to make sure that /people/ahsa361/.conda/envs/esmvaltool/lib/python3.8/site-packages/esmvaltool/diag_scripts/validation.py
and /people/ahsa361/.conda/envs/esmvaltool/lib/python3.8/site-packages/esmvaltool/diag_scripts/shared/_validation.py
are updated with the ones from the pull request, together with the config file. If I were you, I would check out that branch and run `pip install -e .[develop]`
in the freshly checked-out repo, so that you install the changes locally and override the version installed in the conda env. Let me know if I can assist with installing just that particular branch; your recipe looks OK and will work fine with the new changes :beer:
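In case it helps, a minimal sketch of fetching and installing a PR branch locally (the local branch name `pr-2284` is just an illustrative label; run this inside the activated esmvaltool conda env):

```sh
# clone the repo and fetch the PR's head into a local branch
git clone https://github.com/ESMValGroup/ESMValTool.git
cd ESMValTool
git fetch origin pull/2284/head:pr-2284
git checkout pr-2284
# editable install so the checked-out code overrides the conda-installed version
pip install -e .[develop]
```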
I would like to report that it works! 🎉 As you pointed out, I needed to update the two scripts in the environment folder.
I have a minor suggestion. I was wondering if it would be a good idea to generalize the script so it could run more than one type of mip (e.g. Amon, AERmon) in the same recipe. As it stands, I believe we can only specify one type of mip per recipe.
Excellent news, cheers for updating and retesting :beer: - if you could approve the PR that would be fab :+1:
About mips: you can specify however many you need; mip is per dataset, not per recipe. So in each dataset dictionary, just use the mip: key and request whichever mip you need for that dataset!
Ah so would the syntax be something like this in the recipe:
```yaml
datasets:
  - {dataset: GISS-nudge, project: CMIP6, mip: [Amon, AERmon], activity: AerChemMIP, institute: NASA-GISS, exp: reference, ensemble: r1i1p5f104, grid: gn, start_year: 2000, end_year: 2004}
```
Then under variables, I could specify both types, so for example, rlut and emiso2.
Hmm, it doesn't seem to work the way I suggested. I get this error:
```
yamale.yamale_error.YamaleError: Error validating data '/pic/projects/GCAM/Emissions-MIP/ESMValTool/esmvaltool/recipes/emissions_mip/Phase1a/global/recipe_validation_CMIP6_2.yml' with schema '/people/ahsa361/.conda/envs/esmvaltool_test/lib/python3.8/site-packages/esmvalcore/recipe_schema.yml'
datasets.0.mip: '['Amon', 'AERmon']' is not a str.
datasets.1.mip: '['Amon', 'AERmon']' is not a str.
```
This is what the recipe looks like (attached): recipe_validation_CMIP6_2.txt
No, `mip: [Amon, AERmon]` will not work. The only case where a list like that works is when you use it to complement missing data from e.g. Amon with AERmon data; in a side-by-side comparison of two identical models (or even different models) it will give you ambiguous results at best, if not wrong ones. Using one mip per dataset is the way to go with the validation recipe. But if you tell me the use case for the multiple mips you need run, I can tell you what to do and how :beer:
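For reference, a minimal sketch of the one-mip-per-dataset layout (facet values copied from your recipe; the second entry is illustrative):

```yaml
datasets:
  # one mip per dataset entry; repeat the dataset with another mip if needed
  - {dataset: GISS-nudge, project: CMIP6, mip: Amon, activity: AerChemMIP,
     institute: NASA-GISS, exp: reference, ensemble: r1i1p5f104, grid: gn,
     start_year: 2000, end_year: 2004}
  - {dataset: GISS-nudge, project: CMIP6, mip: AERmon, activity: AerChemMIP,
     institute: NASA-GISS, exp: reference, ensemble: r1i1p5f104, grid: gn,
     start_year: 2000, end_year: 2004}
```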
I'm interested in both the Amon and AERmon mips. Having the script accommodate both simultaneously would be great! But all in all, it wouldn't be a big deal if I had two separate recipes.
Right! You want to run identical analyses on multiple mips in parallel. Afraid that's not possible this way; you can, however, extend the recipe with more diagnostics that do exactly that: list all your datasets (with all the different mips, but one mip per dataset) in one datasets section, then list all the diagnostics you want to run, giving them relevant and distinct names, and including each control/experiment pair you want to compare in each of these diagnostics. Does that make sense?
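A sketch of that layout (diagnostic and script names are placeholders, and the script-level keys like control_model/exper_model are my recollection of the validation recipes; double-check them against a shipped recipe_validation recipe):

```yaml
diagnostics:
  # one diagnostic per analysis; each lists its own variable(s) and mip
  validation_radiation_Amon:
    description: control vs experiment comparison for an Amon variable
    variables:
      rlut:
        mip: Amon
    scripts:
      validation_script:
        script: validation.py
        control_model: GISS-nudge
        exper_model: GISS-nudge
  validation_emissions_AERmon:
    description: control vs experiment comparison for an AERmon variable
    variables:
      emiso2:
        mip: AERmon
    scripts:
      validation_script:
        script: validation.py
        control_model: GISS-nudge
        exper_model: GISS-nudge
```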
I got it to work another way! I omitted the mip keys in the dataset section and added it to each variable in the diagnostic section, like this:
```yaml
variables:
  rlut:
    preprocessor: pp_rad
    mip: Amon
  emiso2:
    preprocessor: pp_rad
    mip: AERmon
```
How do I approve the PR, do I simply leave a comment?
Thanks again for all the help!
oh yeah that works too, for different variables, well done :+1: Yes please, if you could just confirm there that it works for you, as comment on the PR that would be awesome, cheers :beer:
Suggested in https://github.com/ESMValGroup/ESMValTool/issues/2270: get the validation recipes to allow the same model name for both EXPERIMENT and CONTROL, but obviously with different data parameters (experiment, ensemble, etc.)