ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0
216 stars 126 forks source link

Merging IPCC analyses into a public branch #2068

Open ledm opened 3 years ago

ledm commented 3 years ago

This is a continuation of the discussion that we started at yesterday's monthly developer meeting.

We're now approaching the "final" version of IPCC analyses, and these works need to be made available to the wider public. I suspect that this is going to be a painful process.

I see three main issues:

  1. Record keeping,
  2. Updating,
  3. Maintenance.

Record keeping

Before trying to update or merge any code, we need to record:

Updating

This has been a complex process and my final code is fragile, inefficient, hacky and does not adhere to any kind of code standards. Unfortunately, it also uses outdated core and iris methods. In order to be merged and made accessible:

Personally, one of my recipes takes several weeks to run, so there's no chance I'm going to go back and test the full version. It may be possible to create a "minimal" functioning version for testing though.

Another problem here is that all these updates could mean that the public facing code does not produce the same figure or results as those included in the IPCC report.

Maintenance

Once all this has been updated, reviewed and merged, who will be responsible for maintenance? This will be a non-trivial task, and the original authors may not have the time in the next few years to keep up to date with the twists and turns of ESMValTool development until AR7 rolls around.

Perhaps instead of merging the recipe into the full ESMValTool master branch, we could create a public fork with these recipes and diagnostics? If not, the master could become very bloated! On the other hand, I've spent more than a year producing this work, it would be a waste for it to disappear into the either and have to start again from scratch for AR7!

Finally, are there any ways to automate some of this and make it easy for everyone? If any of the record keeping, the updating or the maintenance could be updated, that would be great.

Timing wise, the final IPCC AR6 report should be out this Autumn. Ideally, we want everything available by then. Approx 6 months from now - start the clock! :clock1:

valeriupredoi commented 3 years ago

cheers for this issue @ledm - I think the key here is modularization - not in the sense of Python modules (even if some of the bits from the whole IPCC codebase can go into the Core and we can even create an ar6 esmvalcore module) but in terms of the diagnostics - breaking down the codebase in easy-to-digest and maintain bits is the way forward IMHO. This is just waffling on my part so far, but shall we do it this way, in line with a timeline:

We must be very careful how we integrate the IPCC diagnostics with the aim of reusing them next time - AR7 - and minimal re-coding/re-factoring would be needed, ie the use of dictionaries would be highly encouraged :grin: How's this sound for a plan?

jvegreg commented 3 years ago

I think we can create an esmvalcore.ipcc or esmvalcore.ar6 dedicated module for those preprocessing functions that won't see much use apart from the IPCC analyses -> @bouweandela @jvegasbsc @schlunma what you guys stink?

I do not like to have the preprocessor tied to modules. If we end with too many preprocessor functions or too big files (I am looking at you, preprocessor._time.py), we should refine our theme organization to something like preprocessor.time.extraction, preprocessor.time.statistics. We already have some unintuitive names in the tool, let's try the core free of them

ledm commented 3 years ago

I've been rethinking my position on this issue. To be completely honest, it's been a hell of a long road, and I do not want update or maintain my ESMValTool-AR6 code anymore.

The pragmatic solution would be to preserve an original version of the code as it is. Ie: this is the code that we used, warts and all, this is the version of esmvalcore that I used, these are my recipes. Any additional changes to the code from that baseline would not represent what the figures in the Assessment Report. This is important for reproducibility and all that.

On the other hand, I can understand that someone (else - not me) would want to re-use this code (in 5 years for AR7 perhaps?). But I am not convinced that it would be worth the effort to update this code to iris 3, esmvaltool 2.2, let alone run it again it test it against each new version of each package that changes. Remember that it takes 4-6 weeks non-stop to run this analysis on jasmin, More likely that not, no one is ever going to run it ever again.

I don't think that it would be a quick and easy effort to prepare all IPCC AR6 recipes for merging into the ESMValTool master. If we must, that would be a wasteful top-down decision. Preserving the snapshot of where it is when the plots were when they were finalise seems like a much better and lazier option.

bettina-gier commented 3 years ago

Should we do it on a case-by-case basis then? I know some code for chapters 3 (and 5) figures would not need much adaptation to include in the current version (I think mine could work out of the box - not counting codacy issues) and don't run very long. There's also cases where existing diagnostics from the e.g. flato13ipcc figure were reused and had some improvements - Lisa added some stuff for the perfmetrics to support multiple projects with spaces between them etc. At least these types of improvements would be nice to include in the public branch.

I'd suggest to archive the code used by ipcc for all figures as lee suggests with "I used this version to run this code" and cherry-pick stuff like improvements or "easier" plots to include in the public esmvaltool.

malininae commented 3 years ago

@ledm and @bettina-gier Thanks for raising this topic. IPCC requirements include public, reproducible code, so we'd have to save the code in the form we created the figures and give it them within the next month or so (Guess some clean up is possible, but it's impossible to redo the code in non-hacky fashion, or improve it drastically). They'll store it, as I understand it correctly, on their host, or in their git directory, so no worries, the code will be available for AR7 :-) I agree, that ESMValTool community, the recipes/diagnostics should come in a somewhat different form and should be updated/improved. Personally, one of my extreme diagnostics for the final set of data takes about 10 days to run o_O , but there's not much one can do about it. However, the final basins figure (@ledm and @valeriupredoi know what I mean, and yes, my eye is also twitching), might be a nice addition to the tool and should improve with iris 3. So, I'm on board with @bettina-gier , do it on a case by case basis.

nielsdrost commented 3 years ago

Hi all,

Great that we are having this discussion!

At the very least I think we should know what everyone was missing from ESMValTool to create the diagnostics so that in AR7 there is less need for custom ESMValCore things.

As a first step, would it be possible to start creating PRs for all needed additions to the ESMValCore and ESMValTool outside of the diagnostics? So this would be preprocessors, CMORizers, fixes, etc.

The recipe we use to test could be something like the check obs recipe , but with all needed AR6 datasets.

This we should be able to do in the public repo, right?

remi-kazeroni commented 3 years ago

As a first step, would it be possible to start creating PRs for all needed additions to the ESMValCore and ESMValTool outside of the diagnostics? So this would be preprocessors, CMORizers, fixes, etc.

Yes that is definitely possible for the ESMValCore and should also be fine for ESMValTool, expect for the diagnostics. @LisaBock will contact IPCC developers to ask tehm to push their developments into the public repositories.

The figure code (diagnostic scripts) can not be made public before the report is published (August/September). There is still an uncertainty about what is meant by publishing the code. @LisaBock will enquire about and after we'll have an idea what this means for us in terms of ESMValTool releases.

Regarding the diagnostics, would it make sense to open PR in the AR6 and start the merging process in that repo before the publication of the IPCC report? How tricky would it be to move the code from the AR6 master branch to the public master branch?

nielsdrost commented 3 years ago

Regarding the diagnostics, would it make sense to open PR in the AR6 and start the merging process in that repo before the publication of the IPCC report? How tricky would it be to move the code from the AR6 master branch to the public master branch?

The problem is that all the code would become one large PR that would be impossible to review, let alone merge. So I would strongly recommend not merging anything to the AR6 master. But perhaps having a PR in the AR6 repo (but never merging it!) would already help getting things organized?

bouweandela commented 3 years ago

On a side note: the code quality services like CircleCI and Codacy that we rely on for maintaining code quality are only free for public repositories.