ESMValGroup / ESMValCore

ESMValCore: A community tool for pre-processing data from Earth system models in CMIP and running analysis scripts.
https://www.esmvaltool.org
Apache License 2.0
42 stars 38 forks source link

Allow for CMOR checks but DO NOT raise exceptions (at the user's risk!) #338

Closed valeriupredoi closed 4 years ago

valeriupredoi commented 5 years ago

Not good, guys, not good - analyses that need to go through past nit-picky CMOR exceptions still stumble at them and return exceptions with the tool stopping. I thought the whole point of cmor_strict: false was to bypass all these roadblocks :beer: Running an IPCC stuff for @LisaBock and am getting pestered by height2m not present. Who cares about bloody height2m? :angry:

mattiarighi commented 5 years ago

Which recipe are you trying? Is it due to the recent merge of #305?

valeriupredoi commented 5 years ago

recipe_ipccwg1ar6ch3_highres

mattiarighi commented 5 years ago

Can you try a few another ones to see whether it is a general issue or specific to that recipe?

schlunma commented 5 years ago

I think cmor_strict does not affect any CMOR check, it only affects variable reading, for example allowing the use of the custom tables (at first I also thought it would allow the skipping of some tests, but it doesn't).

valeriupredoi commented 5 years ago

check_metadata() has no provision of not raising CMORCheckError (fail_on_error set to False will eventually raise it at the end anyway), it's a fundamental problem with our CMOR issues handling not an issue resulting from a PR, man :beer:

valeriupredoi commented 5 years ago

yep, @schlunma is correct - I want the check to find the issues, fix as many as possible but don't die on me if it couldn't fix everything, just report them and march on. Something like the flag in #307 that if set, it should go on at the user's risk, but go on until it hits rock bottom (or not)

valeriupredoi commented 5 years ago

...which in my case, it went on and finished the bloody recipe no problemo, without any height2m, coz that's just like Brexit, nobody needs it but it's there for some reason :grin:

bouweandela commented 5 years ago

If you want no CMOR checks (see also #88), you can disable them in the recipe by making sure that every preprocessor contains:

fix_metadata: {cmor_table: null}
fix_data: {cmor_table: null}
cmor_check_metadata: false
cmor_check_data: false

However, if you're missing a 2m height coordinate for a dataset I would recommend implementing a fix for it, this shouldn't be much work.

Note however that many preprocessor functions and diagnostic scripts depend on the data being in the format specified by the CMOR standard, so disabling the checks is not recommended.

valeriupredoi commented 5 years ago

@bouweandela the issue I am raising is slightly more fundamental than simply adding a few lines in the recipe setting the checks to false and skipping those preprocessor steps: this is equivalent to a tank commander stripping the whole armor off the tank to allow for greater mobility. Instead I'd propose a mission-driven installation of armor:

This is something that I am sure @axel-lauer would agree with from the perspective of science friendliness :beer:

axel-lauer commented 5 years ago

The 3-level idea proposed by @valeriupredoi sounds absolutely fantastic! The basic checks could be the default setting for doing "exploratory science", full checks the recommended setting for publications and no checks a "last aid" setting for "I know what I do, just read that data set" cases. I also particularly like the idea of logging all errors put not throwing exceptions in the "basic checks and fixes" and the "no checks" cases.

veyring commented 5 years ago

There has been a lot of discussion regarding the strict CMOR checks that are currently implemented in ESMValTool v2. What I am hearing from many of you is that “It is incredibly frustrating that ESMValTool can't handle any minor deviation from CMOR data”. While from a “good practice” coding point of view the checks make a lot of sense, in the meantime this really turns out to be a huge obstacle in actually using the tool for what it is built for, namely facilitating routine and rapid evaluation of the CMIP ensemble.

I therefore strongly support the proposal from @valeriupredoi to implement three different levels of checks to be selected by the user. I would like to suggest to implement this immediately. If you (particularly @ESMValGroup/esmvaltool-coreteam) have further suggestion on this proposal, please ensure to comment here by the end of this week. I suggest that then early next week @valeriupredoi is implementing his proposal if there are no further strong convincing arguments against this.

valeriupredoi commented 5 years ago

I see :heart: 's all round, I should candidate for the next elections in the UK in December :rofl: Cheers @veyring for your note! I reckon it's best if @jvegasbsc and myself will be working on this, rather than just myself, since Javi is the authority when it comes to CMOR+ESMValTool. Javi will hopefully recover from his illness by next week (get better soon, bud!!) so we can start the work :beer:

ledm commented 5 years ago

“It is incredibly frustrating that ESMValTool can't handle any minor deviation from CMOR data”

I said this, but I would like to clarify what I mean. I'm not frustrated that we have strict CMOR compliance built in, but rather that there is so much variability in the standardisation of CMIP datasets. This variability combined with the strictness of ESMValTools CMOR checks means that almost no data can be read as is (in my experience). I've spent far more time working on fixes than on diagnostics or recipes.

I wish that checks had been made earlier in the process perhaps by the modelling groups or by WCRP before submitting or publishing non-standardised data. This is a model-intercomparison project (cMIP!), so it shouldn't be too much to ask to require the data to be inter-comparable.

valeriupredoi commented 5 years ago

@ledm asking scientists to use (and not just observe) a set of standards is like herding cats, man

ledm commented 5 years ago

Just to make sure it more widely known, this is from @veyring's email:

There is a formal way to catch and report errors provided by ES-Doc, see: https://errata.es-doc.org/static/index.html?project=CMIP6

I have a feeling that once the SOD is done, I'm going to need a few days to report all the issues with msftyz.

valeriupredoi commented 5 years ago

Poor SODs, they gonna hate us :grin:

Dr Valeriu Predoi. Computational scientist NCAS-CMS University of Reading Department of Meteorology Reading RG6 6BB United Kingdom

On Wed, 6 Nov 2019, 09:07 Lee de Mora, notifications@github.com wrote:

Just to make sure it more widely known, this is from @veyring https://github.com/veyring's email:

There is a formal way to catch and report errors provided by ES-Doc, see: https://errata.es-doc.org/static/index.html?project=CMIP6

I have a feeling that once the SOD is done, I'm going to need a few days to report all the issues with msftyz.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ESMValGroup/ESMValCore/issues/338?email_source=notifications&email_token=AG5EFIZHFPMF2J6KXUQ4SPLQSKCOVA5CNFSM4JGIEFR2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDFZ6ZA#issuecomment-550215524, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG5EFIYAEGYJV5JYTILIHMDQSKCOVANCNFSM4JGIEFRQ .

veyring commented 5 years ago

Information from Karl Taylor: "There is extensive QC in place to check the global attributes and some of the variable attributes, which are used extensively throughout the CMIP6 infrastructure, but the QC of the data itself is left to those contributing data. Those groups using CMOR to write their data have subjected it to additional QC checks not part of the ESGF basic checks, and I suspect that fewer groups may be using CMOR this round than used it in previous phases, so this might explain some of the problems. See https://goo.gl/NmuENr for documentation on what QC is done by PrePARE as part of the publication procedure on ESGF."

Please report issues to the responsible modeling groups as described at https://pcmdi.llnl.gov/CMIP6/Guide/dataUsers.html#6-reporting-suspected-errors. The modeling groups will verify the problems and then enter the information into the errata data base at https://errata.es-doc.org/static/index.html.

valeriupredoi commented 5 years ago

I am going to create a dedicated issue for reporting dataset issues and stop the conversation here since this issue deals with implementing the three-tiered checks and should not blow up on multiple fronts :beer:

bouweandela commented 5 years ago

Before starting to code the intermediate level of checking, it would be really useful to know what @valeriupredoi actually going to implement. That is, we need to have a list of what errors are serious enough to stop and/or what errors will only cause a warning. @ledm and @axel-lauer you seem to have an opinion on this, so maybe you could help out by starting to compile such a list?

It would also be good to know what to do with errors not in the list, i.e. should the default behaviour be to stop if an error is encountered, unless it's on the list of errors that are hopefully safe to ignore?

valeriupredoi commented 5 years ago

So this is best laid down if we actually list the types of metadata checks (note I don't propose tiering any of the data checks) performed by the tool. I propose naming Checks Levels: NONE, MEDIUM, STRICT and listing the checks I propose the following actions depending on check level. Note that for None we should log all issues but don't report any error (don't raise any exception):

CMORCheck.check_metadata: _check_var_metadata():

I invite people to chime in. Also @jvegasbsc or anyone else pls add any other check that I may have omitted :beer:

valeriupredoi commented 5 years ago

Basically I am trying to replicate iris's behavior towards a Regular Joe netCDF file for MEDIUM with the addition of extra security for units and monotonic coord values; STRICT is strict like hell :smiling_imp:

zklaus commented 5 years ago

Just a table version of @valeriupredoi list above; no other changes.

CMORCheck.check_metadata MEDIUM STRICT
standard_name not CMOR log error
long_name not CMOR log error
units not CMOR log error
attributes not present log error
rank not == CMOR rank log error
coordinates
standard_name not CMOR (1d, 2d [latitude, longitude] is treated as exceptional for CMIP6) log error
does not exist error error
units error error
monotonicity and direction error error
values (longitude, latitude: monotinic, CMOR range) error error
values (time: monotonic, reference time, frequency) error error
valeriupredoi commented 5 years ago

darn swanky table @zklaus cheers :beer:

valeriupredoi commented 5 years ago

the implementation should be very straightforward since report_error() in check.py for each of these functions should be replaced with a more flexible report() function that'll take both report_error() and report_warning() and use them according to case. Prob a piece of cake for @jvegasbsc or @sloosvel :beer:

sloosvel commented 5 years ago

Some of the checks on the table deal with the errors with automatic fixes. Would in this cases be necessary to add this intermediate level of reporting?

Prob a piece of cake for @jvegasbsc or @sloosvel

Well thanks for assuming that. But I currently have more than enough cake with the primavera metrics. So you are more than welcome to help out if you can / want to.

valeriupredoi commented 5 years ago

Sure thing, I will! In a work retreat til Wed but will start helping myself with cake on Wed :grin:

Dr Valeriu Predoi. Computational scientist NCAS-CMS University of Reading Department of Meteorology Reading RG6 6BB United Kingdom

On Mon, 11 Nov 2019, 12:22 sloosvel, notifications@github.com wrote:

Some of the checks on the table deal with the errors with automatic fixes. Would in this cases be necessary to add this intermediate level of reporting?

Prob a piece of cake for @jvegasbsc https://github.com/jvegasbsc or @sloosvel https://github.com/sloosvel

Well thanks for assuming that. But I currently have more than enough cake with the primavera metrics. So you are more than welcome to help out if you can / want to.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ESMValGroup/ESMValCore/issues/338?email_source=notifications&email_token=AG5EFI25XT7G7ZDEOM7DVU3QTFE6XA5CNFSM4JGIEFR2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEDWVADI#issuecomment-552423437, or unsubscribe https://github.com/notifications/unsubscribe-auth/AG5EFI5XSCRT4C2LPOUIQOLQTFE6XANCNFSM4JGIEFRQ .

sloosvel commented 5 years ago

Enjoy the retreat! Wednesday is fine for me to start working on that.

axel-lauer commented 5 years ago

The error handling proposed by Klaus is great. 3 small issues with datasets have turned out to be very annoying in the past, so I would propose to write warnings to the log file instead of handling them as errors when cmor check is set to medium, i.e.:

CMORCheck.check_metadata MEDIUM STRICT
monotonicity of time coordinate log error
frequency of time coordinate log error
longitude = -180...180 instead of 0...360 automatic fix + log error
bouweandela commented 5 years ago

I would like to propose a different naming scheme for the different levels of checking, e.g. none, relaxed, default, as there is the possibility for confusion with the cmor_strict option in config-developer.yml and I think that the default behaviour should remain unchanged. Also, I think there is no need to write the levels in capital letters.

Regarding the technical implementation, I would propose to add the keyword argument relaxed=true to the function cmor_check_metadata which sets the level of checking to relaxed. This keyword could then be set by a command line option which modifies the preprocessors defined in the recipe, this option could e.g. be called --cmor-checks and takes as argument one of the 3 different levels none, relaxed or default.

valeriupredoi commented 5 years ago

@bouweandela I like the naming convention, especially relaxed :grin: Note that default may be slightly misleading ie it doesn't give the user info on the level of strictness, but if we explain it well in the documentation then it'll be a-ok.

Do you not think the cmor_checks could be an option in config-user.yml? Or that file is too crowded anyway?

bouweandela commented 5 years ago

Do you not think the cmor_checks could be an option in config-user.yml? Or that file is too crowded anyway?

I think that using --cmor-checks none or --cmor-checks relaxed fits in well with other command line options like e.g. --skip-nonexistent and `--max-years`` which also modify the recipe in some way for the convenience of the user who doesn't care too much about getting the exact same results and is prepared to accept the risk that the recipe crashes because they're trying to use it for something that it wasn't designed for.

valeriupredoi commented 5 years ago

sounds about right, man :beer: I also found this one picture guide to be quite useful to show what the relationship is between ESMValTool and CMIP6 data cmip6

sloosvel commented 5 years ago

I also prefer the command line option, I think it's safer. So in summary, the report_error functions should be adapted to general report functions that give either an error, a warning or nothing (as @valeriupredoi pointed out) depending on the level of checking given by --cmor-checks?

valeriupredoi commented 5 years ago

yup, started work here https://github.com/ESMValGroup/ESMValCore/pull/374

valeriupredoi commented 4 years ago

I am going to create a dedicated issue for reporting dataset issues and stop the conversation here since this issue deals with implementing the three-tiered checks and should not blow up on multiple fronts

the data issue template has its first use in #384