Create dedicated dataset issue logs in run/ with only the dataset warnings from CMOR checks

valeriupredoi commented 5 years ago

Is your feature request related to a problem? Please describe. This feature request is related to the new implementation of tiered CMOR checks in #374 and directly related to the --dry-run option in #307. It would be very useful if the tool created a data_report.txt alongside main_log.txt and main_log_debug.txt. This report should contain only the warnings and critical warnings resulting from the CMOR checks, together with the relevant information on the dataset that produced them, e.g.:

2019-12-03 16:26:34,241 UTC [2177] WARNING There were warnings in variable tas:
longitude: does not exist
 lat: standard_name should be latitude, not grid_latitude
 tas: does not match coordinate rank

Dataset:
([<iris 'Cube' of convection_time_fraction / (K) (time: 1872; grid_latitude: 96; -- : 192)>], {'project': 'CMIP5', 'dataset': 'MPI-ESM-LR', 'short_name': 'tas', 'cmor_table': 'CMIP5', 'mip': 'Amon', 'frequency': 'mon', 'check_level': <CheckLevels.CRITICAL: 4>})

These warnings can already be retrieved from the logs, yes, but users would much rather grab this information directly from a dedicated log file.
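A minimal sketch of how such a dedicated report could be produced with the standard logging module, assuming the CMOR checks log through a logger named "esmvalcore.cmor.check" (an assumption; the real logger name may differ). The helper add_data_report_handler is hypothetical and not part of ESMValCore. It attaches a file handler to the checker's logger only, so messages from other parts of the tool stay out of the report, and it sets the handler level to WARNING so that DEBUG and INFO records are dropped.

```python
import logging
import os
import time

# Assumption: the CMOR checks log through a logger with this name.
# The actual logger name used by ESMValCore may differ.
CMOR_LOGGER_NAME = "esmvalcore.cmor.check"


def add_data_report_handler(run_dir, logger_name=CMOR_LOGGER_NAME):
    """Attach a handler writing only CMOR-check warnings to run_dir/data_report.txt.

    Hypothetical helper: data_report.txt is the file name proposed in this
    issue, not an existing ESMValCore feature.
    """
    handler = logging.FileHandler(os.path.join(run_dir, "data_report.txt"))
    # Drop DEBUG and INFO records: keep WARNING, ERROR and CRITICAL only.
    handler.setLevel(logging.WARNING)
    formatter = logging.Formatter(
        "%(asctime)s UTC [%(process)d] %(levelname)s %(message)s")
    # Render timestamps in UTC so the "UTC" label in the format is accurate.
    formatter.converter = time.gmtime
    handler.setFormatter(formatter)
    # Attaching to the checker's logger (not the root logger) keeps records
    # from unrelated loggers out of the report; child loggers still propagate.
    logging.getLogger(logger_name).addHandler(handler)
    return handler
```

Because the handler filters by level and sits on one logger, the existing main_log.txt and main_log_debug.txt handlers would be unaffected and would keep receiving the same records.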

This type of log can also be used to file correct dataset issues using our new dataset issue template.

Would you be able to help out? Would you have the time and skills to implement the solution yourself? Yep, just waiting on both #307 and #374 to be merged.

bouweandela commented 4 years ago

Apparently there is a batch reporting tool for issues with CMIP6 data. It would be nice if ESMValTool could produce a report in a format that can be used with this tool. @ESMValGroup/esmvaltool-coreteam, does anyone know whether this batch reporting tool is already available?

zklaus commented 4 years ago

Are you referring to https://github.com/ES-DOC/esdoc-errata-client ? I am afraid it is not applicable to us: the modeling groups are the ones who officially file errata issues. All we can do is send them an email and hope that they follow up. For this, we have to take the contact attribute from the data file.
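Since the contact attribute is the route to the modeling groups, a report could group warnings per contact so that each group gets one email. The sketch below is hypothetical: it assumes each dataset is available as a pair of (global attributes dict, list of warning strings), for example taken from an iris cube's attributes. It reads contact, which is a CMIP global attribute, and uses source_id (CMIP6) or model_id (CMIP5) as the dataset label.

```python
from collections import defaultdict


def group_issues_by_contact(datasets):
    """Group CMOR-check warnings by the 'contact' global attribute.

    Hypothetical helper. `datasets` is assumed to be an iterable of
    (attributes, warnings) pairs, where `attributes` is a dict of global
    file attributes and `warnings` is a list of strings.
    """
    by_contact = defaultdict(list)
    for attributes, warnings in datasets:
        if not warnings:
            continue  # nothing to report for this dataset
        contact = attributes.get("contact", "UNKNOWN CONTACT")
        # CMIP6 files carry source_id; CMIP5 files carry model_id.
        label = attributes.get("source_id") or attributes.get("model_id", "?")
        by_contact[contact].append((label, warnings))
    return dict(by_contact)
```

Datasets without a contact attribute are collected under a placeholder key rather than dropped, so they still show up in the report and can be followed up manually.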

bouweandela commented 4 years ago

That looks like something we could use. There is a specification of how the issues/datasets should be named: https://es-doc.github.io/esdoc-errata-client/create_cli.html#edit-the-issue

We could at least format the names of the input datasets/files in the same way in our cmor checker output and provide content for the 'title', 'description' and 'project' fields. That way it might be easier for modelling groups to use our input.
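A minimal sketch of what such formatted output might look like, assuming CMIP6 facets are available as a dict. The dot-separated dataset identifier follows the CMIP6 DRS facet order, but whether it matches the errata client's expected format exactly (for example how versions are appended) is an assumption to check against the specification linked above. The helpers dataset_id and errata_draft, and the "datasets" key, are hypothetical; only the 'title', 'description' and 'project' fields come from this discussion.

```python
# CMIP6 DRS facet order for a dataset identifier. Assumption: verify the
# exact format (including any version suffix) against the errata spec.
CMIP6_DATASET_FACETS = (
    "mip_era", "activity_id", "institution_id", "source_id",
    "experiment_id", "member_id", "table_id", "variable_id", "grid_label",
)


def dataset_id(facets):
    """Join DRS facets into a dot-separated dataset identifier."""
    return ".".join(facets[name] for name in CMIP6_DATASET_FACETS)


def errata_draft(facets, warnings):
    """Build a draft issue with 'title', 'description' and 'project' fields.

    Hypothetical helper: the wording of title and description is
    illustrative only, and "datasets" is an assumed field name.
    """
    return {
        "project": facets["mip_era"],
        "title": (f"CMOR check warnings for {facets['source_id']} "
                  f"{facets['variable_id']}"),
        "description": ("ESMValTool CMOR checks reported:\n"
                        + "\n".join(warnings)),
        "datasets": [dataset_id(facets)],
    }
```

Keeping the identifier in a separate function means it could also be reused for the dataset names printed in the regular CMOR checker output.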

zklaus commented 4 years ago

Formatting things so they are helpful is certainly a good idea. I only meant that, in general, we cannot submit the report ourselves because we lack the authorization.

ledm commented 4 years ago

I like this idea of automatically summarising all dataset issues in a separate file. The time spent dealing with non-compliant datasets is non-trivial, and there are still enough problems that they may deter new users from using ESMValTool.

BenMGeo commented 4 years ago

I support this idea.

A good addition would be to report the full absolute path of each file that is read in (or that ESMValTool at least tries to read).

Sometimes I had a typo in a path without being aware of it. (I added such a logging line in a local branch, but it might be more useful to include it in this separate log.)
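A minimal sketch of that kind of logging line, assuming it is called for every input path before the file is opened. The helper log_input_file is hypothetical. It resolves the path to an absolute one, so a typo becomes visible in the log, and emits a warning if the file does not exist, which would also route it into a warnings-only report.

```python
import logging
import os

logger = logging.getLogger(__name__)


def log_input_file(path):
    """Log the absolute path of an input file and whether it exists.

    Hypothetical helper; returns the resolved absolute path.
    """
    abs_path = os.path.abspath(os.path.expanduser(path))
    if os.path.isfile(abs_path):
        logger.info("Reading input file %s", abs_path)
    else:
        # A typo in a configured path shows up here as a missing file.
        logger.warning("Input file not found: %s", abs_path)
    return abs_path
```

Logging the missing case at WARNING level, rather than INFO, is what would let a level-filtered dataset report pick it up.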

ESMValGroup/ESMValCore issue #387