ESMValGroup / ESMValTool

ESMValTool: A community diagnostic and performance metrics tool for routine evaluation of Earth system models in CMIP
https://www.esmvaltool.org
Apache License 2.0

Fix provenance logging #240

Closed mattiarighi closed 5 years ago

mattiarighi commented 6 years ago

The provenance logging based on the NCL procedure inquire_and_save_fileinfo in interface_scripts/logging.ncl is currently broken. This procedure reads specific global attributes from the preprocessed NetCDF file (fixfile, version, infile_NNNN, tracking_id, and reference) to document the provenance of all input data and writes it to the NCL log file.

The problem is that the new preprocessor does not write such attributes in the output file. This functionality should be reimplemented.
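To make the missing piece concrete, here is a minimal Python sketch of the lookup the procedure depends on. It is not the NCL code itself: the function name `collect_provenance` is hypothetical, the global attributes are passed in as a plain dictionary rather than read from a NetCDF file, and it assumes that the `NNNN` in `infile_NNNN` is a run of digits.

```python
import re

# Global attributes named in this issue as carrying provenance information.
PROVENANCE_KEYS = ("fixfile", "version", "tracking_id", "reference")

# Assumption: "infile_NNNN" means "infile_" followed by digits.
INFILE_PATTERN = re.compile(r"^infile_(\d+)$")


def collect_provenance(global_attrs):
    """Return (record, missing) for a dict of NetCDF global attributes.

    record  -- the provenance attributes that are present, plus an
               "infiles" list ordered by the numeric suffix
    missing -- the expected provenance attributes that are absent
    """
    record = {key: global_attrs[key]
              for key in PROVENANCE_KEYS if key in global_attrs}

    infile_keys = sorted(
        (key for key in global_attrs if INFILE_PATTERN.match(key)),
        key=lambda key: int(INFILE_PATTERN.match(key).group(1)),
    )
    record["infiles"] = [global_attrs[key] for key in infile_keys]

    missing = [key for key in PROVENANCE_KEYS if key not in global_attrs]
    return record, missing


if __name__ == "__main__":
    # Hypothetical attribute values, for illustration only.
    attrs = {"version": "2.0", "infile_0001": "a.nc", "title": "demo"}
    record, missing = collect_provenance(attrs)
    print(record)
    print("missing:", missing)
```

With output from the new preprocessor, every expected attribute would end up in `missing`, which is the failure described here.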

bouweandela commented 6 years ago

It would probably be good if we could get the main esmvaltool program to take care of writing provenance information as much as possible, so diagnostic script developers need to spend minimal effort on this. That would probably mean that the preprocessor should write its own provenance information. Of course that information will then be made available to any diagnostic scripts subsequently using the preprocessed data.

There is also a standard available for provenance information, and it would probably be good to start using it: https://www.w3.org/TR/prov-overview/

Maybe @nielsdrost also has good ideas on how to tackle this topic?

nielsdrost commented 6 years ago

Yes, writing prov-o from the main esmvaltool program is exactly the approach I envision. Will produce a better write-up for others to comment on.

bouweandela commented 6 years ago

Possibly related issues: #229 and #278

bouweandela commented 6 years ago

Link to provenance design document produced by @nielsdrost.

nielsdrost commented 6 years ago

The discussion (via the document) has not reached agreement on whether (and how) provenance info should be embedded. I propose we write it as a separate PROV XML file for now.
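A rough sketch of what such a separate file could contain, using only the Python standard library. This is an illustration under assumptions, not the agreed design: the function `build_provenance`, the `ex` prefix and its URI, and the activity name are all hypothetical, and the output has not been validated against the PROV-XML schema. It records one activity, the input files it used, and the output file it generated.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.w3.org/ns/prov#"    # W3C PROV namespace
EX_NS = "https://example.org/esmvaltool/"  # hypothetical namespace for our ids

ET.register_namespace("prov", PROV_NS)


def _q(name):
    """Qualify a name with the PROV namespace for ElementTree."""
    return f"{{{PROV_NS}}}{name}"


def _ref(parent, kind, identifier):
    """Add a child element that points at an existing entity or activity."""
    ET.SubElement(parent, _q(kind)).set(_q("ref"), identifier)


def build_provenance(input_files, output_file, activity="preprocess"):
    """Return a PROV-XML-style string linking inputs to one output."""
    root = ET.Element(_q("document"))
    # Declare the prefix used inside the id/ref attribute values.
    root.set("xmlns:ex", EX_NS)

    activity_id = f"ex:{activity}"
    ET.SubElement(root, _q("activity")).set(_q("id"), activity_id)

    for name in input_files:
        entity_id = f"ex:{name}"
        ET.SubElement(root, _q("entity")).set(_q("id"), entity_id)
        used = ET.SubElement(root, _q("used"))
        _ref(used, "activity", activity_id)
        _ref(used, "entity", entity_id)

    output_id = f"ex:{output_file}"
    ET.SubElement(root, _q("entity")).set(_q("id"), output_id)
    generated = ET.SubElement(root, _q("wasGeneratedBy"))
    _ref(generated, "entity", output_id)
    _ref(generated, "activity", activity_id)

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(build_provenance(["tas_in_1.nc", "tas_in_2.nc"], "tas_out.nc"))
```

Keeping the record in its own file sidesteps the embedding question: the same document could later be attached to NetCDF or image metadata once that is decided.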

bouweandela commented 6 years ago

Main conclusion from telco on the interface between workflow manager (esmvaltool main program) and diagnostic scripts:

First step to start implementation:

axel-lauer commented 6 years ago

Here is a brief overview of how provenance is handled in version 1.1.0. Maybe we could use at least the kind of metadata created in v1.1.0 as a starting point. The actual metadata consists of so-called "tags" (e.g. R_atmos, P_crescendo, PT_geo). These tags can be translated to more human-readable information using the lists defined in the file doc/MASTER_authors-refs-acknow.txt. The metadata is passed through from the backend/interface layer to the diagnostics. The diagnostics then add more metadata and finally call an interface function that collects everything and writes the information to the EXIF headers of the individual figures (.png) and/or to separate .xml file(s). In v1.1.0, the tags are defined in different parts of the ESMValTool:

Backend

Interface layer

Namelist

Diagnostic (per plot, each figure is written to a separate file)
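The tag mechanism described above could be sketched in Python roughly as follows. This is only an illustration: the function `translate_tags` is hypothetical, the lookup table is a hand-written stand-in for the lists in doc/MASTER_authors-refs-acknow.txt (whose file format is not reproduced here), and the descriptions are placeholders rather than the real entries.

```python
# Placeholder lookup table. The tag names come from the overview above;
# the descriptions are NOT the real entries from the master file.
TAG_DESCRIPTIONS = {
    "R_atmos": "placeholder description for R_atmos",
    "P_crescendo": "placeholder description for P_crescendo",
    "PT_geo": "placeholder description for PT_geo",
}


def translate_tags(tags, table=TAG_DESCRIPTIONS):
    """Translate tags to readable text, keeping unknown tags unchanged."""
    return [table.get(tag, tag) for tag in tags]


def collect_metadata(*layers):
    """Merge tag lists from several layers, preserving first-seen order."""
    merged = []
    for layer in layers:
        for tag in layer:
            if tag not in merged:
                merged.append(tag)
    return merged


if __name__ == "__main__":
    backend_tags = ["P_crescendo"]        # hypothetical assignment to a layer
    diagnostic_tags = ["R_atmos", "PT_geo"]
    tags = collect_metadata(backend_tags, diagnostic_tags)
    print(translate_tags(tags))
```

Unknown tags are passed through unchanged so that a missing entry in the master list does not silently drop metadata.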

bouweandela commented 6 years ago

We will probably make use of the prov library to write the provenance information to XML.

LisaBock commented 6 years ago

We made a first suggestion for the structure of the provenance with the prov library. The visualization could look like this:

(image: article-prov)

nielsdrost commented 6 years ago

Awesome! Definitely looks like the way to go. This structure should fit all the information we need.

bouweandela commented 6 years ago

Work on this issue is done in the version2_provenance branch.