Closed mattiarighi closed 5 years ago
It would probably be good if we could get the main esmvaltool program to take care of writing provenance information as much as possible, so that diagnostic script developers need to spend minimal effort on this. That would probably mean that the preprocessor should write its own provenance information. That information would then be made available to any diagnostic scripts that subsequently use the preprocessed data.
There is also a standard available for provenance information, and it would probably be good to start using it: https://www.w3.org/TR/prov-overview/
Maybe @nielsdrost also has good ideas on how to tackle this topic?
Yes, writing PROV-O from the main esmvaltool program is exactly the approach I envision. I will produce a better write-up for others to comment on.
Possibly related issues: #229 and #278
Link to provenance design document produced by @nielsdrost.
The discussion (via the document) has not reached agreement on whether (and how) provenance info should be embedded. I propose we write it as a separate PROV XML file for now.
Main conclusion from telco on the interface between workflow manager (esmvaltool main program) and diagnostic scripts:
First step to start implementation:
Here is a brief overview of how provenance is handled in version 1.1.0. Maybe we could use at least the kind of metadata created in v1.1.0 as a starting point. The actual metadata consists of so-called "tags" (e.g. R_atmos, P_crescendo, PT_geo). These tags can be translated to more human-readable information using the lists defined in the file doc/MASTER_authors-refs-acknow.txt. The metadata is passed through from the backend/interface layer to the diagnostics. The diagnostics then add more metadata and finally call an interface function that collects everything and writes the information to the EXIF headers of the individual figures (.png) and/or to separate .xml file(s). In v1.1.0, the tags are defined in different parts of the ESMValTool:
Backend
Interface layer
Namelist
Diagnostic (per plot, each figure is written to a separate file)
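To make the v1.1.0 flow above concrete, here is a minimal Python sketch of collecting tags from the four places listed and translating them for display. Everything in it is an assumption for illustration: the function names, the assignment of tags to layers, and the placeholder descriptions are hypothetical, and the real descriptions live in doc/MASTER_authors-refs-acknow.txt, whose format is not reproduced here.

```python
# Hypothetical tag table. The real entries are defined in
# doc/MASTER_authors-refs-acknow.txt; these descriptions are placeholders.
TAG_DESCRIPTIONS = {
    "R_atmos": "placeholder description for R_atmos",
    "P_crescendo": "placeholder description for P_crescendo",
    "PT_geo": "placeholder description for PT_geo",
}


def merge_tags(*layers):
    """Combine tags from several layers, keeping first-seen order
    and dropping duplicates."""
    merged = []
    for layer in layers:
        for tag in layer:
            if tag not in merged:
                merged.append(tag)
    return merged


def translate_tags(tags, table=TAG_DESCRIPTIONS):
    """Return (tag, description) pairs; unknown tags stay visible."""
    return [(tag, table.get(tag, "<unknown tag>")) for tag in tags]


# Illustrative only: which layer defines which tag is an assumption.
backend_tags = ["R_atmos"]
namelist_tags = ["P_crescendo"]
diagnostic_tags = ["R_atmos", "PT_geo"]

all_tags = merge_tags(backend_tags, namelist_tags, diagnostic_tags)
for tag, description in translate_tags(all_tags):
    print(f"{tag}: {description}")
```

Keeping unknown tags in the output rather than dropping them means a missing entry in the lookup table shows up in the written metadata instead of disappearing silently.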
We will probably make use of the prov library to write the provenance information to XML.
We made a first suggestion for the structure of the provenance with the prov library. The visualization could look like this:
Awesome! Definitely looks like the way to go. This structure should fit all the information we need.
Work on this issue is done in the version2_provenance branch.
The provenance logging based on the NCL procedure inquire_and_save_fileinfo in interface_scripts/logging.ncl is currently broken. This procedure uses specific global attributes in the preprocessed NetCDF file (fixfile, version, infile_NNNN, tracking_id, and reference) to document the provenance of all input data and write it to the NCL log file. The problem is that the new preprocessor does not write such attributes to the output file. This functionality should be reimplemented.
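One possible starting point for the reimplementation is a helper that assembles the attribute set the NCL procedure expects. The sketch below is only an assumption about how that could look: the function name is hypothetical, all values are treated as opaque strings, and it assumes that NNNN in infile_NNNN is a zero-padded four-digit index starting at 0, which should be checked against logging.ncl.

```python
def provenance_attributes(input_files, fixfile, version, reference, tracking_id):
    """Build the global attributes read by inquire_and_save_fileinfo.

    Assumption: infile_NNNN uses a zero-padded, zero-based index.
    """
    attrs = {
        "fixfile": fixfile,
        "version": version,
        "reference": reference,
        "tracking_id": tracking_id,
    }
    for index, path in enumerate(input_files):
        attrs[f"infile_{index:04d}"] = path
    return attrs


# Placeholder values for illustration only.
attrs = provenance_attributes(
    input_files=["first_input.nc", "second_input.nc"],
    fixfile="placeholder_fixfile",
    version="placeholder_version",
    reference="placeholder_reference",
    tracking_id="placeholder_tracking_id",
)
for name in sorted(attrs):
    print(f"{name} = {attrs[name]}")
```

If the preprocessor writes its output with the netCDF4 package, a dictionary like this could presumably be attached to the output file with `Dataset.setncatts`. Whether that fits the new preprocessor's output path is an open question.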