EC-Earth / cmor-fixer

On-site fixer script for cmorized data
Apache License 2.0

Add version tracking on data files to fixer #32

Closed zklaus closed 3 years ago

zklaus commented 3 years ago

This issue is the continuation of an email thread that I post here:

On 17/06/2020 10:44, Thomas Reerink wrote: On Tue, 16 Jun 2020 at 11:08, Zimmermann Klaus wrote:

One thing we might start considering is that cmor-fixer should add an attribute to the files it has treated, including its own version. This way, we can take the guesswork out of whether a file has already been corrected.

This is already the case: when cmor-fixer applies a change to a file, a line is added to the file's history, including the cmor-fixer version. Related to that, after merging new features into master (even before releasing) I increased the cmor-fixer version (this is a different policy than for ece2cmor3).

It's good that we keep track of it. Imho, the history attribute is not the best place because it can be a bit volatile: some processing at data center sites such as JASMIN is prone to altering it, which can make parsing it rather hard. I would suggest adding a custom attribute, e.g.

:ece2cmor_fixer_version = "0.3.0"

or similar. This way, the fixer can very easily tell what the starting point is.

If fixes are applied only optionally, I would suggest keeping track of that as well in an attribute on the file.

The current strategy with cmor-fixer is that any error encountered is always fixed (unless the dry-run option is active); it does not seem useful to keep a detected error. I think during the telecon I mentioned that I planned to make fixes optional, but I changed my plan on that. However, this means that each fix needs to be safe: cmor-fixer must be able to detect whether an error has already been corrected, so that an operation is not applied more than once when cmor-fixer is run again for a newly added fix.
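The "safe to run twice" requirement described above can be illustrated with a minimal sketch. It is not cmor-fixer's actual code: the global attributes are modelled as a plain dict, and the function name `fix_cell_methods` and the wrong and corrected values are hypothetical placeholders.

```python
# Sketch only: attributes are a plain dict, and the "wrong"/"right" values
# below are hypothetical, not a real cmor-fixer fix.

WRONG_VALUE = "area: mean"          # hypothetical incorrect metadata value
RIGHT_VALUE = "area: time: mean"    # hypothetical corrected value


def fix_cell_methods(attrs, dry_run=False):
    """Correct attrs['cell_methods'] only if it still holds the known error.

    Returns True when the error was detected, so a dry run can report it
    without modifying anything.
    """
    if attrs.get("cell_methods") != WRONG_VALUE:
        return False                 # already correct (or unrelated): do nothing
    if not dry_run:
        attrs["cell_methods"] = RIGHT_VALUE
    return True


attrs = {"cell_methods": WRONG_VALUE}
first = fix_cell_methods(attrs)      # error detected and corrected
second = fix_cell_methods(attrs)     # nothing left to do on a second run
```

Because the fix checks for the faulty state before acting, a repeated run leaves an already corrected file untouched.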

I think it's very reasonable to apply all known fixes. Whether a fix has already been applied can then be determined simply by knowing which version of the fixer was run on a given file. This information already seems to be available from the history attribute, but to simplify parsing I recommend going with the custom attribute.
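As a rough sketch of this idea, assuming plain dotted numeric versions: the attribute name follows the `ece2cmor_fixer_version` suggestion above, while the `FIXES` registry, the fix names, and the helper functions are hypothetical. The file's global attributes are again modelled as a dict.

```python
VERSION_ATTR = "ece2cmor_fixer_version"   # attribute name suggested above


def parse_version(text):
    """Turn '0.3.0' into (0, 3, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical registry: (fixer version that introduced the fix, fix name).
FIXES = [
    ("0.1.0", "fix_a"),
    ("0.2.0", "fix_b"),
    ("0.3.0", "fix_c"),
]


def pending_fixes(attrs):
    """List the fixes introduced after the version recorded on the file."""
    # A file without the attribute is treated as never having been fixed.
    recorded = parse_version(attrs.get(VERSION_ATTR, "0.0.0"))
    return [name for version, name in FIXES if parse_version(version) > recorded]


def mark_fixed(attrs, fixer_version):
    """Record the fixer version once all pending fixes have been applied."""
    attrs[VERSION_ATTR] = fixer_version


attrs = {VERSION_ATTR: "0.2.0"}
todo = pending_fixes(attrs)               # only fixes newer than 0.2.0
mark_fixed(attrs, "0.3.0")
```

Comparing integer tuples rather than raw strings avoids ordering "0.10.0" before "0.9.0". With the netCDF4 Python package, the dict lookups would presumably become `Dataset.getncattr` and `Dataset.setncattr` calls on the file.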

zklaus commented 3 years ago

Ping @treerink, @plesager, @goord, @uwefladrich.

treerink commented 3 years ago

@zklaus Will attributes that were not added by cmor always survive the publication process on the various ESGF nodes, and also be visible on all metadata viewing platforms (such as the ESGF website)? We were not absolutely sure, and therefore added this info to the history. But you might know this better?

zklaus commented 3 years ago

I am very optimistic about that. User-defined attributes are mentioned in the CMIP6 output metadata requirements, and other models, such as the IPSL ones, make significant use of them.

treerink commented 3 years ago

In v2.4 the latest_applied_cmor_fixer_version attribute has been added.