EC-Earth / cmor-metadata-fixer

CMOR metadata fixer for cmorised output of any model
Apache License 2.0

Feature request: can the cmorMDfixer change more than just metadata? #6

Open klauswyser opened 1 month ago

klauswyser commented 1 month ago

Fixing metadata is important, but can the tool do more than that? Possible use cases are

Can the cmorMDfixer be combined with other tools to do this, e.g. rename or mv for renaming files/directories directly from the shell script?

klauswyser commented 1 month ago

To illustrate the challenge, here is a wiki page listing the differences between CMIP6 and CMIP6Plus.

Variable names change, table names change, ... Can we address all this? Or will it be easier to simply rerun ece2cmor3?

(Another issue is whether ece2cmor3 can cope with the new table and variable names.)

treerink commented 1 month ago

Another added question:

Ah, I see now: that is the second part of your first point.

treerink commented 1 month ago

The small example metadata file with modifications added in cdd4110 can, for instance, be used like this:

./cmorMDfixer-safe-mode-wrapper.sh 1 metadata-correction-cases/optimesm-convert-to-cmip-plus-example.json cmorMDfixer-test-data/test-set-01/CMIP6

This shows that the cmorMDfixer can already make several of the requested changes.

Let's extend this example with more realistic data.

klauswyser commented 1 month ago

Yes, the DRS may need updating, depending on which metadata are changed.
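As an illustration of what updating the DRS involves when table_id and variable_id change, here is a minimal bash sketch. It assumes the CMIP6 DRS directory and file-name layout and assumes that CMIP6Plus keeps that layout; the function name convert_drs_path is hypothetical, and the function only computes the new path string.

```shell
#!/usr/bin/env bash
# Sketch: compute the new DRS path when table_id and variable_id change.
# Assumes the CMIP6 DRS layout
#   .../<member_id>/<table_id>/<variable_id>/<grid_label>/<version>/<file>
# with <file> = <variable_id>_<table_id>_<rest>.nc, and assumes CMIP6Plus
# keeps this layout. Only the path string is computed; nothing is moved.

# convert_drs_path PATH NEW_TABLE NEW_VAR  (hypothetical helper)
convert_drs_path() {
    local path=$1 new_table=$2 new_var=$3
    local dir=${path%/*} file=${path##*/}
    local version grid rest

    version=${dir##*/}; dir=${dir%/*}   # <version>
    grid=${dir##*/};    dir=${dir%/*}   # <grid_label>
    dir=${dir%/*}                       # drop old <variable_id>
    dir=${dir%/*}                       # drop old <table_id>

    rest=${file#*_}                     # strip old <variable_id>_
    rest=${rest#*_}                     # strip old <table_id>_

    printf '%s/%s/%s/%s/%s/%s_%s_%s\n' "$dir" "$new_table" "$new_var" \
        "$grid" "$version" "$new_var" "$new_table" "$rest"
}

# Example of moving a file (mkdir -p creates the new DRS directories):
#   new=$(convert_drs_path "$f" APmon hus19)
#   mkdir -p "${new%/*}" && mv "$f" "$new"
```

Keeping the path computation separate from the mv makes it easy to print and check all new paths before anything on disk is changed.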

klauswyser commented 1 month ago

I was thinking a little more about this issue. Maybe it's not easy to change variable names (and variable attributes) with the fixer, which works mainly on the global attributes. It may be easier to do that in a bash script with ncrename, which could fix the DRS at the same time. The drawback is that we would have two tools: one for the metadata and one for the variables/DRS.

What do you think?

PS: efficiency is not an issue; ncrename can easily be combined with GNU parallel.
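A minimal sketch of the ncrename route: hypothetical helpers print one ncrename command per matching file, so the commands can be reviewed and then piped to GNU parallel, which executes each input line as a command when no command is given. It assumes CMIP6-style file names of the form <variable_id>_<table_id>_*.nc; matching on the table as well keeps, e.g., Amon hus and Emon hus apart.

```shell
#!/usr/bin/env bash
# Sketch only: print ncrename commands instead of running them, so they can
# be reviewed first and then executed in parallel with GNU parallel.
# Assumes CMIP6-style file names <variable_id>_<table_id>_*.nc.

# rename_var_cmd OLD_VAR NEW_VAR FILE  (hypothetical helper)
# Prints the NCO command that renames variable OLD_VAR to NEW_VAR in FILE
# (ncrename edits the file in place when no output file is given).
rename_var_cmd() {
    printf 'ncrename -v %s,%s %q\n' "$1" "$2" "$3"
}

# emit_rename_cmds ROOT TABLE OLD_VAR NEW_VAR  (hypothetical helper)
# Prints one command per file under ROOT whose name matches
# <OLD_VAR>_<TABLE>_*.nc; matching on the table keeps e.g. Amon hus
# (-> hus19) apart from Emon hus (-> hus7h).
emit_rename_cmds() {
    local root=$1 table=$2 old=$3 new=$4 f
    find "$root" -type f -name "${old}_${table}_*.nc" -print0 | sort -z |
        while IFS= read -r -d '' f; do
            rename_var_cmd "$old" "$new" "$f"
        done
}

# Example (GNU parallel runs each input line as a command):
#   emit_rename_cmds CMIP6 Amon hus hus19 | parallel
```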

treerink commented 1 month ago

Based on your suggestion I am thinking of the following sketch:

Because cmorMDfixer-safe-mode-wrapper.sh is a bash script anyway, we can either extend that script or put another wrapper script around it to do all of this.

The global attribute modifications (the cmorMDfixer part) can all be specified in the metadata.json file. The DRS and variable name changes concern a list of changes that should be specified separately. In this Mapping between variables in CMIP6 and CMIP6Plus I currently count 18 variable name changes. Is this the list we have to cope with, or should the tool be a bit more general, so that other variable name changes can be added? I think the easiest option is a bash for-loop over these 18 variables (applying their renaming), which can easily be extended in the script. This is of course more hard-coded than a script that takes the renamings as arguments. In the end I have a slight preference for the first option: the for-loop is a bit more hard-coded, but it is easily extendable and needs fewer arguments to be passed through multiple levels.
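The hard-coded for-loop option could look roughly like this sketch: the 18 renamings from the list in this issue are kept in one bash array, and a hypothetical for_each_renaming function calls a per-variable action for each entry. Adding a renaming then means adding one line to the array; the print_renaming action is only a placeholder.

```shell
#!/usr/bin/env bash
# Sketch of the hard-coded, easily extendable renaming list.
# Each entry: "<cmip6_table> <cmip6_var> <cmip6plus_table> <cmip6plus_var>",
# taken from the 18 renamings listed in this issue.
RENAMINGS=(
    "Amon hus APmon hus19"
    "Amon ua APmon ua19"
    "Amon va APmon va19"
    "Eday hus APday hus19"
    "Eday ta APday ta19"
    "Eday ua APday ua19"
    "Eday va APday va19"
    "Eday wap APday wap19"
    "Eday zg APday zg19"
    "Emon hus APmon hus7h"
    "Emon ua APmon ua7h"
    "Emon va APmon va7h"
    "day hus APday hus8"
    "day ta APday ta8"
    "day ua APday ua8"
    "day va APday va8"
    "day wap APday wap8"
    "day zg APday zg8"
)

# for_each_renaming ACTION  (hypothetical helper)
# Calls ACTION old_table old_var new_table new_var once per entry.
for_each_renaming() {
    local action=$1 entry old_table old_var new_table new_var
    for entry in "${RENAMINGS[@]}"; do
        read -r old_table old_var new_table new_var <<< "$entry"
        "$action" "$old_table" "$old_var" "$new_table" "$new_var"
    done
}

# Hypothetical placeholder action; a real one would convert all files of
# this table/variable pair (DRS path, variable name, table_id).
print_renaming() {
    printf '%s/%s -> %s/%s\n' "$1" "$2" "$3" "$4"
}

# Example: for_each_renaming print_renaming
```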

treerink commented 1 month ago

The 18 variables with changed names:

CMIP6 table  CMIP6 variable  CMIP6Plus table  CMIP6Plus variable  Notes
Amon         hus             APmon            hus19               Variable name change
Amon         ua              APmon            ua19                Variable name change
Amon         va              APmon            va19                Variable name change
Eday         hus             APday            hus19               Variable name change
Eday         ta              APday            ta19                Variable name change
Eday         ua              APday            ua19                Variable name change
Eday         va              APday            va19                Variable name change
Eday         wap             APday            wap19               Variable name change
Eday         zg              APday            zg19                Variable name change
Emon         hus             APmon            hus7h               Variable name change
Emon         ua              APmon            ua7h                Variable name change
Emon         va              APmon            va7h                Variable name change
day          hus             APday            hus8                Variable name change
day          ta              APday            ta8                 Variable name change
day          ua              APday            ua8                 Variable name change
day          va              APday            va8                 Variable name change
day          wap             APday            wap8                Variable name change
day          zg              APday            zg8                 Variable name change

Besides this, I now notice that many table names have changed in CMIP6Plus, which complicates the case. (I also notice the criticism from the CMIP7 data request community of the many CMIP6Plus table name changes.)

treerink commented 4 days ago

In 8663ef6 I added a small piece of the puzzle: two small scripts that look up the CMIP6Plus table and variable when the CMIP6 ones are given, and vice versa (for all table-variable combinations).
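For readers without the commit at hand, a two-way lookup of this kind can be sketched with awk over a whitespace-separated mapping. This is not the code in 8663ef6: the function names are hypothetical, and only four rows from the list in this issue are included.

```shell
#!/usr/bin/env bash
# Sketch of a two-way CMIP6 <-> CMIP6Plus lookup (hypothetical names).
# Columns: cmip6_table cmip6_var cmip6plus_table cmip6plus_var.
# Only four rows from the list in this issue are included here.
MAPPING='Amon hus APmon hus19
Emon hus APmon hus7h
Eday hus APday hus19
day hus APday hus8'

# to_cmip6plus TABLE VAR: prints "<cmip6plus_table> <cmip6plus_var>";
# exits non-zero if the pair is not in the mapping.
to_cmip6plus() {
    awk -v t="$1" -v v="$2" \
        '$1 == t && $2 == v { print $3, $4; found = 1 } END { exit !found }' \
        <<< "$MAPPING"
}

# to_cmip6 TABLE VAR: the reverse lookup, matching on columns 3 and 4.
# Prints every match, so a non-unique reverse mapping would show up.
to_cmip6() {
    awk -v t="$1" -v v="$2" \
        '$3 == t && $4 == v { print $1, $2; found = 1 } END { exit !found }' \
        <<< "$MAPPING"
}
```

Keeping one mapping and reading it in both directions avoids maintaining two lists that can drift apart.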

treerink commented 2 days ago

In 4c19264 I added a bash script that renames the DRS path and the variable name and adjusts the table_id global attribute. Some CMIP6 labels are also changed to CMIP6Plus, though a few less relevant ones (in (broken) URLs) are left out for now. Some of these global attributes could also be changed directly with the cmorMDfixer itself; that is a choice, but from an iteration perspective adjusting the table_id is easiest in this script as it is implemented now. Which script does which part of the work is also partly a matter of taste; I tend to bundle closely related changes in the same script. What the new script does not yet do is add an entry (with timestamp) to the history attribute.
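The per-file NetCDF edits described here could be sketched with NCO's ncrename and ncatted, including the still-missing timestamped history entry. This is an illustrative dry run that only prints commands; it is not the code in 4c19264, and the function names, the choice of attributes and the history wording are assumptions.

```shell
#!/usr/bin/env bash
# Sketch (dry run): print the NCO commands for converting one file.
# Not the code in 4c19264; function names and history text are assumptions.

# history_entry TEXT [TIMESTAMP]  (hypothetical helper)
# Prints "<timestamp>: TEXT"; TIMESTAMP defaults to the current UTC time.
history_entry() {
    local ts=${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}
    printf '%s: %s\n' "$ts" "$1"
}

# netcdf_edit_cmds FILE OLD_VAR NEW_VAR NEW_TABLE  (hypothetical helper)
netcdf_edit_cmds() {
    local file=$1 old_var=$2 new_var=$3 new_table=$4 entry
    # No commas in the text, to be safe, since ncatted separates the
    # fields of -a with commas.
    entry=$(history_entry "renamed ${old_var} to ${new_var} and set table_id to ${new_table}")
    # 1. Rename the data variable (in place).
    printf 'ncrename -v %s,%s %q\n' "$old_var" "$new_var" "$file"
    # 2. Overwrite (o) the character-type (c) global attributes.
    printf 'ncatted -a table_id,global,o,c,%s -a variable_id,global,o,c,%s %q\n' \
        "$new_table" "$new_var" "$file"
    # 3. Append (a) a timestamped line to the global history attribute;
    #    ncatted turns the \n escape into a newline.
    printf 'ncatted -a %q %q\n' "history,global,a,c,\\n${entry}" "$file"
}
```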

treerink commented 2 days ago

An overarching script which calls both the cmorMDfixer and the convert-cmor-table-var-in-drs-and-metadata.sh script is not in place yet.

Maybe first some more extensive testing and discussion?
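A possible shape for the overarching script, as a hypothetical sketch: it runs the cmorMDfixer wrapper with the same three arguments as the example command earlier in this issue and then the conversion script. The argument list of convert-cmor-table-var-in-drs-and-metadata.sh (here a single data directory) and the order of the two steps are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical driver around the two existing scripts (not in the repository).
# Assumptions: the conversion script takes the data directory as its only
# argument, and the metadata fix runs before the DRS/variable conversion.

# run CMD [ARGS...]: executes the command, or only prints it if DRY_RUN=1.
run() {
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# run_all FIRST_ARG METADATA_JSON DATA_DIR
# FIRST_ARG is passed to the cmorMDfixer wrapper unchanged, as in the
# example command in this issue; the second step is skipped if the first fails.
run_all() {
    local first_arg=$1 metadata=$2 datadir=$3
    run ./cmorMDfixer-safe-mode-wrapper.sh "$first_arg" "$metadata" "$datadir" || return
    run ./convert-cmor-table-var-in-drs-and-metadata.sh "$datadir"
}

# Example dry run:
#   DRY_RUN=1 run_all 1 metadata-correction-cases/optimesm-convert-to-cmip-plus-example.json cmorMDfixer-test-data/test-set-01/CMIP6
```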