Open klauswyser opened 1 month ago
To illustrate the challenge, here is a wiki with the differences between CMIP6 and CMIP6plus.
Variable names change, table names change, ... Can we address all this? Or will it be easier to simply rerun ece2cmor3?
(Another issue is whether ece2cmor3 can cope with the new table and variable names.)
Another added question:
Ah I see now, second part of your first point.
The added small example metadata file with modifications (see cdd4110) can be used for instance like this:
./cmorMDfixer-safe-mode-wrapper.sh 1 metadata-correction-cases/optimesm-convert-to-cmip-plus-example.json cmorMDfixer-test-data/test-set-01/CMIP6
This shows that the cmorMDfixer can apply several of the requested modifications.
Let's extend this example with better real example data.
Yes, the DRS may need updating, depending on which metadata are changed.
I was thinking a bit more about this issue. It may not be easy to change variable names (and variable attributes) with the fixer, which works mainly on the global attributes. It may be easier to do that in a bash script with ncrename, which could fix the DRS at the same time. The drawback is that we would have two tools: one for the metadata, one for variables/DRS.
What do you think?
PS: efficiency is not an issue; ncrename can easily be used with GNU parallel.
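A minimal sketch of that idea, not part of cmorMDfixer: a helper prints one `ncrename` command per file, and the resulting list can be piped to GNU parallel. The function name `list_rename_cmds`, the assumption that file names start with `<variable>_<table>_`, and the example path are all illustrative.

```shell
#!/usr/bin/env bash
# Sketch only: print one ncrename command per NetCDF file whose name starts
# with "<old_var>_<table>_". Nothing is modified until the list is executed.
# File names containing newlines are not handled.
list_rename_cmds() {
  local table="$1" old="$2" new="$3" root="$4"
  find "$root" -type f -name "${old}_${table}_*.nc" | sort |
    while IFS= read -r f; do
      # "ncrename -v old,new file" renames the variable in place.
      printf 'ncrename -v %s,%s %q\n' "$old" "$new" "$f"
    done
}

# Hypothetical usage (the path is a placeholder). GNU parallel runs each
# input line as a command when it is given no command of its own:
#   list_rename_cmds Amon hus hus19 /path/to/CMIP6 | parallel -j 8
```

Printing the commands first makes it easy to inspect the list before anything is changed.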
Based on your suggestion I am thinking of the following sketch:
- With `gnu-parallel` and `rsync -a`, a copy is made in which the DRS is set to the new, correct naming (question: are these datasets too large to copy? Or should the copy be optional?).
- With `gnu-parallel` and `ncrename`, the variables are renamed.
Because `cmorMDfixer-safe-mode-wrapper.sh` is a bash script anyway, we can use this script, or we can use another wrapper script around it to do all of this.
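The copy step could be sketched roughly as follows. This assumes the DRS path contains a `/<table>/<variable>/` directory pair and that file names start with `<variable>_<table>_`; the function names `new_drs_path` and `copy_with_new_drs` are hypothetical, and any other DRS elements that may differ in CMIP6plus are left untouched.

```shell
#!/usr/bin/env bash
# Sketch only: derive the new path by swapping the "/<table>/<var>/" directory
# pair and the "<var>_<table>_" file-name prefix. A bare file name without a
# leading directory is not handled.
# Arguments: path old_table old_var new_table new_var
new_drs_path() {
  local path="$1"
  local old_dir="/${2}/${3}/" new_dir="/${4}/${5}/"
  local old_prefix="/${3}_${2}_" new_prefix="/${5}_${4}_"
  path="${path//"$old_dir"/$new_dir}"
  path="${path//"$old_prefix"/$new_prefix}"
  printf '%s\n' "$path"
}

# Copy one file to its new location. The variable inside the copy still has
# its old name until ncrename is run on it.
copy_with_new_drs() {
  local src="$1" dst
  dst="$(new_drs_path "$@")"
  mkdir -p "$(dirname "$dst")" && rsync -a "$src" "$dst"
}
```

Keeping the path mapping in its own function means the same logic can serve a copy (`rsync -a`) or an in-place `mv`, depending on whether the datasets are too large to duplicate.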
The global attribute modifications (the `cmorMDfixer` part) can all be specified in the `metadata.json` file. The DRS and variable name changes are a list of changes that should be specified separately. In the "Mapping between variables in CMIP6 and CMIP6Plus" I currently count 18 variable name changes. Is this the list we have to cope with, or should the tool be more general so that other variable name changes can be added? I think the easiest approach is a bash for-loop over those 18 variables (applying their renaming), which can easily be extended in the script. This is of course more hard-coded than a script that takes the renaming as arguments. In the end I have a slight preference for the first option: the for-loop is a bit more hard-coded, but it is easy to extend and needs fewer arguments passed through multiple levels.
The 18 variables with changed names:
CMIP6 table | CMIP6 variable | CMIP6Plus table | CMIP6Plus variable | Notes |
---|---|---|---|---|
Amon | hus | APmon | hus19 | Note Variable name change |
Amon | ua | APmon | ua19 | Note Variable name change |
Amon | va | APmon | va19 | Note Variable name change |
Eday | hus | APday | hus19 | Note Variable name change |
Eday | ta | APday | ta19 | Note Variable name change |
Eday | ua | APday | ua19 | Note Variable name change |
Eday | va | APday | va19 | Note Variable name change |
Eday | wap | APday | wap19 | Note Variable name change |
Eday | zg | APday | zg19 | Note Variable name change |
Emon | hus | APmon | hus7h | Note Variable name change |
Emon | ua | APmon | ua7h | Note Variable name change |
Emon | va | APmon | va7h | Note Variable name change |
day | hus | APday | hus8 | Note Variable name change |
day | ta | APday | ta8 | Note Variable name change |
day | ua | APday | ua8 | Note Variable name change |
day | va | APday | va8 | Note Variable name change |
day | wap | APday | wap8 | Note Variable name change |
day | zg | APday | zg8 | Note Variable name change |
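
As a sketch of the hard-coded for-loop option, the 18 renames from the table above can live in one bash array that is easy to extend. The names `RENAMES` and `lookup_rename` are hypothetical and not taken from the repository.

```shell
#!/usr/bin/env bash
# The 18 renames from the table above, one entry each:
# "<CMIP6 table> <CMIP6 var> <CMIP6Plus table> <CMIP6Plus var>".
RENAMES=(
  "Amon hus APmon hus19" "Amon ua APmon ua19"   "Amon va APmon va19"
  "Eday hus APday hus19" "Eday ta APday ta19"   "Eday ua APday ua19"
  "Eday va APday va19"   "Eday wap APday wap19" "Eday zg APday zg19"
  "Emon hus APmon hus7h" "Emon ua APmon ua7h"   "Emon va APmon va7h"
  "day hus APday hus8"   "day ta APday ta8"     "day ua APday ua8"
  "day va APday va8"     "day wap APday wap8"   "day zg APday zg8"
)

# Print "<new table> <new var>" for a CMIP6 table/variable pair; return 1
# when the pair is not one of the renamed ones.
lookup_rename() {
  local entry old_table old_var new_table new_var
  for entry in "${RENAMES[@]}"; do
    read -r old_table old_var new_table new_var <<< "$entry"
    if [[ "$old_table" == "$1" && "$old_var" == "$2" ]]; then
      printf '%s %s\n' "$new_table" "$new_var"
      return 0
    fi
  done
  return 1
}

# Example loop over all entries:
#   for entry in "${RENAMES[@]}"; do
#     read -r t v nt nv <<< "$entry"
#     echo "$t/$v -> $nt/$nv"
#   done
```

Adding a further rename is then a one-line change to the array, which is the main attraction of this option over passing the mapping in as arguments.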
Besides this, I now notice that many table names have changed in CMIP6plus, which complicates the case. (I also notice that the CMIP7 data request community has criticised these many CMIP6plus table name changes.)
In 8663ef6 I added a small piece of the puzzle: two small scripts that look up the CMIP6plus table and variable when the CMIP6 ones are given, and vice versa (for all table-variable combinations).
In 4c19264 a bash script is added that renames the DRS and the variable name and adjusts the table_id (global attribute). Some CMIP6 labels are also adjusted to CMIP6plus, though some less relevant ones (in (broken) URLs) are left out for now. For some of these `global_attributes` this could also be done directly with the cmorMDfixer itself; it is a choice, although from an iteration perspective adjusting the table_id is easiest from this script as it is implemented now. Which script does which part of the work is also a matter of taste. I tend to bundle closely related changes in the same script. What the new script does not yet do is add an entry to the history (with a timestamp).
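For the missing history entry, one possible shape using NCO's `ncatted` is sketched below. This is not the code in 4c19264: the function name `set_table_id` and the wording of the history note are assumptions.

```shell
#!/usr/bin/env bash
# Sketch only: set the table_id global attribute of one file and append a
# timestamped note to its global history attribute.
# Arguments: file new_table [timestamp]  (timestamp defaults to now, UTC)
set_table_id() {
  local file="$1" new_table="$2" stamp
  stamp="${3:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  # -h stops ncatted from adding its own command line to history;
  # mode "o" overwrites the attribute, type "c" is character data.
  ncatted -h -a "table_id,global,o,c,${new_table}" "$file" &&
    # mode "a" appends to the existing history text.
    ncatted -h -a "history,global,a,c, ${stamp} table_id set to ${new_table}" "$file"
}

# Hypothetical usage (the file name is a placeholder):
#   set_table_id some_file.nc APmon
```

Doing both attribute edits in one function keeps the history note next to the change it records, in line with bundling related changes in the same script.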
An overarching script that calls both the `cmorMDfixer` and the `convert-cmor-table-var-in-drs-and-metadata.sh` script is not in place yet.
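Such an overarching script might look roughly like the sketch below. The three arguments for the wrapper follow the example invocation earlier in this thread. The interface of the conversion script is not described here, so its arguments are simply passed through unchanged. The names `run` and `convert_to_cmip6plus` and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: run both steps in sequence and stop if the first one fails.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
  if [[ "${DRY_RUN:-1}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Arguments: the three wrapper arguments (as in the earlier example), then
# any arguments for the conversion script, which are passed on as given.
convert_to_cmip6plus() {
  local first_arg="$1" metadata_json="$2" data_dir="$3"
  shift 3
  run ./cmorMDfixer-safe-mode-wrapper.sh "$first_arg" "$metadata_json" "$data_dir" &&
    run ./convert-cmor-table-var-in-drs-and-metadata.sh "$@"
}
```

Defaulting to a dry run fits the extended testing proposed here: the planned commands can be reviewed before anything touches the data.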
Maybe first some further extended testing and discussion?
Fixing metadata is important, but can the tool do more than that? Possible use cases are
Can the cmorMDfixer be combined with other tools to do this, e.g. `rename` or `mv` for renaming files/directories directly from the shell script?