Open larsbuntemeyer opened 1 year ago
@larsbuntemeyer Could you provide some CORDEX dataset files that I could try to debug PrePARE with? Thank you.
CMOR doesn't seem to use the values in the DRS
section in CMIP6_CV.json. I'm not sure if it should. @durack1 @taylor13 Any thoughts on this?
Regarding the parts where cmor_CV_checkParentExpID
and cmor_CV_checkSubExpID
are causing PrePARE to crash, I think this is due to the attributes parent_experiment_id
and sub_experiment_id
not being found for experiment entries in CORDEX_CV.json.
I think CMOR/PrePARE should be checking required_global_attributes
of the CV to see if those attributes need to be present rather than assuming they are required by the MIP. However, I don't see the parent_experiment_id
in CMIP6's required global attributes so maybe checking inside the experiment entry for the parent and sub would work better.
@durack1 obs4MIPS and input4MIPs don't have parent and sub experiments. How do they use CMOR/PrePARE?
I agree with using command line arguments with PrePARE to get the right files for the CV.
Hi @mauzey1 ! Thanks for picking this up again, sorry, for being unresponsive! I'll provide some CORDEX dataset soon!
CMOR doesn't seem to use the values in the DRS section in CMIP6_CV.json
Yes, that's confusing. It uses file and path templates from the dataset input table, not from the CV. On the other hand, PrePARE
has no option to override CMOR_DEFAULT_FILE_TEMPLATE
and FILE_NAME_TEMPLATE
...
I think CMOR/PrePARE should be checking required_global_attributes of the CV to see if those attributes need to be present rather than assuming they are required by the MIP
I think that is achieved by the _cmip6_option
, see also my confusion in this comment: https://github.com/PCMDI/cmor/discussions/679#discussion-4461588. Again, Prepare
does not allow to mirror this behaviour during the checks...
I am little bit tempted to implement some backend CV check module purely in python that could be utilized by PrePARE. Json handling and NetCDF attribute handling has really become easy and flexible within python.
I am little bit tempted to implement some backend CV check module purely in python that could be utilized by PrePARE. Json handling and NetCDF attribute handling has really become easy and flexible within python.
@larsbuntemeyer That is actually an idea I have been contemplating. I was thinking of making one based on the xarray. We could discuss that more on the discussion page.
As for the current issue with PrePARE, I wonder how we could tell if we are using a CV other than CMIP6 CV. I guess we could grab the prefix of the CV file assuming that the name of the CV file will always have the format <MIP name>_CF.json
. Knowing if the CV is not for CMIP6, we could skip the parent and sub experiment checks. However, I wonder if other MIPs also use parent and sub experiments.
@durack1 @matthew-mizielinski Do you know of any MIPs that use parent and sub experiments other than CMIP6?
You folks are starting to get aligned with what we've been discussing. We are thinking of following a structure similar to - note this has evolved a little, but is fine for discussion.
If we follow this structure, then the <MIPName>_CV.json
could be assumed, therefore allowing an update to PrePARE and CMOR to work across multiple projects without having to implement per-project hacks.
Note, we are trying to glean as much information about CMOR usage in the draft CMOR Users Survey, so if we have more specific info we want to collect from the community as part of planning, then feel free to suggest augmentations/edits to the questions we have drafted
@larsbuntemeyer Could you provide some CORDEX dataset files that I could try to debug PrePARE with? Thank you.
@mauzey1 You can now download current CORDEX example datasets from artifacts, e.g., here: https://github.com/WCRP-CORDEX/cordex-cmip6-cmor-tables/actions/runs/4043896243
@durack1 @matthew-mizielinski Do you know of any MIPs that use parent and sub experiments other than CMIP6?
Parent information: yes, particularly where they are extending CMIP6, e.g. run ssp245 with sulphate injections to keep global average temperature below a target.
sub-experiment id: yes, where there are initialised experiments such as those for the decadal/seasonal forecasting area.
@matthew-mizielinski : Aren't those considered CMIP6 runs, even if they are not the DECK/historical runs?
@matthew-mizielinski : Aren't those considered CMIP6 runs, even if they are not the DECK/historical runs?
Sorry, I should have put examples. Both of the following are outside of CMIP6;
I'm currently making changes to PrePARE to make its checks more like those done when cmorizing datasets. Below is the relevant section of cmor.c that does those checks.
https://github.com/PCMDI/cmor/blob/304db223f77b32df94d2337eca557da0c3d2c7cc/Src/cmor.c#L3038-L3062
The check for source ID is done if the GLOBAL_IS_CMIP6
variable is true, or if current dataset's further info URL is not an empty string. Should the source ID check be done by default? Isn't that attribute essential for all MIPs?
The check for source ID is done if the
GLOBAL_IS_CMIP6
variable is true, or if current dataset's further info URL is not an empty string. Should the source ID check be done by default? Isn't that attribute essential for all MIPs?
That would also impact the cmorization itself, not only PrePARE, right? That would work for CORDEX with CMIP6. But i know that people are also using the old CVs with cmor3, in those, there is no source_id
but it uses the old model_id
vocabulary, so that might be a problem @sol1105 ?
@durack1 Why is CMOR checking if the further_info_url
is not an empty string before checking the source ID? The comment above this section of code suggests that it is related to obs4MIPs even though obs4MIPs_CV.json doesn't contain anything for further_info_url
. Should checking further_info_url
be a CMIP6-only check?
https://github.com/PCMDI/cmor/blob/304db223f77b32df94d2337eca557da0c3d2c7cc/Src/cmor.c#L3055-L3062
@mauzey1 good question, the further_info_url
(supported by ES-Docs, but I am unable to find a working example at this moment) is indeed supported for obs4MIPs (see https://github.com/PCMDI/obs4MIPs-cmor-tables/blob/master/inputs/pcmdi/RSS/RSS_prw_v07r01.json#L6), but all this hard coding is not ideal. We don't have further_info_url
support for input4MIPs, so what would be the case for that project?
Getting back into working on this issue, I've dug deeper into how the further_info_url
attribute is used by PrePARE. When PrePARE calls the function check_furtherinfourl
, it will retrieve the further info URL template from the variable cmor_current_dataset.furtherinfourl
.
https://github.com/PCMDI/cmor/blob/8d4533935c354bdcd7d7162df66a8f6663410111/Src/cmor_CV.c#L373-L377
cmor_current_dataset.furtherinfourl
was initialized with the default value for the URL template.
https://github.com/PCMDI/cmor/blob/8d4533935c354bdcd7d7162df66a8f6663410111/include/cmor.h#L266
cmor_current_dataset.furtherinfourl
would either use the value of further_info_url
stored in the CV file or in the user input if CMOR was being used to write data. However, PrePARE doesn't do this so it will just use the default template.
Another issue that I have noticed is the difference between the default value of the further info URL's template, and the value stored in the further_info_url
attribute of CV files. CMIP6_CV.json has the following for this attribute.
"further_info_url":[
"https://furtherinfo.es-doc.org/.*"
]
Although this appears to be a regex for matching any URL that begins with https://furtherinfo.es-doc.org/
, the following lines from cmor_CV_checkFurtherInfoURL
will make the function just ignore checking the URL due to the lack of "<>" tokens.
https://github.com/PCMDI/cmor/blob/8d4533935c354bdcd7d7162df66a8f6663410111/Src/cmor_CV.c#L379-L385
So even if PrePARE got the value for further_info_url
from the CV, it would just be ignoring the URL if the value was not a template that had "<>" tokens like the default in cmor.h.
The check for the further info URL is pretty much hardwired for CMIP6, or at least any project that used the same template for further info URLs.
Should we place the further info URL check within the group of CMIP6 checks, which can be disabled by the PrePARE user?
In CMOR, the check_sourceID
fucntion would be called if cmor_current_dataset.furtherinfourl
is not an empty string.
https://github.com/PCMDI/cmor/blob/8d4533935c354bdcd7d7162df66a8f6663410111/Src/cmor.c#L3055-L3062
This would mean that the checks for further info URLs and source_ids would be performed unless the user set further_info_url
to be an emtpy string. I can understand doing this for the URL but not the source_id.
Shouldn't check_sourceID
be done regardless of the further info URL, or whether or not the project is CMIP6?
@mauzey1 thanks for continuing to chip away at these issues. further_info_url
, which uses the es-doc URL, is only applicable to the CMIP6 and obs4MIPs project, within input4MIPs we define this as a URL pointing elsewhere, and for some of the earlier generated source_id
entries these don't exist. I also believe that for the latest datasets, even obs4MIPs has deprecated the ES-Doc usage (@gleckler1 can confirm)
Shouldn't
check_sourceID
be done regardless of the further info URL, or whether or not the project is CMIP6?
And this is exactly correct. The way that we have things set up, if we don't have a source_id defined (in CVs) that matches the dataset being written, then the program should throw a warning at minimum.
@taylor13 @matthew-mizielinski can confirm their agreement
Here are some offline comments, I'm including in this thread. Hope it's the right thread.
The global attributes specifications for CMIP6 included:
“ further_info_url has the form https://furtherinfo.es-doc.org/
The PrePARE default value is consistent with this BUT DOESN’T LOOK LIKE IT INCLUDES THE DOT (PERIOD) SEPARATORS:
I agree with Paul that we want to check source_id independent of further_info_url.
Reviewing some of the coding, it seems to me it is going to difficult to generalize it cleanly to handle other projects (obs4MIPs, input4MIPs, etc.). I have spent some time trying to write down what we would like to check with PrePARE and how we might code this in a general way to ingest simply formatted guidance from an input “configuration” file which would specify which attributes to check and how to check them. My ideas are contained in this document: https://docs.google.com/document/d/1JZcVRo2GucGUoZPeUNrmHmtB8LAs0QBAWiDuf0n3lSM/edit. You might also be interested in an earlier document I put together when PrePARE was originally being developed: https://docs.google.com/document/d/1d4_wdaY52xhLZBTeiTu2ZlpmpTO2Luqe47m2IX1WkWg .
I think it is worth considering how difficult it would be to implement the above from "scratch" (as opposed to modifying the current PrePARE code). I think it might be easier to start over and might take less time than trying to clean up and generalize the existing coding.
Perhaps we can all get together and you can propose what you think would be the best way to proceed.
@mauzey1 it would be useful to update PrePARE alongside the last planned release of CMOR3 to enable consistency checking as much as possible - let's revisit this issue as the final changes for 3.9.0 are identified to see what can be achieved within available time/resources
There is a lot of stuff considered above. Not sure we should spend resources generalizing PrePARE to handle the generality of the 3.9.0 release. Might want to wait on this for integration into the CMOR 4.0 development. @mauzey1, you might be in the best position to estimate how big the job. What's your advice on this?
@taylor13 I agree with this, if we have a "final" CV template that we can build PrePARE4 (along CMOR4) to work with, then I suggest that's the best path forward, rather than attempting to wrangle two codebases (C - CMOR, and Python - PrePARE) together.
@mauzey1 if you agree with this, we can tag this against the 4.0 milestone
@durack1 Yes, I agree with your suggestion of making this a CMOR4 milestone.
I would like to open this issue to collect some insight i drew from playing around with
PrePARE
using our preliminary CORDEX cmor tables. Although i can use cmor to rewrite data for projects without_cmip6_option="CMIP6"
, i can not usePrePARE
to evaluate files due to the following:PrePARE
always usesCMOR_DEFAULT_FILE_TEMPLATE
to evaluate the file name: https://github.com/PCMDI/cmor/blob/5f8759ea2f8cff4fa05657c10be62755b4c75611/LibCV/PrePARE/PrePARE.py#L262-L264 Digging through the C code, i guess there is no way yet to makecmor_CV_checkFilename
work with a DRS from the CV? Originally thefile_name_template
was defined in the input dataset attributes table but can not be recovered from the files global attributes. So in general, i guess it would be nice to have the DRS available from the CV table instead of the input file (also for the cmorization itself probably?). I am not sure what the purpose of the DRS entry in the CV file is after all...PrePARE
does and the checks during cmorization are not totally consistent if files are cmorized without the_cmip6_option
. For example, those checks in https://github.com/PCMDI/cmor/blob/5f8759ea2f8cff4fa05657c10be62755b4c75611/Src/cmor.c#L3043-L3050 are deactivated without the the_cmip6_option
which makes my CORDEX tables work with cmor. However,PrePARE
does not know about this and especially those checks in https://github.com/PCMDI/cmor/blob/5f8759ea2f8cff4fa05657c10be62755b4c75611/LibCV/PrePARE/PrePARE.py#L491-L492 and https://github.com/PCMDI/cmor/blob/5f8759ea2f8cff4fa05657c10be62755b4c75611/LibCV/PrePARE/PrePARE.py#L510-L511 make it crash with my files and tables.PrePARE
assumes some hard coded rules for finding CV, coordinates, grid tables, etc.. e.g., in https://github.com/PCMDI/cmor/blob/5f8759ea2f8cff4fa05657c10be62755b4c75611/LibCV/PrePARE/PrePARE.py#L256-L258 This could be easily solved probably using command line arguments.It seems that
PrePARE
is tightly coupled to CMIP6 vocabulary with_cmip6_option="CMIP6"
although i could make it run with CORDEX tables using a few hacks and i wanted to document some of the pitfalls here. This could probably be fixed quite easily using something like acmip6_option
as a command line argument forPrePARE
. However, it still feels not right...This is probably related to: