Cleaning up CMOR input file vs registered content sources

durack1 commented 3 years ago

@mauzey1 I have been trying to simplify my CMOR_input.json file along with the details (that are currently duplicated) in the registered content, found in the input4MIPs_CV.json file.

My aspiration was that the registered content (input4MIPs_CV.json) would be the primary source for all the information it contains, and that only additional/runtime info, e.g.

[Screenshot: Screen Shot 2021-09-02 at 1 28 28 PM]

would be included in the CMOR_input.json file. All registered content would be used by default, and if the CMOR_input.json file also defined a variable with the same name, that definition would overwrite the entry sourced from input4MIPs_CV.json.

Can you comment on this?

@taylor13 ping

mauzey1 commented 2 years ago

@durack1

So far, I have a branch in the repo that you can test. It currently fulfills the use of the attributes in the source_id entry of the CV to fill global attributes. It doesn't perform the part where it rejects values of attributes defined in the variable table if they don't match.

I would prefer that changes get reviewed in branches before I push them to the nightly build.

mauzey1 commented 2 years ago

@durack1

I've created a new testing label for builds that contain experimental features that need testing before being added to the nightly build. To create a conda environment for testing it with Python, run the following.

conda create -n cmor_testing -c pcmdi/label/testing -c conda-forge -c cdat/label/nightly -c cdat cmor cdms2 testsrunner

This build has the changes I made in the branch I mentioned previously.

durack1 commented 2 years ago

@mauzey1 perfect!

I assume the correct package is labelled (for py3.9) osx-64/cmor-3.6.1.2022.01.06.03.14.g3926ce5-py39h479cae4_0.tar.bz2? Great, and that also lines up with the testing branch at https://github.com/PCMDI/cmor/tree/testing, so this should be something that we can easily keep track of over new conda builds.

mauzey1 commented 2 years ago

@durack1 Were you able to test this build of CMOR?

durack1 commented 2 years ago

@mauzey1 that's on the to-do list for this week, I was keen to get this data written and published pronto - will circle back when I have feedback for you

durack1 commented 2 years ago

@mauzey1 just confirming, I have just created a new env with

+ cmor 3.6.1.2022.01.06.04.48.g3926ce5 py310h88780f8_0 pcmdi/label/testing/osx-64 956 KB

I assume I'm playing with the right version?

mauzey1 commented 2 years ago

@durack1 Yes, that is the right version.

durack1 commented 2 years ago

@mauzey1, I am just working through my last test, which led to this issue. In the tables, we have mip_era defined, so for e.g. master: "mip_era":"CMIP6", issue87 branch: "mip_era":"CMIP6Plus",

Is it possible to remove mip_era from the table header, and provide this in the user input or, better, in the registered CVs? Or alternatively, could "mip_era":"CMIP6 CMIP6Plus", work (indicating that either mip_era was an accepted value)?

mauzey1 commented 2 years ago

@durack1

I did an experiment where I changed the mip_era for the table and user input to CMIP6 and CMIP5 respectively. If I run sanitizeTest.py with this configuration, I get the following message.

C Traceback:
In function: _CV_ValidateAttribute
! called from: _CV_checkGblAttributes
! called from: cmor_setGblAttr
! called from: cmor_write
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Warning: The registered CV attribute "mip_era" as defined as "CMIP6Plus" will be replaced with 
! "CMIP6" as defined in your user input file
! 
!
!!!!!!!!!!!!!!!!!!!!!!!!!

This will create a file with CMIP6 as its mip era.

If I were to remove the "mip_era":"CMIP6" line from the header of input4MIPs_Omon.json, then I get the following message.

C Traceback:
In function: _CV_ValidateAttribute
! called from: _CV_checkGblAttributes
! called from: cmor_setGblAttr
! called from: cmor_write
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Warning: The registered CV attribute "mip_era" as defined as "CMIP6Plus" will be replaced with 
! "CMIP5" as defined in your user input file
! 
!
!!!!!!!!!!!!!!!!!!!!!!!!!

This will create a file with CMIP5 as its mip era.

Removing "mip_era": "CMIP5" from the user input will make CMOR generate a file with CMIP6Plus as its mip era as defined in input4MIPs_CV.json.

It seems like CMOR is setting mip_era where the registered CV value is used by default but is overridden by the value defined in user input, which in turn is overridden by the table's value. Removing the table's mip_era would mean the CV's value would be used if mip_era is not defined in the user input.
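The precedence observed in this experiment can be sketched in a few lines of Python. This is only an illustration of the behaviour described above, not CMOR's actual code (CMOR is written in C); the function name `resolve_attribute` and the three dictionaries are hypothetical stand-ins for the registered CV entry, the user input file, and the table header.

```python
def resolve_attribute(name, cv_entry, user_input, table_header):
    """Return (value, origin) for a global attribute.

    Later sources override earlier ones, mirroring the observed order:
    registered CV < user input < table header.
    """
    value, origin = None, None
    for origin_name, source in (
        ("CV", cv_entry),
        ("user input", user_input),
        ("table", table_header),
    ):
        if name in source:
            value, origin = source[name], origin_name
    return value, origin


# Values taken from the experiment described in this comment.
cv = {"mip_era": "CMIP6Plus"}
user = {"mip_era": "CMIP5"}
table = {"mip_era": "CMIP6"}

print(resolve_attribute("mip_era", cv, user, table))  # table header wins
print(resolve_attribute("mip_era", cv, user, {}))     # user input wins
print(resolve_attribute("mip_era", cv, {}, {}))       # CV default is used
```

Returning the origin alongside the value is one way a warning could name the source that actually supplied the overriding value.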

mauzey1 commented 2 years ago

Or alternatively, could "mip_era":"CMIP6 CMIP6Plus", work? (indicating that either mip_era was an accepted value?)

@durack1 You mean having a list of valid values for mip_era for a source_id entry in the CV, or in a variable's table?

durack1 commented 2 years ago

@mauzey1 yes, sorry, to be clearer: I was hoping that we could generate tables that could be used across CMIP6 and CMIP6Plus, so that we would not need two sets of identical tables, one for each mip_era. I did try adding both entries (space delimited) and that didn't seem to work.

mauzey1 commented 2 years ago

@durack1 So would we put "mip_era":"CMIP6 CMIP6Plus" in the header of a table (Ex. input4MIPs_Omon.json) to indicate that it could be used to create files with either mip era? Would that mean that the mip era values listed in the user input and CV are what get used to create the file, and they must be one of the mip era values listed in the table's header?
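The proposed behaviour could look roughly like the following Python sketch. It is purely hypothetical: CMOR did not support a space-delimited mip_era in the table header at this point in the thread, and the helper name `check_mip_era` is invented for illustration.

```python
def check_mip_era(requested, table_header):
    """Check a requested mip_era against the table header (proposal only).

    The header value is treated as a space-delimited list of accepted
    values. A header without mip_era places no restriction on the value.
    """
    allowed = table_header.get("mip_era", "").split()
    if allowed and requested not in allowed:
        raise ValueError(
            f'mip_era "{requested}" is not one of the values accepted '
            f"by the table: {', '.join(allowed)}"
        )
    return requested


header = {"mip_era": "CMIP6 CMIP6Plus"}
print(check_mip_era("CMIP6Plus", header))  # accepted by the shared table
```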

durack1 commented 2 years ago

@mauzey1 exactly. We need to think a little about this, as it's the first time (I think) that we'd try to reuse tables to cover two mip_era entries. @taylor13 and I will have to put our heads together and think this through a little.

@matthew-mizielinski pinging you on this thread too

taylor13 commented 2 years ago

My current view is that "mip_era" is not a particularly good descriptor for distinguishing one dataset from another. For experiments like abrupt4xCO2, the experiment design is not expected to change significantly over time, so classification by "era" seems pointless. (Note that at any point in time a very old model or a recently developed model might run this experiment, so era isn't a good indication of the "model's newness" either.)

A much more important and fundamental distinction between abrupt4xCO2 model output generated for CMIP5 and model output generated for CMIP6 is that the DRS is different. I therefore suggest that in place of labeling output with "mip_era", we explicitly interpret "mip_era" as identifying the DRS that a dataset is compliant with. I would also suggest that in the future we rename this descriptor "DRS_rules", which would be set to the current "mip_era" value (i.e., "CMIP6") in the current model archive. For input4MIPs, we would define "DRS_rules"="input4MIPs6", or something else appropriate and different from "CMIP6"; for "obs4MIPs", "DRS_rules"="ODS2.1"; and for CMIP5, DRS_rules="CMIP5".

durack1 commented 2 years ago

@mauzey1 apologies for taking my time on this. I am now starting to work on generating this new data, so I am following up where I left off and will use this issue to drop down a couple of notes before trying the new testing pre-release versions.

When I remove "source" from the cmor_input.json file I get:


!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: Your Control Vocabulary file specifies one or more
! required attributes.  The following
! attribute was not properly set.
! 
! Please set attribute: "source" in your input file.
!
!!!!!!!!!!!!!!!!!!!!!!!!!

Even though this is defined in the registered content, identified by the source_id label in the cmor_input.json file

mauzey1 commented 2 years ago

@durack1 That error should not be appearing in the testing build.

durack1 commented 2 years ago

@mauzey1 I was hoping this was the case, that's next on the to-do list, the v1.1.7 data (rather than a reformatted v1.2.0) was published today, thanks @sashakames

durack1 commented 2 years ago

@mauzey1 first up apologies for taking this long to get back to this.

Ok, I have now pulled down CMOR 3.7.0 pre-release/testing (cmor 3.6.1.2022.02.25.17.07.g963fd11 py310h8466d85_0 pcmdi/label/testing/linux-64) and have updated my input deck to reflect the changes that I had proposed, which I will describe again below

I have updated the PCMDI/input4mips-cmor-tables/input4MIPs_source_id.json so that all the required global attributes are in the PCMDI-AMIP-1-2-0 registration
I have removed these entries from the user_input.json file (in an attempt to depend wholly on the registered content), which can be found at PCMDI/amipbcs/CMOR/drive_input4MIPs_bcs.json, the diff can be seen on the PR here

This now gives an error. The region variable is defined in the PCMDI-AMIP-1-2-0 source_id registration, not the user_input.json, so it should be available for CMOR to use, but CMOR bombs:

C Traceback:
! In function: _CV_checkGblAttributes
! called from: cmor_setGblAttr
! called from: cmor_write
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: Your Control Vocabulary file specifies one or more
! required attributes.  The following
! attribute was not properly set.
! 
! Please set attribute: "region" in your input file.
!
!!!!!!!!!!!!!!!!!!!!!!!!!

C Traceback:
! In function: cmor_get_cur_dataset_attribute
! called from: _CV_ValidateAttribute
! called from: _CV_checkGblAttributes
! called from: cmor_setGblAttr
! called from: cmor_write
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: Dataset: current dataset does not have attribute : region
!
!!!!!!!!!!!!!!!!!!!!!!!!!

C Traceback:
In function: _CV_ValidateAttribute
! called from: _CV_checkGblAttributes
! called from: cmor_setGblAttr
! called from: cmor_write
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Warning: The registered CV attribute "mip_era" as defined as "CMIP6Plus" will be replaced with 
! "CMIP6" as defined in your user input file
!
!!!!!!!!!!!!!!!!!!!!!!!!!

C Traceback:
! In function: _CV_checkGblAttributes
! called from: cmor_setGblAttr
! called from: cmor_write
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: Please fix required attributes mentioned in
! the warnings/error above and rerun. (aborting!)
!
!!!!!!!!!!!!!!!!!!!!!!!!!

C Traceback:
! In function: 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: Cannot continue until you fix the errors listed above: -2
!
!!!!!!!!!!!!!!!!!!!!!!!!!

I do note that this variable is defined as a json list type, not a string, as such registrations can have more than one entry.

A separate point, the error "Warning: The registered CV attribute "mip_era" as defined as "CMIP6Plus" will be replaced with "CMIP6" as defined in your user input file" is a little off, as this information is not provided in the user input file but is rather included in both the registered content/source_id input, plus the entry in the input4MIPs_SImon/Omon.json files in the "mip_era":"CMIP6", header entry

mauzey1 commented 2 years ago

@durack1

I have fixed the warning message for values from the tables. I have made a change that will detect if the value of an attribute comes from a table rather than the user input.

C Traceback:
In function: _CV_checkGblAttributes
! called from: cmor_setGblAttr
! called from: cmor_write
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Warning: The registered CV attribute "mip_era" as defined as "CMIP6Plus" will be replaced with 
! "CMIP6" as defined in the table input4MIPs_Omon
! 
!
!!!!!!!!!!!!!!!!!!!!!!!!!

I will push the change to the testing branch now.

durack1 commented 2 years ago

@mauzey1 great, I am set up to update the conda env, so will pull down the latest when it lands - I presume this will be today?

Is the current code able to deal with the format of the region attribute as defined in the current branch? see here

mauzey1 commented 2 years ago

@durack1 I would expect the build to be available in about 1-2 hours from now.

I haven't addressed the issue with value list attributes like region yet. How would you like CMOR to handle these kinds of attributes? Would having a comma-separated list of values in a string be a suitable value for this attribute? Maybe just using the first value that appears on the list? Are there examples in datasets where multiple string values are held in a single variable? Having such a list of values for an attribute would mean that PrePARE will need to properly identify such values.

durack1 commented 2 years ago

@mauzey1 yes, I think a comma-separated string would be perfect. We have a couple of obs4MIPs datasets (see here) that have multiple entries using the same list type, so managing that across all inputs would be useful (if it's not too hard - happy to iterate over this)

While we're cleaning things up, would it be possible to catch unsupported types? So for e.g. if a user attempts to pass CMOR an input file with a dict - what happens in that case now?
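A minimal Python sketch of the two ideas in this comment: flattening a JSON list into a comma-separated string, and rejecting unsupported types such as a dict. The function name `normalize_attribute` and the example region values are made up for illustration, and this is not how CMOR itself parses its input.

```python
import json


def normalize_attribute(name, value, sep=","):
    """Turn a JSON attribute value into a single string.

    Strings pass through unchanged, lists of strings are joined with
    `sep`, and anything else (dict, number, nested list) is rejected.
    """
    if isinstance(value, str):
        return value
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return sep.join(value)
    raise TypeError(
        f'attribute "{name}" has unsupported type {type(value).__name__}'
    )


# Hypothetical registered entry; the region values are placeholders.
entry = json.loads('{"region": ["region_a", "region_b"], "source": "text"}')
print(normalize_attribute("region", entry["region"]))  # region_a,region_b
print(normalize_attribute("source", entry["source"]))  # text
```

The separator is a parameter because the delimiter (comma versus space) is still an open question in this thread.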

durack1 commented 2 years ago

@sashakames just wondering how the ESGF publisher would deal with a comma-separated global attribute, are there any publisher-specific formats we need to ensure?

mauzey1 commented 2 years ago

Since multiple values seem to be rare for these kinds of attributes, maybe we should consider just using the value present when there is only one in the list. For multiple values in a list, we could give an error that tells the user that they need to choose one.

An example in the CMIP6 tables is institution_id. This attribute in the registered data is in bracketed list form but most of them only have one value listed. However, there are several entries that have multiple values like this. If the institution id is not set, then CMOR could have an error message like the following:

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: The registered CV attribute "institution_id" has multiple values defined in CMIP6_CV.json.
! Please use one of the following values for this attribute.
!            * UCI
!            * NCAR
! 
!
!!!!!!!!!!!!!!!!!!!!!!!!!

The message could also simply ask the user to refer to the CV file for the valid values.

The attributes cohort and activity_participation are also lists, with activity_participation having many entries listed, but these attributes are not listed in required_global_attributes so CMOR doesn't consider them.
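The proposal in this comment (use a lone list value directly, ask the user to choose when several are registered) can be sketched in Python as follows. The helper name `value_from_registered` is hypothetical and the error text only loosely follows the mock-up above; it is not CMOR's implementation.

```python
def value_from_registered(name, registered, cv_file="CMIP6_CV.json"):
    """Pick the value of a required attribute from registered content.

    A plain string or a one-item list yields that value; a list with
    several items raises an error listing the choices.
    """
    if isinstance(registered, str):
        return registered
    if len(registered) == 1:
        return registered[0]
    choices = "\n".join(f"    * {v}" for v in registered)
    raise ValueError(
        f'The registered CV attribute "{name}" has multiple values '
        f"defined in {cv_file}.\n"
        f"Please use one of the following values for this attribute.\n"
        f"{choices}"
    )


print(value_from_registered("institution_id", ["NCAR"]))  # single value is used
try:
    value_from_registered("institution_id", ["UCI", "NCAR"])
except ValueError as err:
    print(err)  # the user is asked to choose
```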

durack1 commented 2 years ago

@mauzey1 exactly, thanks for bringing up these examples. So for the CMIP6 project, we'd need a user to provide the following (in the case that multiple/list values are found in the registered content):

activity_participation
institution_id
region

For input4MIPs the only case where multiple registered values exist are:

region
source_variables

For obs4MIPs same:

sashakames commented 2 years ago

@durack1 We are using a space as the delimiter now for CMIP6 at least from my read of the code. For old publisher users, the config can be updated, but we have been using an official config and this would mean two versions potentially. As the new publisher is in beta, I have some flexibility to change how we configure the project so users would be able to gradually migrate to such a change if you decide to go ahead.

sashakames commented 2 years ago

Some more background: for input4MIPs we were going to use commas. Instead we just get the values out of the .json file so it doesn't read the global attributes to generate additional facet metadata (beyond those in the DRS)

mauzey1 commented 2 years ago

@sashakames So there can be cases where a dataset will have attributes with multiple values? Something like "institution_id": "UCI NCAR" is possible?

sashakames commented 2 years ago

@mauzey1 Yes indeed. I think activity_id and realm were the CMIP6 examples (not institution_id though to my knowledge) Sorry I don't have an example but there are some out there.

durack1 commented 2 years ago

@sashakames you're right, those two as well. The example of the activity_id is already caught in the source_id file, but realm, absolutely: there are a couple of limited examples that noted both seaIce/ocean realms, and maybe a couple more - I could dig them up if useful @mauzey1

matthew-mizielinski commented 2 years ago

A word of caution on allowing multiple institution_ids for a single data set.

I would use the institution id to work out who to contact if there is an issue, i.e. who is responsible for the data. If two institutions are specified then this becomes more tricky, and it can complicate the DRS*. I know we have allowed multiple activity_ids in CMIP6 (e.g. ssp370), but this has led to confusion and argument, and is something I would like to see avoided in the future if possible.

*Early in CMIP6 I did publish some data under AerChemMIP for which the primary activity id was RFMIP. I think the search interface handles this fine, but the location of data on disk can catch users out.

sashakames commented 2 years ago

Thanks Matt, something prefixed like: CMIP6.RFMIP.MOHC.HadGEM3-GC31-LL.piClim-aer... appears to match if we need an example of how this is done now.

Is there a practical reason to allow multiple institutions for a dataset, such as when a collaboration produced a model and it is important to represent each member? We have used "consortia" ids such as "EC-Earth-Consortium" and E3SM-Project.
This is not to be confused with permitting different possible institutions for a single source_id. That makes sense, but those are separate datasets, so each has just a single institution.

durack1 commented 2 years ago

@matthew-mizielinski thanks for clarifying my post. @sashakames this comes back to the registered info (CMIP6_CVs), and then the info required by CMOR to write any one dataset.

For the registered info, we will need to allow a single source_id to have multiple institution_id entries. As is the case of UKESM1-0-LL this single source_id is being used by 4 registered institution_id's, and each of these contributed datasets should use the appropriate institution_id, contact etc to ensure that the right folks are being contacted if data queries arise.

Thinking a little more about this, maybe it would be a good idea to check the output_path_template field that is required by CMOR, and ensure that any output/DRS or filename components only have a single value. If multiple values are registered in the CMIP6_CVs, then we will stop, throwing an error that the CMOR_user_input.json file requires a single value to be defined.

mauzey1 commented 2 years ago

@durack1 If an attribute has multiple values separated by a space, then CMOR will only use the first value in that list for the output path. https://github.com/PCMDI/cmor/blob/90bde9eae22c893cf2c010c716ba5934d8d9005e/Src/cmor.c#L5561-L5572
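As a rough Python illustration of that behaviour (not a translation of the linked C code), the sketch below keeps only the first space-separated token of each attribute when building a path. The function names and the list of path components are assumptions made for the example.

```python
def first_value(attribute_value):
    """Return the first space-separated token of an attribute value."""
    tokens = attribute_value.split()
    return tokens[0] if tokens else ""


def drs_path(attributes, components):
    """Join the first value of each listed attribute into a path."""
    return "/".join(first_value(attributes[name]) for name in components)


# Illustrative attribute set with a multi-valued activity_id.
attrs = {
    "mip_era": "CMIP6",
    "activity_id": "RFMIP AerChemMIP",
    "institution_id": "MOHC",
}
print(drs_path(attrs, ["mip_era", "activity_id", "institution_id"]))
# CMIP6/RFMIP/MOHC
```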

taylor13 commented 2 years ago

I haven't followed all of the above, but I think we should be clear about the various cases that were needed for CMIP6:

1) an attribute invariably was assigned a single particular value: For example, mip_era = "CMIP6"
2) an attribute was assigned a single value drawn from a CV: For example, source_id = "UKESM1-0-LL", where only registered models are allowed.
3) an attribute was assigned a particular list of values, as specified in a registry (nb. sometimes the "list" is a single item): For example, realm = "atmos" for many variables or realm = "landice land" for a few variables including surface snow amount where land.
4) an attribute could be assigned a value or multiple values found in a CV (with no restriction on what could be included): For example, source_type = "AGCM CHEM AER".

For CMIP6, if an attribute had multiple values and that attribute was used as part of the DRS to uniquely identify datasets (e.g., via paths and file names), the first value was extracted for that purpose. For the next phase of CMIP, I think that for these cases, we should define separate attributes, one for the DRS (containing a single value), and one for informational and search purposes (containing the DRS value plus other appropriate values in a space-separated string).

mauzey1 commented 2 years ago

Following up on this issue, I have made changes to CMOR that would allow it to populate attributes with multiple values if they are in a list in the CV. However, I have tried running PrePARE on files where this change is applied and it treats these values as an error. For example, a file with institution_id = "MOHC NERC NIMS-KMA NIWA" gives the following output.

C Traceback:
! In function: _CV_setInstitution
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: The institution_id, "MOHC NERC NIMS-KMA NIWA",  found in your 
! input file () could not be found in 
! your Controlled Vocabulary file. (Tables/CMIP6_CV.json) 
! 
! Please correct your input file or contact "cmor@listserv.llnl.gov" to register
! a new institution_id.  
! 
! See "http://cmor.llnl.gov/mydoc_cmor3_CV/" for further information about
! the "institution_id" and "institution" global attributes.  
!
!!!!!!!!!!!!!!!!!!!!!!!!!

└──> :: CV FAIL    :: /Users/mauzey1/Desktop/github/cmor/fDeforestToProduct_Emon_UKESM1-0-LL_1pctCO2_r2i1p1f2_gn_195001-199912.nc

Number of files scanned: 1
Number of file with error(s): 1

Attributes like source_type have been programmed to handle having multiple values listed, independent of order listed. Do we want to apply similar behavior to attributes such as institution_id?
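An order-independent check of a space-separated attribute against a set of registered values might look like the Python sketch below. It only illustrates the idea being asked about; the function name `unknown_values` is hypothetical and this is not PrePARE's code.

```python
def unknown_values(value, registered):
    """Return the tokens of a space-separated value that are not registered.

    Each token is checked on its own, so the order in which the values
    are listed does not matter. An empty result means the value passes.
    """
    return [token for token in value.split() if token not in registered]


registered_ids = {"MOHC", "NERC", "NIMS-KMA", "NIWA"}
print(unknown_values("MOHC NERC NIMS-KMA NIWA", registered_ids))  # []
print(unknown_values("NIWA MOHC", registered_ids))                # []
print(unknown_values("MOHC XYZ", registered_ids))                 # ['XYZ']
```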

durack1 commented 2 years ago

Thanks @mauzey1!

Following on from what @taylor13 was noting, we have 4 classes of attributes

1) an attribute assigned a single value, unchecked. For example, mip_era = "CMIP6"
2) an attribute assigned a single value from a CV, checked. For example, source_id = "UKESM1-0-LL", where only registered models are allowed.
3) an attribute assigned a list of values, specified in a registry/CV/CMOR inputs, table header etc (nb. sometimes the "list" is a single item), checked. For example, realm = "atmos" for many variables or realm = "landice land" for a few variables including surface snow amount where land.
4) an attribute assigned a value or multiple values found in a CV (with no restriction on what could be included), unchecked. For example, source_type = "AGCM CHEM AER".

The values defined in the CMIP6_CVs/CMIP6_required_global_attributes.json file can be broken down into the 4 subgroups above:

1 single value, no registry/CVs, unchecked

        "Conventions",
        "creation_date",
        "forcing_index",
        "further_info_url",  # Recommended, provided by ES-Docs
        "grid",
        "initialization_index",
        "license",  # Recommended
        "physics_index",
        "product",
        "realization_index",
        "tracking_id",  # CMOR written
        "variant_label"

2 single value, registry/CVs, checked

        "activity_id",  #CVs
        "data_specs_version",  # table header
        "experiment",  #CVs
        "experiment_id",  #CVs
        "frequency",  #CVs
        "grid_label",  #CVs
        "institution",  #CVs
        "institution_id",  #CVs
        "mip_era",  #CVs
        "nominal_resolution",  #CVs
        "source",  #CVs
        "source_id",  #CVs
        "sub_experiment",  #CVs
        "sub_experiment_id",  #CVs
        "table_id",  # table header
        "variable_id",  # table variable

3 multiple values, registry/CVs, checked

        "realm",  #CVs

4 multiple values, registry/CVs, unchecked

        "source_type",

To implement the above, we would need to augment the structure of the CMIP6_required_global_attributes.json to provide this information programmatically. We would also have to generate a similar format in the PCMDI/input4MIPs-cmor-tables/Tables/input4MIPs_CV.json and PCMDI/obs4MIPs-cmor-tables/Tables/obs4MIPs_CV.json files too
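One way such information could be made available programmatically is sketched below in Python. The `ATTRIBUTE_RULES` layout, its `multiple`/`checked` keys, and the function `find_problems` are all assumptions made for illustration; they are not an agreed format for the CV files and only a few attributes from the groups above are shown.

```python
# Hypothetical per-attribute rules following the four groups above.
ATTRIBUTE_RULES = {
    "variant_label": {"multiple": False, "checked": False},  # group 1
    "institution_id": {"multiple": False, "checked": True},  # group 2
    "realm": {"multiple": True, "checked": True},            # group 3
    "source_type": {"multiple": True, "checked": False},     # group 4
}


def find_problems(name, value, registered=frozenset()):
    """Return a list of problems for one attribute value.

    Unchecked attributes always pass. Checked single-value attributes
    must match a registered value as a whole; checked multi-value
    attributes are split on whitespace and each token is checked.
    """
    rule = ATTRIBUTE_RULES[name]
    if not rule["checked"]:
        return []
    candidates = value.split() if rule["multiple"] else [value]
    return [
        f'{name}: "{c}" is not a registered value'
        for c in candidates
        if c not in registered
    ]


print(find_problems("realm", "landice land", {"land", "landice", "atmos"}))
print(find_problems("institution_id", "MOHC NERC", {"MOHC", "NERC"}))
```

Under these rules a multi-valued institution_id is reported as a problem, while realm = "landice land" passes.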

@taylor13 @matthew-mizielinski does the above seem correct to you?

durack1 commented 2 years ago

And @mauzey1 to make sense of the above, the group 2 entries are defined to a single (registered) entry, to ensure that, for example, in the case that a query is raised against a model that is registered against multiple institution_id entries, it is obvious where queries should be directed (to the single institution_id listed in the file, and noted in the ESGF publication database).

matthew-mizielinski commented 2 years ago

Apologies for missing this.

@taylor13 @matthew-mizielinski does the above seem correct to you?

I think grid is a free-form text field; we use it to add a string describing the grid in use. I've put below a selection of the ones we have used in CMIP6 from UKESM1-0-LL and HadGEM3-GC31-LL (similar for HadGEM3-GC31-MM, but with more points):

grid = Native N96 grid (U points); 192 x 144 longitude/latitude
grid = Native N96 grid (UV points); 192 x 145 longitude/latitude
grid = Native N96 grid (UV points); Zonal mean, 145 latitude
grid = Native N96 grid (V points); 192 x 145 longitude/latitude
grid = Native N96 grid; 192 x 144 longitude/latitude
grid = Native N96 grid; Global mean
grid = Native N96 grid; Zonal mean, 144 latitude
grid = Native eORCA1 tripolar primarily 1 deg with meridional refinement down to 1/3 degree in the tropics; 360 x 330 longitude/latitude
grid = Native eORCA1 tripolar primarily 1 deg with meridional refinement down to 1/3 degree in the tropics; Global mean
grid = Native eORCA1 tripolar primarily 1 deg with meridional refinement down to 1/3 degree in the tropics; Quasi-zonal mean, 330 latitude

durack1 commented 2 years ago

@matthew-mizielinski thanks, I have corrected the above https://github.com/PCMDI/cmor/issues/628#issuecomment-1085241113. You're right, grid is a freeform field, grid_label is the CV-controlled attribute

mauzey1 commented 2 years ago

@durack1 So we want to add information to the CV file that says which attribute can have multiple values listed? This will require changes to CMOR to read but the CV file should still be usable by older CMOR versions, which should ignore this new data.

realm and source_type aren't listed in the registered entries but institution_id and region are. If we encountered attributes in registered entries with bracketed lists that contain multiple entries such as this, should we report an error if this attribute is missing from the dataset rather than trying to fill it with the value list? Maybe display an error message similar to the one below.

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: The registered CV attribute "institution_id" has multiple values defined in CMIP6_CV.json.
! Please use one of the following values for this attribute.
!            * UCI
!            * NCAR
! 
!
!!!!!!!!!!!!!!!!!!!!!!!!!

If there is only one value listed in the bracketed list, then maybe we could use that one value for the dataset just like the other single-value attributes.

durack1 commented 2 years ago

@mauzey1 exactly (as you describe https://github.com/PCMDI/cmor/issues/628#issuecomment-1092279751), what would be the best way to do this, add an entry to the CMIP6_CV.json?

mauzey1 commented 2 years ago

@durack1 Maybe have a section next to required_global_attributes called multiple_value_attributes listing the attributes that can have a list of values.

As for the values that aren't listed as having multiple values but have multiple values in the registered content, should I create the kind of error that I described in https://github.com/PCMDI/cmor/issues/628#issuecomment-1092279751? Otherwise, CMOR will just use the missing attribute error that it currently uses.

mauzey1 commented 2 years ago

@durack1

I have made changes similar to what I have described in https://github.com/PCMDI/cmor/issues/628#issuecomment-1092279751 where single-value bracketed values will be treated as single values for attributes, but multiple values will raise an error asking the user to select a value. It will only use these values if their attribute is listed in required_global_attributes.

Here's an example of an error message.

C Traceback:
! In function: _CV_checkGblAttributes
! called from: cmor_setGblAttr
! called from: cmor_write
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Error: The registered CV attribute "source_variables" has multiple values 
! defined in "./input4MIPs-cmor-tables/Tables/input4MIPs_CV.json"
! Please select one from the entry source_id.PCMDI-AMIP-1-2-0.source_variables.
!
!!!!!!!!!!!!!!!!!!!!!!!!!

source_variables is not in required_global_attributes, I'm just using it as an example

I will push these changes into the testing branch.

durack1 commented 2 years ago

@mauzey1 great, thanks for pushing on this!

durack1 commented 2 years ago

@mauzey1 apologies for the delay in getting back to this. Now we also have #656 created, is it just this and the license template to be finalized before we can get 3.7.0 released and tagged?

I have to generate an updated PCMDI-AMIP dataset soon, so would ideally like to do this with CMOR 3.7.0 if that is possible in the coming week(s)?

@matthew-mizielinski @taylor13 ping

mauzey1 commented 2 years ago

@durack1 The initial features that we wanted in this issue should be implemented, and the license templates have been updated in CMIP6_CV.json without needing to change anything in CMOR aside from some testing input. These changes are all in the testing branch and should be accessible as a nightly build from Conda using the following command.

conda create -n [YOUR_ENV_NAME_HERE] -c pcmdi/label/testing -c conda-forge cmor

Unless you want CMOR to automate adding the license similar to how it adds global attribute values from the registered content, then I think this task should be completed. Please review the testing build, and review the pull request.

durack1 commented 2 years ago

@mauzey1 just circling around on this - the new build cmor 3.6.1.2022.08.02.00.28.g4871d9c py310h4b0e41b_0 PCMDI/label/nightly/linux-64 is looking great, but I have one last tweak that I am trying to ascertain.

When I declare a new license string, I am getting hit with the error:

C Traceback:
In function: _CV_checkGblAttributes
! called from: cmor_setGblAttr
! called from: cmor_write
! 

!!!!!!!!!!!!!!!!!!!!!!!!!
!
! Warning: The registered CV attribute "license" as defined as "AMIP boundary condition data produced by
PCMDI is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0;
https://creativecommons.org/licenses/by/4.0). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms
of use governing input4MIPs output, including citation requirements and proper acknowledgment. Further
information about this data, including some limitations, can be found via the further_info_url (recorded as
a global attribute in this file). The data producers and data providers make no warranty, either express or
implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose.
All liabilities arising from the supply of the information (including any liability arising in negligence) are
excluded to the fullest extent permitted by law" will be replaced with 
! "AMIP boundary condition data produced by PCMDI is licensed under a Creative Commons Attribution-[NonCommerci
!
!!!!!!!!!!!!!!!!!!!!!!!!!

The Creative Commons Attribution-[NonCommerci.. is not in any of my input files (that I can find), is this a hard-coded error?

mauzey1 commented 2 years ago

@durack1 Which CV file are you using? What does your user input file look like?

durack1 commented 2 years ago

@mauzey1 this turned out to be an issue with one of two user_input.json files, in which I got a little confused which one was being used - user error, sorry!

Good news, I have validated that the slimmed down user_input.json files found at amipbcs user_input files write the output as expected with the nightly/test build - this outstanding issue is closed - on with the 3.7.0 release!
