WCRP-CMIP / CMIP6_CVs

Controlled Vocabularies (CVs) for use in CMIP6
Creative Commons Attribution 4.0 International

Coordinate with ES-DOC model and experiment CVs #48

Closed eguil closed 7 years ago

eguil commented 7 years ago

Coordination is needed to avoid asking modelling groups for the same information twice.

durack1 commented 7 years ago

@eguil are there any amendments that should be made to the standing source_id (model) template so that we have the information in one place? Take a look here

The experiment_id template should also be reviewed - take a look here or for a web-based tabulated version, here

Please add any other ES-DOC contributors that you think should be involved in discussing this issue using their github @handle

@taylor13 pinging you here

markelkington commented 7 years ago

I can see problems with the institute_id in the model description. Is this a list of institutes that contributed to the model (in which case we are going to need to expand the CMIP6 institute codes massively), or is it the institute that is running the model, or the institute funding the model development, or the institute that will be the point of contact for questions about the model?

I had understood that the citation system would handle this complexity.

In my view we should not have hierarchical dependent CVs. There is a model (source), an institute and they are separate entities which are joined together when describing a data set. So UKMO uses HadGEM3-GC3 to produce a simulation for the SSP2.4 experiment

asladeofgreen commented 7 years ago

Re source_id: A homepage attribute might be useful;

Re experiment_id: Some info is not yet incorporated into the ES-DOC viewer - I will rectify this.

markelkington commented 7 years ago

The sub-model details should follow the realm classification. It is mostly OK - but surely "glacier" should be "land_ice" to be consistent with other parts of the CMIP6 infrastructure. What about "aerosols"?

markelkington commented 7 years ago

The model component codes used in the experiment_id template do not match the codes used in ES-DOC. Some codes used are not even component types - AOGCM is not a component type, it's a model type ... and at the moment I don't think it is in the list of model types in ES-DOC.

If these templates go out to end users it is going to be unnecessarily confusing for modelling groups to have to use two different conventions for providing metadata for CMIP6. It can't be that hard to align these two sets of codes.

Apologies for being a bit tetchy about this - but I have been banging on about it for some time; and I'm one of the people who has to deal with the confusion it creates at a modelling centre. To give you just one example - how internal metadata systems will need to maintain multiple enumeration lists for the same thing and know where to apply each list ;-(

MartinaSt commented 7 years ago

Regarding: institute_id in the model description. The citation currently uses the CV to provide the entries for the citation GUI. I interpreted the institute_id in the source_id CV as the institutes running a model. It is still not clear to me where I will get access to the connections between which institute wants to run which models and which MIPs/experiments. I can get along without the connection to the experiments, though it would be more complicated, but the connection between institute and model is essential as it is related to data ownership/responsible institution for data and data citation.

markelkington commented 7 years ago

Martina

I agree – model, institute and experiment are separate things – when they are all brought together then we have the core information for a data citation. If the model and institute are related then we possibly have a model citation. I don’t understand why institute is recorded in the model CV.

In our case it would need to deal with two institutes responsible for funding its core development, multiple institutes using it for running simulations, multiple institutes funding local extensions and multiple institutions contributing to the science.

Mark

From: Martina Stockhause [mailto:notifications@github.com] Sent: 08 September 2016 09:53 To: WCRP-CMIP/CMIP6_CVs Cc: Elkington, Mark; Comment Subject: Re: [WCRP-CMIP/CMIP6_CVs] Coordinate with ES-DOC model and experiment CVs (#48)


eguil commented 7 years ago

I agree that the CMIP6 CV that Karl and Paul are collecting from the modelling groups should only be about the model (source_id) and the institute, i.e. institution_id (running the model, i.e., providing the data). This means only pairs of CV values such as "HadGEM3, UKMO" have to be provided by groups. The experiment_id is indeed suggested by the MIP chairs (and approved by the CMIP panel/WIP) and the model detail collected via ES-DOC. See my comments inline in https://docs.google.com/document/d/1HyKbkftWPnGkSPZC6I6nd58dRs2wxnyyJnIQFouVYRk/edit?ts=57d04716

MartinaSt commented 7 years ago

Just one more thought on the CVs (btw. I do not have access to the google doc Eric cited): We need to make clear what the CVs are for. In my opinion, their core function is to be a reference for DRS names for the CMIP6 infrastructure components including QC.

As modeling centers should register their contribution for CMIP6, it would be possible to capture the connections among institute_id, source_id and experiment_id, which would be extremely helpful at least for citation.

However, I do not think that it is a good idea to collect too many details about the model (DRS component source_id) within the CV. These details are collected by CIM/ES-DOC. In my view the CV is the reference and ES-DOC needs to synchronize its model information with that provided in the CV for source_id.

eguil commented 7 years ago

Hi Martina, I fully agree. The connection between the experiment description document (CIM) and the model description documents (CIM) will be made at the ESGF data publication stage (which will harvest institute_id, source_id and experiment_id from the global attributes) via the ES-DOC automated scripting currently tested.

taylor13 commented 7 years ago

To clarify, what we (Paul and I) aim to achieve with the CMIP6_CVs is to:

  1. Provide lists of allowed text that may be assigned to certain global attributes in CMIP6 files (and subsequently may populate the DRS elements and appear in file names, directory structures, search facets, various ESGF catalogs, and the like). The global attributes provide, for example, rudimentary identifying information about the model, the experiment performed, and the simulation conditions.
  2. Specify certain relationships that constrain and restrict which attribute values are consistent with other attribute values (e.g., which institutions have been registered to run a given model). This information enables us to write QC software (such as the CMOR checker which is being used as part of the ESGF publication process) that will catch inconsistencies in the specification of global attributes (and file names, DRS, etc.).
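The first aim, lists of allowed text for certain global attributes, can be pictured with a minimal sketch. The dictionary layout, the trimmed sample values, and the helper name `find_invalid_attributes` are assumptions made for illustration; they are not the actual contents of the CMIP6_CVs JSON files, nor the logic of the CMOR checker.

```python
# Hypothetical, heavily trimmed CVs: each global attribute maps to its allowed strings.
ALLOWED_VALUES = {
    "activity_id": {"CMIP"},
    "experiment_id": {"1pctCO2", "piControl"},
}


def find_invalid_attributes(global_attrs, allowed=ALLOWED_VALUES):
    """Return {attribute: value} for every attribute whose value is not in its CV.

    Attributes that have no CV (free text such as "source") are not checked.
    """
    return {
        name: value
        for name, value in global_attrs.items()
        if name in allowed and value not in allowed[name]
    }


attrs = {"activity_id": "CMIP", "experiment_id": "1pctCO2x", "source": "free text"}
print(find_invalid_attributes(attrs))  # {'experiment_id': '1pctCO2x'}
```

A check of this kind only covers single attributes; relationships between attributes need more structure than a flat list.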

The aim is not to provide comprehensive documentation of either the experiments or the models - this is the clearly defined role of ES-DOC.

We agree that a procedure should be established to ensure that the information contained in the CMIP6_CVs *.json files propagates to all the other software supporting CMIP6, and that this be made clear before contacting the modeling centers requesting that they “register” key information about their model(s) and institution. The CV information should definitely be synchronized between ES-DOCs and CMIP6_CVs. @markelkington we completely agree with you that correct information should be obtained once, be consistent, and be reused.

Here’s how we expect the CVs to be used:

  1. Modeling groups register their institution and model information by submitting an “issue” at the CMIP6_CVs repo: https://github.com/WCRP-CMIP/CMIP6_CVs/issues/new?title=CV%20OF%20INTEREST%20AND%20BRIEF%20INFO%20%28replace%20with%20your%20title%29 . A template is provided indicating what information is needed to define the values associated with the “keys” for institutions/institution_id and models/source_id (and some identifying features as noted in the ACCESS-1-0 example).
  2. CMOR3 harvests information in the CVs and creates CMOR tables and templates for file names and directories based on the CVs, which are needed when modeling groups use CMOR3 to rewrite their model output. CMOR also checks the metadata supplied by modeling groups for consistency with the CVs.
  3. Modeling groups not relying on CMOR3 to rewrite data, can consult the CVs to determine what values can be assigned to global attributes in their netCDF files.
  4. During data publication, ESGF uses the information in the CVs to check that key metadata is consistent with CMIP6 specifications.
  5. ES-DOC obtains from the CMIP6_CVs repository rudimentary “registration” information about models, institutions and experiments found in the CVs and creates “stub” documentation in its own ES-DOC form. Then it requests that modeling groups provide all the additional documentation useful to analysts. A modeling group must register its model and institution with CMIP6_CVs before providing information to ES-DOC.
  6. ES-DOC harvests information from global_attributes and DRS structures on the ESGF [Got this from Eric’s comment. Is this really true??]

We note that simple CVs (i.e., simple lists of acceptable text strings) cannot capture all the information required by the uses above (most notably the QC implementation needed for steps 2 and 4). For QC we need to specify the allowed combinations of CV values. This is why the CVs in this repo have been structured the way they are.
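To illustrate why allowed combinations need a structured CV rather than two flat lists, here is a rough sketch of a pair check. The `institution_id` key, the dictionary layout, and the function name `is_registered_pair` are assumptions for illustration only; the "HadGEM3, UKMO" pair is borrowed from the example earlier in this thread.

```python
# Hypothetical source_id CV fragment; the "institution_id" key and layout are assumed.
SOURCE_ID_CV = {
    "HadGEM3": {"institution_id": ["UKMO"]},
}


def is_registered_pair(institution_id, source_id, cv=SOURCE_ID_CV):
    """True if the institution is registered to contribute runs of this model."""
    entry = cv.get(source_id)
    return entry is not None and institution_id in entry["institution_id"]


print(is_registered_pair("UKMO", "HadGEM3"))   # True: registered pair
print(is_registered_pair("OTHER", "HadGEM3"))  # False: pair was never registered
```

Two flat lists (one of institutions, one of models) could confirm that each string is valid on its own, but not that the pair was registered together.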

We would also note that we expect ES-DOC will continually update its own data base (sim documents?) by ingesting new models and institutions as they are registered. “Stub” “landing pages” could then be immediately generated for each new simulation, providing a “live” target pointed to by ESGF publication data sets (via the further_info_url). Initially, of course, the landing page might only repeat model, institution, and experiment information already recorded in the netCDF output files themselves, but the absence of more complete documentation (requested by ES-DOC) would be obvious and might prompt modelling groups to supply the additional details needed in a more timely manner than in previous CMIP phases.

In general, the virtue of having modeling groups provide input via github is that it is clearly visible, transparent and easily updated, or corrected if necessary.

taylor13 commented 7 years ago

It might be useful to include here an email I sent to the WIP members, Charlotte Pascoe, and a few others on 7/15/16: This email provides a little more background and description of the CVs hosted on this github repo:

Dear all,

Paul Durack and I have now created JSON files defining controlled vocabularies (CVs) that are essential to ESGF, CMOR, the data request, and ESDOC. These files are located on github (https://github.com/WCRP-CMIP/CMIP6_CVs) as called for by one of our draft position papers ( https://docs.google.com/document/d/1CzTUoX4H2S0XbQUM3_9yKvJ2la7qUExFV7ibGzThmhA/edit ). These are not finalized, but they serve as the "reference" for certain CMIP6 CVs.

I would note, that Martin is responsible for the variable CV information, and I think a subset of the information contained in the variable request (see the document referenced below) should also be made available on the github site (namely the information essential to ESGF).

The CVs stored here will not comprehensively meet the needs of CMIP6, but they provide the foundation for:

DRS
ESGF
the data request
ESDOC
CMOR and the CMIP6 validator

Some of these CVs are pretty well agreed upon, while others might still evolve. It might be appropriate to include additional CVs on this github repository. Please advise.

I've also prepared a document that describes the vocabularies (https://docs.google.com/document/d/1N0pLdUA7_lgmK93MIQtdSeelHWPodJYOcWhDFDHiQ90), and I have shared it so you can comment and suggest changes.

Note that the CVs don't provide _everything_ one might want to know. Additional descriptions, specifications, relationships, documentation, etc. will be recorded and made available from:

the data request
ESDOC
CMOR tables

Also the CMOR tables, ESDOC, and the data request database developed by Martin will duplicate some of the CV information stored on the github site, but those will be derivative and not the reference. It will be important that all three of those remain synchronized with the JSON files on github.

Please look over both the CVs on github and the "CV_responsibilities" document on google docs (url's for both given above), and provide comments (registered as issues on the github or as comments/suggestions on google docs, as appropriate).

There are known inaccuracies with all the CVs currently on github, so some of your suggestions we may already be aware of. Please be patient.

If you reply to this email (as opposed to submitting issues), please "reply all".

thanks, Karl

taylor13 commented 7 years ago

@markelkington we agree that "land_ice" should replace "glacier". In general if ES-DOC has established a CV for the component models (including aerosols?), we should adopt it. Can someone provide the list of components as a separate issue on this repo (include "source_id" as the first word of your title)?

Also, concerning the difference between "source_type" (a separate CV) and the list of possible component models included in the source_id CV. The WIP agreed some time ago that source_id would not change even if the "components" comprising it were turned on or off. For example, a coupled model run in AMIP mode (i.e., only the atmospheric and land components active), would have the same name as when it was run in coupled mode (as an AOGCM). The experiment of course implies a certain model configuration, but there remains some flexibility (e.g., running with atmospheric chemistry on or off). If two versions of a model were to run the same experiment, then we distinguish between them using the "p" index of the "ripf" indicator. Also, the global attribute "source_type" would be different for the two runs. The options proposed for "source_type" are listed in note 13 after table 1 in the WIP's global attribute document: https://docs.google.com/document/d/1h0r8RZr_f3-8egBMMh7aqLwy3snpD6_MrDz1q8n5XUk/edit#
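As a small illustration of the "p" index, the sketch below extracts it from a "ripf" indicator. It assumes the indicator is written as r<int>i<int>p<int>f<int> (for example "r1i1p1f1"), a layout that is not spelled out in this thread; the function name `physics_index` is hypothetical.

```python
import re

# Assumed layout of the indicator: r<int>i<int>p<int>f<int>, e.g. "r1i1p1f1".
RIPF_PATTERN = re.compile(r"r(\d+)i(\d+)p(\d+)f(\d+)")


def physics_index(ripf):
    """Return the integer "p" index of a ripf indicator."""
    match = RIPF_PATTERN.fullmatch(ripf)
    if match is None:
        raise ValueError(f"not a valid ripf indicator: {ripf!r}")
    return int(match.group(3))


# Two versions of one model running the same experiment would differ in "p".
print(physics_index("r1i1p1f1"), physics_index("r1i1p2f1"))  # 1 2
```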

If the list of source_types is inconsistent with ES-DOCs, please raise an issue pointing this out. Note that "source_type" is not meant to be a simple list of component models. Rather it distinguishes among different categories of models, which will make it more useful in ESGF searches. If anyone can think of a much better way of doing this, please propose it immediately. We want to finalize the global attributes document this weekend.

The list of component models appearing in the source_id should be comprehensive in the sense that if in any CMIP6 experiment a component is included, then it should be listed even if for other experiments the component is inactive (e.g., both the atmospheric and ocean components should be specified if the model runs DECK experiments even though in the AMIP run, the ocean is turned off).
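The "comprehensive" rule amounts to taking the union of the components active in any experiment the model runs. A minimal sketch follows, with made-up experiment configurations and realm names (all hypothetical, chosen only to mirror the AMIP example above):

```python
# Hypothetical per-experiment configurations of one model; realm names are illustrative.
ACTIVE_COMPONENTS = {
    "amip": ["atmosphere", "land"],
    "piControl": ["atmosphere", "land", "ocean", "sea_ice"],
}


def components_for_source_id(active_by_experiment):
    """Union of components active in any experiment, in first-seen order."""
    seen = []
    for components in active_by_experiment.values():
        for component in components:
            if component not in seen:
                seen.append(component)
    return seen


print(components_for_source_id(ACTIVE_COMPONENTS))
# ['atmosphere', 'land', 'ocean', 'sea_ice']
```

The ocean appears in the result even though it is switched off in the atmosphere-only run.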

taylor13 commented 7 years ago

@markelkington Just to confirm what others have suggested: For QC it is important to check whether the institution/source (i.e., model) pairs have been registered. In the source_id CV we therefore list the institution_ids of all institutions who have indicated they plan to contribute CMIP6 simulations generated by a given model. In most cases there will be only a single institution_id listed. Note that there is a separate institution_id CV where the full name and address associated with each institution_id are provided.

Similarly, in the experiment_id CV we plan to modify the structure slightly as discussed in https://github.com/WCRP-CMIP/CMIP6_CVs/issues/1 . The plan is to remove "sub_experiment" from this CV, and only include the list of possible "sub_experiment_ids". Then we will create a new CV called "CMIP6_sub_experiment_id.json" which will be a dictionary with "sub_experiment_id" as the key and "sub_experiment" the value associated with each key.
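The planned restructuring can be sketched as a small transformation. This is only an illustration under assumptions: the input layout is a trimmed guess at an experiment_id entry, the function name `split_sub_experiments` is hypothetical, and storing the ids as a list is one reading of "the list of possible sub_experiment_ids".

```python
def split_sub_experiments(experiment_cv):
    """Return (trimmed_experiment_cv, sub_experiment_cv) without mutating the input."""
    trimmed, sub_cv = {}, {}
    for exp_id, entry in experiment_cv.items():
        # New CV: sub_experiment_id is the key, sub_experiment the value.
        sub_cv[entry["sub_experiment_id"]] = entry["sub_experiment"]
        # Experiment CV: drop "sub_experiment", keep only the possible ids.
        trimmed[exp_id] = {k: v for k, v in entry.items() if k != "sub_experiment"}
        trimmed[exp_id]["sub_experiment_id"] = [entry["sub_experiment_id"]]
    return trimmed, sub_cv


old_cv = {"1pctCO2": {"tier": "1", "sub_experiment": "none", "sub_experiment_id": "none"}}
new_cv, sub_experiment_cv = split_sub_experiments(old_cv)
print(new_cv)             # {'1pctCO2': {'tier': '1', 'sub_experiment_id': ['none']}}
print(sub_experiment_cv)  # {'none': 'none'}
```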

markelkington commented 7 years ago

Hi Karl

Re: land-ice. The CMOR and ES-DOC lists are subtly different. ES-DOC references scientific realms (atmosphere, aerosols, land, land-ice, etc.). Component is usually used to refer to a physical model component – which may implement one or more realms e.g. nemo. It will be possible to use one CV for both CMOR and ESDOC as long as we agree that the component in CMOR is equivalent to scientific realm.

Re: source-type – do you want me to raise the issue in github for CMOR? I raised it in the ES-DOC repository some months ago. I don’t really mind which list is used (or even a combination of the list values) as long as there is just one list and it is agreed who maintains the content of that list. [Regarding the issue of using AOGCM as the value when we are running in atmosphere only mode – that seems OK to me]

Re: source-id – my interpretation of your explanation is that we will have one component list for each model – and it will be the full set of components that we use in the model even if some are turned off for a particular MIP/experiment (and the source-type represents this full configuration model). Is that correct?

Regards

Mark

From: taylor13 [mailto:notifications@github.com] Sent: 09 September 2016 16:22 To: WCRP-CMIP/CMIP6_CVs Cc: Elkington, Mark; Mention Subject: Re: [WCRP-CMIP/CMIP6_CVs] Coordinate with ES-DOC model and experiment CVs (#48)


markelkington commented 7 years ago

Karl

Agree with both of those points

Mark

From: taylor13 [mailto:notifications@github.com] Sent: 09 September 2016 16:41 To: WCRP-CMIP/CMIP6_CVs Cc: Elkington, Mark; Mention Subject: Re: [WCRP-CMIP/CMIP6_CVs] Coordinate with ES-DOC model and experiment CVs (#48)


taylor13 commented 7 years ago

@markelkington
Hi Mark,

Re: your paragraph labeled "Re: land-ice": Thanks for explaining how things are done in ES-DOC and providing the nemo model example. That has me thinking we should perhaps back off on trying to record quite so much information about the model components in the “source” global attribute. We were thinking that as in CMIP5 we might want to include in “source”: the name of the full model, the vintage, and the names of the component models. It might be confusing to list “nemo” as both the ocean model and the sea ice model. Perhaps “source” should just record 1) a complete and precise identifying label for the model (as it normally would be documented by the modeling group), and 2) model vintage (i.e., year the model was first used in a scientific application). For example:

source_id = “GFDL-CM2-1”
source = “GFDL CM2.1: cycle 2.1.14 (2012)”

(Note there would be no restriction on "source" to remove forbidden characters like blanks and parentheses.)

Another option is something along the lines of CMIP5. For example:

source_id = “CCSM2”
source = “CCSM2 (2002) atmosphere: CAM2 (cam2_0_brnchT_itea_2, T42L26); ocean: POP (pop2_0_ver_1.4.3, 2x3L15); sea ice: CSIM4; land: CLM2.0”

What do you and others think? thanks, Karl

taylor13 commented 7 years ago

@markelkington Hi Mark,

Re your paragraph labeled "Re: source_id" -- yes, the original intention was to include all components, whether active or not. [This assumes we still plan to list the components in the source_id CV.] Karl

durack1 commented 7 years ago

@eguil @markelkington @MartinaSt @momipsl as @taylor13 noted, it is best that we attempt to align our (controlled) vocabularies between CMIP6_CVs and ES-DOC. In particular the experiment_id and source_id.

Do you folks have a website/repo that lists the ES-DOC vocabs?

As you already have placeholder descriptive pages for the experiment_id, would it be useful that we include an ES-DOC_url or equivalent entry in each experiment entry so that it's clear where the detailed documentation can be obtained?

So, for example, the standing experiment_id entry for 1pctCO2 becomes:

        "1pctCO2":{
            "activity_id":[
                "CMIP"
            ],
            "additional_allowed_model_components":[
                "AER",
                "CHEM",
                "BGM"
            ],
            "description":"DECK: 1pctCO2",
            "ES-DOC_url":"http://view.es-doc.org/?renderMethod=id&project=cmip6-draft&id=18200dd7-c51e-4a23-9485-9a86ffc13dd5",
            "end_year":"",
            "experiment":"1 percent per year increase in CO2",
            "min_number_yrs_per_sim":"150",
            "parent_activity_id":[
                "CMIP"
            ],
            "parent_experiment_id":[
                "piControl"
            ],
            "required_model_components":[
                "AOGCM"
            ],
            "start_year":"",
            "sub_experiment":"none",
            "sub_experiment_id":"none",
            "tier":"1"
        },

If this was useful, we could also do a similar thing for the activity_id, so expand the current list to be a dictionary with a similar ES-DOC_url entry for each, and as @taylor13 noted above we could add the placeholder ES-DOC_url to the source_id once the entry/page has been generated on the ES-DOC site.

Regarding the source_id question that @taylor13 has noted above, my preference would be to include all the basic identifying information that we currently have in the placeholder ACCESS-1-0 example - with the glacier->land_ice, and addition of aerosols (thanks @markelkington) and any additional vocab tweaks to maintain consistency. The addition of an ES-DOC_url would also best integrate the information across these systems.

taylor13 commented 7 years ago

@markelkington Hi Mark, regarding your comment: "Re: source-type – do you want me to raise the issue in github for CMOR. I raised it in the ES-DOC repository some months ago. I don’t really mind which list is used (or even a combination of the list values) as long as there is just one list and it is agreed who maintains the content of that list. [Regarding the issue of using AOGCM as the value when we are running in atmosphere only mode – that seems OK to me]"

Re your last sentence: The current specification (in the WIP global attribute document referred to above) says that a model performing an AMIP run should have the same source_id as when it is run in coupled mode (e.g., HadGEM1), but that source_type should be "AGCM" for AMIP and "AOGCM BCM" for a concentration-driven coupled model run (e.g., "historical") that includes a biogeochemical component model. (In this case neither should be "AOGCM".)
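One way such a specification could be checked against the experiment_id CV is sketched below. The rule itself is an assumption, not something this thread confirms: it treats "source_type" as space-separated codes that must contain every entry of "required_model_components" and nothing outside the required plus "additional_allowed_model_components" lists. The codes are copied from the 1pctCO2 entry shown earlier; the function name `source_type_problems` is hypothetical.

```python
# Fields copied from the 1pctCO2 entry quoted earlier in this thread.
EXPERIMENT = {
    "required_model_components": ["AOGCM"],
    "additional_allowed_model_components": ["AER", "CHEM", "BGM"],
}


def source_type_problems(source_type, experiment=EXPERIMENT):
    """Return a list of human-readable problems; an empty list means consistent."""
    codes = source_type.split()
    required = experiment["required_model_components"]
    allowed = set(required) | set(experiment["additional_allowed_model_components"])
    problems = [f"missing required component {c}" for c in required if c not in codes]
    problems += [f"component {c} not allowed" for c in codes if c not in allowed]
    return problems


print(source_type_problems("AOGCM BGM"))  # []
print(source_type_problems("AGCM"))
# ['missing required component AOGCM', 'component AGCM not allowed']
```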

To make sure the vocabularies used by ES-DOC and by "source_type" are consistent, yes please point us to where the es-doc vocabulary is defined by raising an issue in this thread (not on the CMOR github repo).
thanks, Karl

taylor13 commented 7 years ago

Hi Eric,

Regarding the experiment_id CV, this information was obtained from the MIP co-chairs, first by Martin, then updated extensively by me over the last year or so. It is essentially finalized and the experiment_ids are in almost all cases consistent with what is found in the GMD experiment description papers. Charlotte at one point obtained a copy of the Excel spreadsheet from me, but I’m not sure whether she has altered the information in any way.

The most up-to-date experiment_id information is found here or for a web-based tabulated version, here

We have translated the critical experiment information from the original spreadsheet into the CMIP6_experiment_id.json file in the CMIP6_CVs repo. We now regard this .json file as the “reference” for the most important experiment information. ES-DOCs can import this information as needed. If there are any errors in the CV, please raise an issue on this repo. CMOR and ESGF will also rely on this reference CV for experiment_id. If additional information about the experiments should be included, please let us know immediately. What’s in there now is sufficient for CMOR and ESGF (but note the issue raised about sub_experiments, which we will be addressing shortly; see https://github.com/WCRP-CMIP/CMIP6_CVs/issues/1).

thanks, Karl

markelkington commented 7 years ago

Karl

I’d vote for the CMIP5 approach. I think it gives useful information to end users.

Mark

From: taylor13 [mailto:notifications@github.com] Sent: 09 September 2016 17:45 To: WCRP-CMIP/CMIP6_CVs Cc: Elkington, Mark; Mention Subject: Re: [WCRP-CMIP/CMIP6_CVs] Coordinate with ES-DOC model and experiment CVs (#48)


asladeofgreen commented 7 years ago

@durack1: I would strongly advise against embedding ES-DOC url's directly into the vocabs, i.e. ES-DOC syncs with the vocabs not the other way round. Any syncing issues will be raised as tickets on this repo.
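The one-way direction described here (ES-DOC follows the vocabs, and mismatches become tickets) can be pictured with a minimal comparison sketch. The function name `sync_report`, the result keys, and the sample identifiers are all hypothetical; this is not ES-DOC's actual tooling.

```python
def sync_report(reference_ids, esdoc_ids):
    """Compare ES-DOC's identifiers with the reference CV; the CV is never modified."""
    reference, esdoc = set(reference_ids), set(esdoc_ids)
    return {
        "missing_in_esdoc": sorted(reference - esdoc),  # ES-DOC should import these
        "unknown_to_cv": sorted(esdoc - reference),     # candidates for a ticket on the CV repo
    }


print(sync_report(["1pctCO2", "piControl"], ["piControl", "my-experiment"]))
# {'missing_in_esdoc': ['1pctCO2'], 'unknown_to_cv': ['my-experiment']}
```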

eguil commented 7 years ago

Dear All, I will sit down with Paul next week in China to try to resolve this (I think we are getting there). I would vote to have the 'source' field as simple as possible. If it is not used by automated tools but just for a quick look at the file (say via a ncdump -h) then we should not worry too much about it. The CMIP5 version:

source = “CCSM2 (2002) atmosphere: CAM2 (cam2_0_brnchT_itea_2, T42L26); ocean: POP (pop2_0_ver_1.4.3, 2x3L15); sea ice: CSIM4; land: CLM2.0”

is too specific and opens the door for mismatch (as pointed out above). Maybe an alternative would be to list just the realms:

source_id = “CCSM2”
source = “CCSM2: cycle 2.1.14 (2002): atmosphere; ocean; sea ice; land”

and the details would be found under the further_info_url.

Eric

taylor13 commented 7 years ago

Hi Eric and all,

"source_id" will definitely be ingested and used by machine (DRS, ESGF, file names, directory names, further_info_url, etc.), but my view is that "source" would not be tracked by the infrastructure but provide human readable information telling us what model (and model components) produced the output. I would think that good practice dictates that most modeling groups record full identifying information about their model (and its components) whenever they save an output file. I would think they would want to carry that provenance information over to the files they write for the CMIP archive. The information we collect as part of the "source_id" can capture the provenance information and then concatenate it together into the "source" attribute. It won't be a requirement for groups to do this (as you say, source = "CCSM2: cycle 2.1.14 (2002)" might be sufficient for some groups), but others might want to include component model information (if that's their usual practice), and we're providing an option for that.

We'll need to make it clear what information is required and what is optional.
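The idea of concatenating registered information into "source", with the component details optional, might look roughly like the sketch below. The function name `build_source`, its parameters, and the exact punctuation are assumptions loosely modelled on the CMIP5-style example quoted in this thread, not an agreed format.

```python
def build_source(label, year, components=None):
    """Concatenate registered model information into a human-readable "source" string."""
    source = f"{label} ({year})"
    if components:
        # Optional detail: "realm: component name" pairs separated by semicolons.
        source += " " + "; ".join(f"{realm}: {name}" for realm, name in components.items())
    return source


print(build_source("CCSM2", 2002))
# CCSM2 (2002)
print(build_source("CCSM2", 2002, {"sea ice": "CSIM4", "land": "CLM2.0"}))
# CCSM2 (2002) sea ice: CSIM4; land: CLM2.0
```

Keeping the label and year required while leaving the component mapping optional lets a group choose either level of detail.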

thanks, Karl

durack1 commented 7 years ago

@eguil @taylor13 and I have worked through these templates and we're satisfied with the synchronization - closing

Folks on this thread, please open a new issue with more specific information about any further tweaks that are required.