Closed larsbuntemeyer closed 6 months ago
should we keep model_components?
This information is condensed during creation of the CV file, e.g., into:
"source_id":{
"REMO2020":{
"activity_participation":[
"CORDEX"
],
"cohort":[
"Registered"
],
"institution_id":[
"GERICS"
],
"license":"REMO2020 data produced by GERICS is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing input4MIPs output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.",
"source_id":"REMO2020",
"source":"REMO2020 (2022): \naerosol: CLASSIC (v1.0)\natmos: HadGAM2 (r1.1, N96; 192 x 145 longitude/latitude; 38 levels; top level 39255 m)\natmosChem: none\nland: CABLE2.4\nlandIce: none\nocean: ACCESS-OM2 (MOM5, tripolar primarily 1deg; 360 x 300 longitude/latitude; 50 levels; top grid cell 0-10 m)\nocnBgchem: WOMBAT (same grid as ocean)\nseaIce: CICE4.1 (same grid as ocean)"
}
},
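To make the condensing step concrete, here is a minimal sketch of how a `source` string like the one above could be assembled from per-realm components. The `model_component` layout (one `description` per realm) and the function name `condense_source` are assumptions for illustration, loosely modelled on the CMIP6 source_id entries. This is not the actual CV build script, and the component descriptions are placeholders.

```python
def condense_source(entry):
    """Build a CMIP6-style ``source`` string from one source_id entry.

    Assumes ``entry["model_component"]`` maps realm -> {"description": ...};
    that layout is an assumption, not taken from the CORDEX tooling.
    """
    header = f"{entry['source_id']} ({entry['release_year']}): "
    lines = [
        f"{realm}: {component['description']}"
        for realm, component in sorted(entry["model_component"].items())
    ]
    # One realm per line, realms in alphabetical order.
    return "\n".join([header] + lines)


# Placeholder entry; the component descriptions are hypothetical.
example = {
    "source_id": "REMO2020",
    "release_year": "2022",
    "model_component": {
        "atmos": {"description": "REMO2020"},
        "land": {"description": "none"},
        "aerosol": {"description": "none"},
    },
}

source = condense_source(example)
print(source)
```

Dropping `model_component` would mean this kind of rule has nothing to build from, so `source` would then need to be given explicitly or derived from another field.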
In general, we can keep the CMIP6 template with some updates (e.g. lake model), and perhaps we don't need "native_nominal_resolution", since it is not constant and depends on the resolution of the domain (e.g. EUR-44 or EUR-11).
ES-DOC might become helpful for gathering model docs: https://github.com/ES-DOC.
I'm following up here on a comment from @sethmcg on WCRP-CORDEX/discuss#11, which seems more on topic:
Does the source_id identify the model / method used to perform the downscaling? If so, I'm not sure that release_year and institution_id are well-defined for methods that aren't RCMs. For example, what would they be for the (simplistic but still widely-used) ESD method of interpolation + bias-correction?
The source_id should identify the method used. In this sense, I think institution_id would be perfectly defined for simple ESD methods. It must reflect the groups using exactly that method and keep a consistent source_id among them as long as the method is exactly the same. It does not mean that a given institution developed the method.
Regarding the release year, the group could provide the first registered use of the particular method (e.g. the year of the oldest paper one can find using this method). I guess this is also the spirit of collecting this info for GCMs/RCMs: to have an idea of the latest update of a method.
We could already have one such example from CAM-11. At UCR, they have already applied an ESD method to CMIP6 models. They call it BCSD and use Wood et al. (2004) as the reference. Therefore, this could be registered as:
{
"source_id": {
"BCSD": {
"activity_participation": [
"CORDEX-ESD"
],
"cohort": [
"Registered"
],
"institution_id": [
"UCR"
],
"label":"Bias Correction and Spatial Disaggregation",
"source_type":"ESD",
"license":"Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/).",
"model_description":"https://doi.org/10.1023/B:CLIM.0000013685.99609.9e",
"release_year":"2004",
"source_id": "BCSD"
}
}
}
Here, I'm using some changes proposed in WCRP-CORDEX/discuss#11 and WCRP-CORDEX/cordex-cmip6-cv#20, but not yet in place. Namely, CORDEX-ESD as activity_participation (instead of just CORDEX), a source_type, and use of a model_description URL instead of the model_component.
@larsbuntemeyer, must source be built out of the model_component? Or could we include it explicitly in the JSON entry? In this case, I would say that "source"="Bias Correction and Spatial Disaggregation" and the label just as the source_id: "label"="BCSD"
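Since entries like this are written by hand, a quick sanity check could help during registration. Below is a rough sketch that parses a source_id CV and reports missing keys. The list of expected keys is simply copied from the BCSD example (whether they are all mandatory is an assumption), and `check_source_id_cv` is a hypothetical helper, not part of any existing CORDEX or CMIP tool.

```python
import json

# Keys copied from the BCSD example; treating all as required is an assumption.
EXPECTED_KEYS = {
    "activity_participation", "cohort", "institution_id", "label",
    "source_type", "license", "model_description", "release_year",
    "source_id",
}

EXAMPLE = """
{
  "source_id": {
    "BCSD": {
      "activity_participation": ["CORDEX-ESD"],
      "cohort": ["Registered"],
      "institution_id": ["UCR"],
      "label": "Bias Correction and Spatial Disaggregation",
      "source_type": "ESD",
      "license": "Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/).",
      "model_description": "https://doi.org/10.1023/B:CLIM.0000013685.99609.9e",
      "release_year": "2004",
      "source_id": "BCSD"
    }
  }
}
"""


def check_source_id_cv(text):
    """Return a list of human-readable problems found in a source_id CV."""
    # json.loads raises json.JSONDecodeError on syntax errors,
    # e.g. a missing comma between two key/value pairs.
    cv = json.loads(text)
    problems = []
    for key, entry in cv["source_id"].items():
        missing = EXPECTED_KEYS - set(entry)
        if missing:
            problems.append(f"{key}: missing keys {sorted(missing)}")
        # The inner source_id should repeat the outer key.
        if entry.get("source_id") != key:
            problems.append(
                f"{key}: inner source_id is {entry.get('source_id')!r}"
            )
    return problems


print(check_source_id_cv(EXAMPLE))
```

If source ends up being stored explicitly in the entry, it would just be one more key in `EXPECTED_KEYS`.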
That makes sense, but I think it needs to be spelled out explicitly somewhere. The CMIP documents are very much written from the GCM developer's perspective, and it's not always obvious how to adapt things to downscaling activities. If CORDEX CVs are inheriting a lot of metadata architecture from CMIP, I think there should be a document in this repo (or at least pointed to in the README) that references the CMIP6 Global Attributes, DRS, Filenames, Directory Structure, and CV’s doc and details what has been added / updated / changed / expanded for CORDEX.
Well, this document should be the CORDEX-CMIP6 Archiving Specifications we are writing in parallel to the development of this repo. In this repo, most files are simple lists of values corresponding to a given CV element. For those which are not simple lists (source_id, experiment_id, ...) we could include here a companion markdown file (CORDEX-CMIP6_source_id.md) with the explanation of their structure. Much like https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/.github/Model_registration_template.md , which can then be used in the registration to provide instructions to correctly fill the registration issue template.
Regarding release_year: it originated in CMIP6 and can simply be "provide when relevant". "activity_participation" should simply be ESD in this case.
{
"source_id": {
"BCSD": {
"activity_participation": [
"CORDEX-ESD"
],
"cohort": [
"Registered"
],
"institution_id": [
"UCR"
],
"label":"Bias Correction and Spatial Disaggregation",
"source_type":"ESD",
"license":"Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/).",
"model_description":"https://doi.org/10.1023/B:CLIM.0000013685.99609.9e",
"release_year":"2004",
"source_id": "BCSD"
}
}
}
We will need a new building rule for the source global attribute. It used to be a text with all model components pasted together. Should we now just take the label_extended as source? Or something more elaborate? For example:
source = f"{label_extended}. See {further_info_url} for further configuration details."
further_info_url in CMIP6 leads to ES-DOC, which we are missing in CORDEX-CMIP6. Can we skip label_extended and simply use source (full model name/version)? Or do we need to use label_extended for consistency with CMIP6?
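One way to sketch such a building rule, covering the case where no further_info_url exists, is shown below. The function name `build_source` and the fallback order are hypothetical (label_extended, then source_id, for the name; further_info_url, then model_description, for the link). It is only meant to illustrate the options discussed here, not a decided rule.

```python
def build_source(entry):
    """Build the ``source`` global attribute from a source_id entry.

    Fallbacks are assumptions for illustration:
    - name: label_extended if present, else source_id
    - link: further_info_url if present, else model_description
    """
    name = entry.get("label_extended") or entry["source_id"]
    url = entry.get("further_info_url") or entry.get("model_description")
    if url:
        return f"{name}. See {url} for further configuration details."
    # No link available: fall back to the bare name.
    return name


entry = {
    "source_id": "BCSD",
    "label_extended": "Bias Correction and Spatial Disaggregation",
    "model_description": "https://doi.org/10.1023/B:CLIM.0000013685.99609.9e",
}
print(build_source(entry))
```

With a rule like this, the attribute is still filled when only the source_id is known, at the cost of being less informative.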
Inherited from CMIP6, we have, e.g.,