WCRP-CORDEX / cordex-cmip6-cv

Controlled Vocabulary (CV) for use in CORDEX
BSD 3-Clause "New" or "Revised" License

`source_id`: what info is required for registration? #4

Closed larsbuntemeyer closed 6 months ago

larsbuntemeyer commented 2 years ago

Inherited from CMIP6, we have, e.g.,

{
    "source_id": {
        "REMO2020": {
            "activity_participation": [
                "CORDEX"
            ],
            "cohort": [
                "Registered"
            ],
            "institution_id": [
                "GERICS"
            ],
            "label":"REMO2020",
            "license":"REMO2020 data produced by GERICS is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing input4MIPs output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.",
            "model_component":{
                "aerosol":{
                    "description":"CLASSIC (v1.0)",
                    "native_nominal_resolution":"250 km"
                },
                "atmos":{
                    "description":"",
                    "native_nominal_resolution":"250 km"
                },
                "atmosChem":{
                    "description":"none",
                    "native_nominal_resolution":"none"
                },
                "land":{
                    "description":"",
                    "native_nominal_resolution":"250 km"
                },
                "landIce":{
                    "description":"",
                    "native_nominal_resolution":"none"
                },
                "ocean":{
                    "description":"prescribed",
                    "native_nominal_resolution":"100 km"
                },
                "ocnBgchem":{
                    "description":"",
                    "native_nominal_resolution":"100 km"
                },
                "seaIce":{
                    "description":"prescribed",
                    "native_nominal_resolution":"100 km"
                }
            },
            "release_year":"2022",
            "source_id": "REMO2020"
        }
    }
}

larsbuntemeyer commented 2 years ago

Should we keep the model_component entries? This information is condensed during creation of the CV file, e.g., into:

 "source_id":{
            "REMO2020":{
                "activity_participation":[
                    "CORDEX"
                ],  
                "cohort":[
                    "Registered"
                ],  
                "institution_id":[
                    "GERICS"
                ],  
                "license":"REMO2020 data produced by GERICS is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/). Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse for terms of use governing input4MIPs output, including citation requirements and proper acknowledgment. Further information about this data, including some limitations, can be found via the further_info_url (recorded as a global attribute in this file). The data producers and data providers make no warranty, either express or implied, including, but not limited to, warranties of merchantability and fitness for a particular purpose. All liabilities arising from the supply of the information (including any liability arising in negligence) are excluded to the fullest extent permitted by law.",
                "source_id":"REMO2020",
                "source":"REMO2020 (2022): \naerosol: CLASSIC (v1.0)\natmos: HadGAM2 (r1.1, N96; 192 x 145 longitude/latitude; 38 levels; top level 39255 m)\natmosChem: none\nland: CABLE2.4\nlandIce: none\nocean: ACCESS-OM2 (MOM5, tripolar primarily 1deg; 360 x 300 longitude/latitude; 50 levels; top grid cell 0-10 m)\nocnBgchem: WOMBAT (same grid as ocean)\nseaIce: CICE4.1 (same grid as ocean)"
            }   
        },  
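
The condensation step described above can be sketched in Python. This is only an illustration, not the actual CV build script: the function name `condense_source` is hypothetical, the output layout is inferred from the `source` string shown above, and reporting an empty description as `none` is an assumption.

```python
def condense_source(entry):
    """Condense a registration entry into a single `source` string.

    Layout inferred from the example above:
    "<label> (<release_year>): \n<component>: <description>\n..."
    """
    lines = [f"{entry['label']} ({entry['release_year']}): "]
    for name, component in entry["model_component"].items():
        # Assumption: an empty description is reported as "none".
        description = component.get("description") or "none"
        lines.append(f"{name}: {description}")
    return "\n".join(lines)


# Shortened version of the REMO2020 registration entry.
entry = {
    "label": "REMO2020",
    "release_year": "2022",
    "model_component": {
        "aerosol": {"description": "CLASSIC (v1.0)", "native_nominal_resolution": "250 km"},
        "atmosChem": {"description": "none", "native_nominal_resolution": "none"},
        "landIce": {"description": "", "native_nominal_resolution": "none"},
    },
}
print(condense_source(entry))
```

Note that `native_nominal_resolution` plays no role in the condensed string, so only the descriptions survive into the CV file.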

gnikulin commented 2 years ago

In general, we can keep the CMIP6 template with some updates (e.g. lake model) and perhaps we don't need "native_nominal_resolution" as it's not a constant and depends on resolution of a domain (e.g. EUR-44 or EUR-11).
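
If `native_nominal_resolution` were dropped as suggested, a CMIP6-style entry could be reduced along these lines. This is a minimal sketch; the helper name `drop_native_resolution` is hypothetical and not part of any CV tooling.

```python
import copy


def drop_native_resolution(entry):
    """Return a copy of a source_id entry without native_nominal_resolution."""
    reduced = copy.deepcopy(entry)  # leave the registered entry untouched
    for component in reduced.get("model_component", {}).values():
        component.pop("native_nominal_resolution", None)
    return reduced


# Shortened version of the REMO2020 registration entry.
entry = {
    "source_id": "REMO2020",
    "model_component": {
        "aerosol": {"description": "CLASSIC (v1.0)", "native_nominal_resolution": "250 km"},
        "ocean": {"description": "prescribed", "native_nominal_resolution": "100 km"},
    },
}
reduced = drop_native_resolution(entry)
print(reduced["model_component"])
```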

larsbuntemeyer commented 1 year ago

This might become helpful for gathering model docs: https://github.com/ES-DOC.

jesusff commented 1 year ago

I am following up here on a comment from @sethmcg in WCRP-CORDEX/discuss#11, since it seems more on topic in this issue:

Does the source_id identify the model / method used to perform the downscaling? If so, I'm not sure that release_year and institution_id are well-defined for methods that aren't RCMs. For example, what would they be for the (simplistic but still widely-used) ESD method of interpolation + bias-correction?

The source_id should identify the method used. In this sense, I think institution_id would be perfectly defined for simple ESD methods. It must reflect the groups using exactly that method and keep a consistent source_id among them as long as the method is exactly the same. It does not mean that a given institution developed the method.

Regarding the release year, the group could provide the first registered use of the particular method (e.g. the year of the oldest paper one can find using this method). I guess this is also the spirit of collecting this info in GCMs/RCMs; to have an idea of the latest update of a method.

We may already have one such example from CAM-11. At UCR, they have already applied an ESD method to CMIP6 models. They call it BCSD and use Wood et al., 2004 as the reference. Therefore, this could be registered as:

{
    "source_id": {
        "BCSD": {
            "activity_participation": [
                "CORDEX-ESD"
            ],
            "cohort": [
                "Registered"
            ],
            "institution_id": [
                "UCR"
            ],
            "label":"Bias Correction and Spatial Disaggregation",
            "source_type":"ESD",
            "license":"Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/).",
            "model_description":"https://doi.org/10.1023/B:CLIM.0000013685.99609.9e",
            "release_year":"2004",
            "source_id": "BCSD"
        }
    }
}

Here, I'm using some changes proposed in WCRP-CORDEX/discuss#11 and WCRP-CORDEX/cordex-cmip6-cv#20 that are not yet in place: namely, CORDEX-ESD as activity_participation (instead of just CORDEX), a source_type, and a model_description URL instead of the model_component. @larsbuntemeyer, must source be built out of the model_component, or could we include it explicitly in the JSON entry? In that case, I would set "source": "Bias Correction and Spatial Disaggregation" and make the label the same as the source_id: "label": "BCSD".
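
To make the proposed structure concrete, here is a rough Python check of such an entry. It is a sketch only: the set of required keys simply mirrors the BCSD example above (these fields are proposals, not an agreed schema), `release_year` is treated as optional, and `check_entry` is a hypothetical helper.

```python
import json

# Keys taken from the BCSD example; not an agreed CORDEX schema.
REQUIRED_KEYS = {
    "activity_participation", "cohort", "institution_id", "label",
    "source_type", "license", "model_description", "source_id",
}


def check_entry(text):
    """Parse a registration JSON string and return a list of problems found."""
    problems = []
    registry = json.loads(text)["source_id"]  # raises on malformed JSON
    for key, entry in registry.items():
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"{key}: missing {sorted(missing)}")
        if entry.get("source_id") != key:
            problems.append(f"{key}: inner source_id does not match the key")
    return problems


example = """
{
    "source_id": {
        "BCSD": {
            "activity_participation": ["CORDEX-ESD"],
            "cohort": ["Registered"],
            "institution_id": ["UCR"],
            "label": "Bias Correction and Spatial Disaggregation",
            "source_type": "ESD",
            "license": "Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/).",
            "model_description": "https://doi.org/10.1023/B:CLIM.0000013685.99609.9e",
            "release_year": "2004",
            "source_id": "BCSD"
        }
    }
}
"""
print(check_entry(example))
```

Because `json.loads` rejects malformed input, a check like this would also catch a missing comma between two fields before the entry reaches the CV.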

sethmcg commented 1 year ago

That makes sense, but I think it needs to be spelled out explicitly somewhere. The CMIP documents are very much written from the GCM developer's perspective, and it's not always obvious how to adapt things to downscaling activities. If CORDEX CVs are inheriting a lot of metadata architecture from CMIP, I think there should be a document in this repo (or at least pointed to in the README) that references the CMIP6 Global Attributes, DRS, Filenames, Directory Structure, and CV’s doc and details what has been added / updated / changed / expanded for CORDEX.

jesusff commented 1 year ago

Well, this document should be the CORDEX-CMIP6 Archiving Specifications we are writing in parallel to the development of this repo. In this repo, most files are simple lists of values corresponding to a given CV element. For those which are not simple lists (source_id, experiment_id, ...) we could include here a companion markdown file (CORDEX-CMIP6_source_id.md) with the explanation of their structure. Much like https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/.github/Model_registration_template.md , which can then be used in the registration to provide instructions to correctly fill the registration issue template.

gnikulin commented 1 year ago

Regarding release_year, which originated in CMIP6: the rule can simply be "provide when relevant".

gnikulin commented 1 year ago

"activity_participation" should be simply ESD in this case

{
    "source_id": {
        "BCSD": {
            "activity_participation": [
                "CORDEX-ESD"
            ],
            "cohort": [
                "Registered"
            ],
            "institution_id": [
                "UCR"
            ],
            "label":"Bias Correction and Spatial Disaggregation",
            "source_type":"ESD",
            "license":"Creative Commons Attribution 4.0 International License (CC BY 4.0; https://creativecommons.org/licenses/by/4.0/).",
            "model_description":"https://doi.org/10.1023/B:CLIM.0000013685.99609.9e",
            "release_year":"2004",
            "source_id": "BCSD"
        }
    }
}

jesusff commented 11 months ago

We will need a new building rule for the source global attribute. It used to be a text with all model components pasted together. Should we now just take the label_extended as source, or something more elaborate? For example:

source = f"{label_extended}. See {further_info_url} for further configuration details."

gnikulin commented 11 months ago

further_info_url in CMIP6 leads to ES-DOC, which we are missing in CORDEX-CMIP6. Can we skip label_extended and simply use source (full model name/version)? Or do we need to use label_extended for consistency with CMIP6?
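
Both options discussed here can be sketched in a single rule. This is illustrative only: the function `build_source` and its fallback behaviour are assumptions, and the URL in the usage example is a placeholder.

```python
def build_source(label_extended, further_info_url=None):
    """Build the `source` global attribute from the proposed template.

    If no further_info_url is available, fall back to the extended label only.
    """
    if further_info_url:
        return f"{label_extended}. See {further_info_url} for further configuration details."
    return label_extended


name = "Bias Correction and Spatial Disaggregation"
print(build_source(name))
print(build_source(name, "https://example.org/BCSD"))  # placeholder URL
```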