WCRP-CMIP / CMIP6_CVs

Controlled Vocabularies (CVs) for use in CMIP6
Creative Commons Attribution 4.0 International

Reconfigure source_id format #264

Closed asladeofgreen closed 7 years ago

asladeofgreen commented 7 years ago

Existing JSON blob for ACCESS-1-0 with realms as inlined attributes:

    "ACCESS-1-0":{
        "activity_participation":[
            "CMIP"
        ],
        "aerosol":"CLASSIC (v1.0)",
        "atmosphere":"HadGAM2 (r1.1; N96, 192 x 145 longitude/latitude; 38 levels; top level 39255 m)",
        "atmospheric_chemistry":"None",
        "cohort":[
            "CMIP5"
        ],
        "institution_id":[
            "CSIRO-BOM"
        ],
        "label":"ACCESS 1.0",
        "label_extended":"ACCESS 1.0 (This entry is free text for users to contribute verbose information)",
        "land_ice":"None",
        "land_surface":"MOSES2.2",
        "nominal_resolution_atmos":[
            "100 km"
        ],
        "nominal_resolution_landIce":[
            "None"
        ],
        "nominal_resolution_ocean":[
            "100 km"
        ],
        "ocean":"ACCESS-OM (MOM4p1; tripolar primarily 1deg, 360 x 300 longitude/latitude; 50 levels; top grid cell 0-10 m)",
        "ocean_biogeochemistry":"None",
        "release_year":"2011",
        "sea_ice":"CICE4.1",
        "source_id":"ACCESS-1-0"
    }

asladeofgreen commented 7 years ago

The realm related information could be grouped:

    "ACCESS-1-0":{
        "activity_participation":[
            "CMIP"
        ],
        "cohort":[
            "CMIP5"
        ],
        "institution_id":[
            "CSIRO-BOM"
        ],
        "label":"ACCESS 1.0",
        "label_extended":"ACCESS 1.0 (This entry is free text for users to contribute verbose information)",
        "nominal_resolution_atmos":[
            "100 km"
        ],
        "nominal_resolution_landIce":[
            "None"
        ],
        "nominal_resolution_ocean":[
            "100 km"
        ],
        "realms": {
            "aerosol":"CLASSIC (v1.0)",
            "atmosphere":"HadGAM2 (r1.1; N96, 192 x 145 longitude/latitude; 38 levels; top level 39255 m)",
            "atmospheric_chemistry":"None",
            "land_ice":"None",
            "land_surface":"MOSES2.2",
            "ocean":"ACCESS-OM (MOM4p1; tripolar primarily 1deg, 360 x 300 longitude/latitude; 50 levels; top grid cell 0-10 m)",
            "ocean_biogeochemistry":"None",
            "sea_ice":"CICE4.1"
        },
        "release_year":"2011",
        "source_id":"ACCESS-1-0"
    }

asladeofgreen commented 7 years ago

The same concept could also be applied to the nominal_resolution attributes.

    "ACCESS-1-0":{
        "activity_participation":[
            "CMIP"
        ],
        "cohort":[
            "CMIP5"
        ],
        "institution_id":[
            "CSIRO-BOM"
        ],
        "label":"ACCESS 1.0",
        "label_extended":"ACCESS 1.0 (This entry is free text for users to contribute verbose information)",
        "nominal_resolution": {
            "atmos": "100 km",
            "landIce": "None",
            "ocean": "100 km"
        },
        "realms": {
            "aerosol":"CLASSIC (v1.0)",
            "atmosphere":"HadGAM2 (r1.1; N96, 192 x 145 longitude/latitude; 38 levels; top level 39255 m)",
            "atmospheric_chemistry":"None",
            "land_ice":"None",
            "land_surface":"MOSES2.2",
            "ocean":"ACCESS-OM (MOM4p1; tripolar primarily 1deg, 360 x 300 longitude/latitude; 50 levels; top grid cell 0-10 m)",
            "ocean_biogeochemistry":"None",
            "sea_ice":"CICE4.1"
        },
        "release_year":"2011",
        "source_id":"ACCESS-1-0"
    }

asladeofgreen commented 7 years ago

Also, realm identifiers should be the same as those defined in CMIP6_realm.json.
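A minimal sketch of what aligning the names could look like. The abbreviated identifiers on the right are assumed to be the ones defined in CMIP6_realm.json; the mapping table and the `rename_realms` helper are hypothetical names used only for illustration, and the example descriptions are shortened.

```python
# Hypothetical mapping from the verbose keys used in the source_id entry to the
# abbreviated realm identifiers (assumed to match those in CMIP6_realm.json).
VERBOSE_TO_REALM_ID = {
    "aerosol": "aerosol",
    "atmosphere": "atmos",
    "atmospheric_chemistry": "atmosChem",
    "land_surface": "land",
    "land_ice": "landIce",
    "ocean": "ocean",
    "ocean_biogeochemistry": "ocnBgChem",
    "sea_ice": "seaIce",
}


def rename_realms(realms):
    """Return a copy of a realms dict keyed by the abbreviated identifiers."""
    return {VERBOSE_TO_REALM_ID[name]: text for name, text in realms.items()}


# Shortened example values, not the full descriptions from the CV file.
example = {"atmosphere": "HadGAM2", "sea_ice": "CICE4.1", "land_ice": "None"}
renamed = rename_realms(example)
```

An unknown verbose key raises `KeyError` here, which is probably the desired behaviour when the goal is to catch terms that have no counterpart in the realm vocabulary.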

taylor13 commented 7 years ago

@momipsl This last suggestion is an especially good one. Why should we have multiple terms referring to the same thing?

I like the other suggestions too, but is the "grouping" structure solely organizational/aesthetic or is there a use-case (or potential use-case) that would justify making the change? (I agree the "grouping" makes the file easier for humans to read, but we expect most folks will just read http://rawgit.com/WCRP-CMIP/CMIP6_CVs/master/src/CMIP6_source_id.html.)

One thing to note is that "nominal_resolution" in the file is the resolution of the model components themselves (i.e., their native grids). In CMIP6 output files, "nominal_resolution" will refer to the resolution of the data as reported, which might have been regridded to a resolution different from that of the native grid.

Thanks for this contribution.

asladeofgreen commented 7 years ago

@taylor13:

  1. The grouping is simply good practice from an information architecture point of view.

  2. Ultimately most folks will end up reading something like https://documentation.es-doc.org/cmip5/models/hadgem2-es. But it will take time for the groups to actually write their CMIP6 documentation; until then they have the HTML file you linked to.

  3. Thx for the clarification of nominal_resolution.

durack1 commented 7 years ago

@dnadeau4 did you want to chime in on this?

taylor13 commented 7 years ago

I know folks have already written software in non-Python languages that reads some of the .json files and extracts the information. Revising their structure would necessitate modifications that we should try to avoid, so lacking an imperative to restructure, let's not bother.

The names of the realms should be made consistent though.
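To illustrate the compatibility concern, here is a rough sketch of a reader that accepts both the existing flat layout and the grouped "realms" layout suggested earlier in this thread. The helper name `realm_description` and the trimmed-down entries are hypothetical; real readers in other languages would need an equivalent fallback.

```python
import json

# Trimmed-down, hypothetical entries: one flat (existing), one grouped (suggested).
FLAT = json.loads(
    '{"ocean": "ACCESS-OM", "sea_ice": "CICE4.1", "source_id": "ACCESS-1-0"}'
)
GROUPED = json.loads(
    '{"realms": {"ocean": "ACCESS-OM", "sea_ice": "CICE4.1"},'
    ' "source_id": "ACCESS-1-0"}'
)


def realm_description(entry, realm):
    """Look up a realm description in either layout."""
    # Use the grouped "realms" object when present, else the entry itself.
    realms = entry.get("realms", entry)
    return realms[realm]


flat_ocean = realm_description(FLAT, "ocean")
grouped_ocean = realm_description(GROUPED, "ocean")
```

A reader written only against the flat layout would fail with a missing-key error on the grouped file, which is the kind of breakage the comment above wants to avoid.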

durack1 commented 7 years ago

@momipsl @taylor13 I'm not sure there is anything to do here? @momipsl the realms we describe are more verbose than the standard CMIP6_realm.json entries; we have done this to make reading the source_id entries easier.

I have made the amendments in a PR; however, I think this makes the information harder to interpret (particularly for users who are not familiar with the abbreviated names). So I will discuss with @taylor13 before making this change.

durack1 commented 7 years ago

@taylor13 I also wonder whether the scope of the nominal_resolution_* entries is too small. We could potentially have land grids that differ from the atmos grids, for example, and we have 3 nominal_resolution entries listed, yet 8 potential realms.

taylor13 commented 7 years ago

Yes, agree. Let's add the other realms.
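The gap can be checked mechanically. The sketch below assumes the eight realm identifiers are the abbreviated ones (atmos, landIce, and so on) and that resolution keys follow the `nominal_resolution_<realm>` pattern seen in the ACCESS-1-0 entry; `realms_without_resolution` is a hypothetical helper.

```python
# Assumed list of the eight realm identifiers.
REALM_IDS = [
    "aerosol", "atmos", "atmosChem", "land",
    "landIce", "ocean", "ocnBgChem", "seaIce",
]


def realms_without_resolution(entry):
    """List realms that have no nominal_resolution_<realm> key in the entry."""
    prefix = "nominal_resolution_"
    covered = {key[len(prefix):] for key in entry if key.startswith(prefix)}
    return [realm for realm in REALM_IDS if realm not in covered]


# The three resolution keys present in the existing ACCESS-1-0 entry.
entry = {
    "nominal_resolution_atmos": ["100 km"],
    "nominal_resolution_landIce": ["None"],
    "nominal_resolution_ocean": ["100 km"],
}
missing = realms_without_resolution(entry)
```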

dnadeau4 commented 7 years ago

@durack1 Why don't you call it something other than nominal_resolution? This will break my current tag. I guess I can just download the 3.2.2 tag.

durack1 commented 7 years ago

@momipsl we have been discussing your suggestions in-house, and have ascertained that a cleanup of this format is indeed a good idea. For this reason we will update the format to:

"ACCESS-1-0":{
    "activity_participation":[
        "CMIP"
    ],
    "cohort":[
        "CMIP5"
    ],
    "institution_id":[
        "CSIRO-BOM"
    ],
    "label":"ACCESS 1.0",
    "label_extended":"ACCESS 1.0 (This entry is free text for users to contribute verbose information)",
    "model_component":{
        "aerosol":{
            "description":"CLASSIC (v1.0)",
            "nominal_resolution":"100 km"
        },
        "atmos":{
            "description":"HadGAM2 (r1.1; N96, 192 x 145 longitude/latitude; 38 levels; top level 39255 m)",
            "nominal_resolution":"100 km"
        },
        "atmosChem":{
            "description":"None",
            "nominal_resolution":"None"
        },
        "land":{
            "description":"MOSES2.2",
            "nominal_resolution":"100 km"
        },
        "landIce":{
            "description":"None",
            "nominal_resolution":"None"
        },
        "ocean":{
            "description":"ACCESS-OM (MOM4p1; tripolar primarily 1deg, 360 x 300 longitude/latitude; 50 levels; top grid cell 0-10 m)",
            "nominal_resolution":"100 km"
        },
        "ocnBgChem":{
            "description":"None",
            "nominal_resolution":"None"
        },
        "seaIce":{
            "description":"CICE4.1",
            "nominal_resolution":"100 km"
        }
    },
    "release_year":"2011",
    "source_id":"ACCESS-1-0"
},

@taylor13 @dnadeau4 please check this before the change is implemented in the coming days.

asladeofgreen commented 7 years ago

@durack1: OK, looks better. You must ensure that the realm names are coherent with the realm identifiers. Furthermore, in the example above, why bother with atmosChem, landIce, ocnBgChem when they are effectively nulls?

durack1 commented 7 years ago

@momipsl exactly, as noted in https://github.com/WCRP-CMIP/CMIP6_CVs/issues/285#issuecomment-294171160 we are further cleaning up any inconsistencies. Null entries are included because we require a template for modeling centers to fill out; if the placeholder field is not there, we will not get any information. So for completeness we include all component identifiers and can then choose to ignore or omit the null entries when these are being used.

@momipsl I am uncertain of the ES-DOC controlled vocab, but if there are any glaring inconsistencies with the naming of fields above that could easily be tweaked before the change, please note the potential changes now.
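For consumers who want to skip the placeholder entries mentioned above, a filter along these lines may be enough. It assumes the proposed `model_component` layout and that an unused component is marked by the string "None" in its description; `active_components` is a hypothetical helper and the entry is a shortened excerpt.

```python
def active_components(entry):
    """Return only the components whose description is not the "None" placeholder."""
    return {
        name: info
        for name, info in entry["model_component"].items()
        if info["description"] != "None"
    }


# Shortened excerpt in the proposed layout.
entry = {
    "model_component": {
        "land": {"description": "MOSES2.2", "nominal_resolution": "100 km"},
        "landIce": {"description": "None", "nominal_resolution": "None"},
        "seaIce": {"description": "CICE4.1", "nominal_resolution": "100 km"},
    },
    "source_id": "ACCESS-1-0",
}
active = active_components(entry)
```

The template keeps every component key so modeling centers see all fields, while downstream code decides whether the placeholders matter.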