Organisation of MIP tables

martinjuckes commented 7 years ago

The MIP tables play a fundamental role in the organisation of variables in the data request. The tables have evolved from one phase of CMIP to the next. The CMIP3 request had 6 tables and 164 variables (http://www-pcmdi.llnl.gov/ipcc/standard_output.html), and this rose to 18 tables and 1098 variables in CMIP5. For CMIP6 we have 45 tables and 2019 variables. The rationale behind the tables has become unclear and needs to be reviewed in advance of CMIP7, preferably before science teams start considering their data requirements.

taylor13 commented 7 years ago

In CMIP7 we might consider moving away from use of "Table" in file names and directory structures and instead rely on "frequency" and an extended grid_label. Here I propose a grid_label template that would enable users to find the data they need without learning the name/meaning of 45 tables. It also would allow us to eliminate special tables for Antarctica and Greenland.

For CMIP7 we should consider defining grid labels consistent with a template like:

g<realm>[<vert.id>]-<horiz.id>[-pt] (for example gA-n)

<vert.id> would only be included for variables reported on one or more locations in the vertical, and the "-pt" suffix would only be included for synoptic data (i.e., time "point" data, as opposed to time-mean data).

<realm> = ["A", "O", "I", "R", "L"], where "A" is for "atmosphere", "O" for "ocean", "I" for "ice sheet", "R" for "rivers", and "L" for "land". This indicates which modeling realm's grid was used to generate the output. [Need to check whether all of these are needed, or whether we need additional model grids for, say, "sea ice".]

<vert.id> = ["P", "Z", "L", "S"], with "P" for "pressure level data", "Z" for "vertical distance above a defined datum", "L" for "model level data", and "S" for the layer nearest the surface (e.g., CO2 concentration reported at the surface would be labeled with "S", but CO2 reported on one or more pressure levels would be labeled "P").

<horiz.id> = ["m", "n", "r0", "r1", "r2", ..., "nz", "r0z", "r1z", ..., "ma", "mg", "na", "ng", "r0a", "r0g", "r1a", "r1g", ...], where "m" is for area mean over the entire domain, "n" is for "native", "r" is for regridded, "z" is for "zonal mean", "a" is for a grid limited to the "Antarctica" region, and "g" is for a grid limited to the "Greenland" region.

Examples:
"gAP-n-pt": 3-d pressure-level data on the native atmospheric grid, sampled synoptically.
"gOS-r0": 2-d ocean grid for the surface ocean layer, regridded to target grid 0.
"gR-n": 2-d data reported on the river-routing native grid.
"gA-nz": data originating on the atmospheric grid that has been zonally averaged, reported at the native latitude positions.
"gAL-m": global mean vertical profiles of atmospheric data reported on model levels.
"gI-na": 2-d data reported on an ice-sheet model's grid over Antarctica.

If we adopt this system in CMIP7, we could replace table_id in the file names and directory structure with frequency and the extended grid_label. We could also eliminate table_id as a search facet, since users would have other facets with which to sub-select data of interest.
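To make the proposed label structure concrete, here is a minimal Python sketch that splits a label such as "gAP-n-pt" into its parts. It is only an illustration of the template as read from the examples above, not an agreed CMIP7 specification. The names `GridLabel`, `parse_grid_label` and `GRID_LABEL_RE` are hypothetical, and the rule that "m" cannot take the "z" suffix is an assumption based on "mz" being absent from the list of codes.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: "g", realm letter, optional vertical letter, "-",
# horizontal code (m, n, or r<number>), optional z/a/g suffix, optional "-pt".
GRID_LABEL_RE = re.compile(
    r"g(?P<realm>[AOIRL])(?P<vert>[PZLS])?"
    r"-(?P<base>m|n|r\d+)(?P<suffix>[zag])?"
    r"(?P<pt>-pt)?"
)


class GridLabel(NamedTuple):
    realm: str               # "A", "O", "I", "R" or "L"
    vertical: Optional[str]  # "P", "Z", "L", "S", or None if absent
    base: str                # "m", "n", or "r<number>"
    suffix: Optional[str]    # "z", "a", "g", or None if absent
    time_point: bool         # True when the label ends in "-pt"


def parse_grid_label(label: str) -> GridLabel:
    """Split a proposed grid label into its components, or raise ValueError."""
    match = GRID_LABEL_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid grid label: {label!r}")
    # Assumption: "mz" is rejected because it does not appear in the proposal.
    if match["base"] == "m" and match["suffix"] == "z":
        raise ValueError(f"'mz' is not a listed horizontal code: {label!r}")
    return GridLabel(
        realm=match["realm"],
        vertical=match["vert"],
        base=match["base"],
        suffix=match["suffix"],
        time_point=match["pt"] is not None,
    )


if __name__ == "__main__":
    for example in ["gAP-n-pt", "gOS-r0", "gR-n", "gA-nz", "gAL-m", "gI-na"]:
        print(example, parse_grid_label(example))
```

Because the vertical letter is optional and "L" appears in both the realm and vertical lists, the position of the letter decides its meaning: "gL-n" is the land realm with no vertical identifier, while "gAL-m" is the atmosphere realm on model levels.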

martinjuckes commented 4 years ago

cmip6dr / cmip7_forward_look

Organisation of MIP tables #1