PCMDI / mip-cmor-tables

JSON tables for CMOR3 to create Model Intercomparison Project (MIP) datasets
Creative Commons Attribution 4.0 International

Prototype mip tables #3

Open matthew-mizielinski opened 2 years ago

matthew-mizielinski commented 2 years ago

Produce a set of prototype MIP tables.

Draft Pull request: #4

matthew-mizielinski commented 2 years ago

In this notebook I've thrown together a straw man for the project-independent MIP tables; they can be found in the Tables directory of the branch.

There are quite a lot of steps, so in summary the workflow demonstrated is:

  1. Load existing MIP tables into a dictionary structure mip_tables_data[mip_table][variable_name]
  2. Load the Data Request (v01.00.33) and:
    1. Add data request uid corresponding to the CMOR variables to the MIP tables data
    2. Obtain QC ranges data where available in the data request and update the MIP tables information with it
  3. Use the metadata for each variable (modeling_realm, frequency, dimensions, cell_methods and existing MIP table name in a few cases) to construct a new MIP table name, building a mapping dictionary new_tables[new_table][variable_name] = [(mip_table, variable_name),...]. Note that there will be a few "duplicates" from CMIP6 that will end up in the same new MIP table, but these are dealt with later. The MIP table prefixes are primarily assigned based on the first modeling realm entry.
    1. A couple of frequencies have been tweaked: monC -> monClim and 1hrCM -> monDiurnal
    2. MIP table names are built from a prefix (based on realm), the frequency and a suffix. The prefix I've used for the Glacier Ice sheet data is GIA for Antarctica and GIG for Greenland. I couldn't find another way of distinguishing region-based data -- we could require the data to be combined before publication, i.e. Greenland and Antarctica data regridded onto lat-lon and published as a single field, but this would likely get messy.
    3. Suffixes are applied to the MIP table based on structure, site specific = Site, model levels (atmos or ocean) = Lev, zonal means = Z.
    4. Fixed fields do not have a suffix applied
    5. We end up with a large number of tables (78), but the structure is hopefully more apparent and understandable. If the number of tables is thought to be a problem we could rationalise the method for assignment, but we have to be careful of "conflicts" in the mapping from CMIP6 MIP table to new MIP table (see below).
  4. I've used the last methodology I had coded up to construct the "branded variable name" for each entry and added this to the original MIP table data. The idea here is to include this concept with the metadata from the start, which gives us the option to use it in DRS or file naming conventions should we choose to within a project or to keep it as a metadata item.
    1. I think the construction method for these needs updating
    2. I've added a suffix 'G' for Greenland and 'A' for Antarctica -- is this a good idea?
    3. Conflicts: There are variables within the CMIP6 MIP tables that are either duplicates, e.g. AERmon/ps, Amon/ps, CFmon/ps, Emon/ps, or have subtle differences which are not picked up by the methodology for assigning a new MIP table. The following conflict types were identified:
      1. Exact duplicates; ps in 3hr and monthly MIP tables.
      2. Differences in cell_methods; a group of variables in the CMIP6 day MIP table have cell_methods of area: time: mean, while counterparts in the CMIP6 Emon MIP table have area: mean where land time: mean. I think (please correct me if I'm wrong) that the variables with area: mean where land in their cell_methods could be reconstructed from the area: mean equivalent using a fixed land mask or land area fraction field, but there may be some subtleties I'm missing.
      3. Differences in pressure level sets, e.g. Eday/ta used plev19 while day/ta used plev8. For these I've appended the number of pressure levels to the variable name.
      4. Differences in grid; Amon/prsn and Omon/prsn both map to the same new MIP table, APmon. The Omon/prsn version is on the ocean grid, so I've changed its modeling_realm to ocean and moved it to the OPmon table.
  5. I've removed the APfxSite table as I don't think the latitude and longitude variables here are useful (please correct if I've missed something here).
  6. Write MIP tables to file:
    1. Slight modification to the header metadata -- this needs to be discussed and confirmed
    2. dimensions field is now a list of strings for clarity -- all lists should be included as lists unless there is good reason. We could consider separating out scalar coordinates, e.g. height2m into a separate field
    3. Added a provenance field with CMIP6 MIP table, variable name and uid from the CMIP6 data request.
    4. Moved validation information to a subsection
    5. Overwrote out_name with the variable name for every variable. I might be tempted to remove this field outright
  7. Copied over the formula_terms, coordinates, and grids files from CMIP6
  8. Filtered the fields in the CMIP6_CV.json file and wrote the result to generic_CV.json (needs some work)
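
To make step 3 and the conflict check a bit more concrete, here is a minimal sketch of how the prefix + frequency + suffix naming and the new_tables mapping could be put together. It is not the notebook's code: the function names, the realm-to-prefix dictionary (only AP and OP appear in this thread), the dimension names used to detect site, model-level and zonal-mean data, and the example metadata are all assumptions for illustration. The rule about fixed fields and the GIA/GIG prefixes are left out.

```python
from collections import defaultdict

# Frequency renames described in step 3.1.
FREQUENCY_RENAMES = {"monC": "monClim", "1hrCM": "monDiurnal"}

# Hypothetical realm -> prefix mapping; only "AP" and "OP" are named in the thread.
REALM_PREFIXES = {"atmos": "AP", "ocean": "OP"}

# Assumed dimension names that mark data on model levels.
MODEL_LEVEL_DIMS = {"alevel", "alevhalf", "olevel", "olevhalf"}


def table_suffix(dimensions):
    """Return the structural suffix: Site, Lev, Z or an empty string."""
    dims = set(dimensions)
    if "site" in dims:
        return "Site"
    if dims & MODEL_LEVEL_DIMS:
        return "Lev"
    # Assumption: a zonal mean has latitude but no longitude.
    if "latitude" in dims and "longitude" not in dims:
        return "Z"
    return ""


def new_table_name(variable):
    """Build prefix + frequency + suffix from one variable's metadata."""
    first_realm = variable["modeling_realm"].split()[0]
    prefix = REALM_PREFIXES[first_realm]
    frequency = FREQUENCY_RENAMES.get(variable["frequency"], variable["frequency"])
    return prefix + frequency + table_suffix(variable["dimensions"].split())


def build_mapping(mip_tables_data):
    """Return new_tables[new_table][variable_name] = [(mip_table, variable_name), ...]."""
    new_tables = defaultdict(lambda: defaultdict(list))
    for mip_table, variables in mip_tables_data.items():
        for name, variable in variables.items():
            new_tables[new_table_name(variable)][name].append((mip_table, name))
    return new_tables


def find_conflicts(new_tables):
    """List (new_table, variable_name, sources) where several CMIP6 entries collide."""
    return [
        (table, name, sources)
        for table, variables in new_tables.items()
        for name, sources in variables.items()
        if len(sources) > 1
    ]


# Illustrative metadata only, not copied from the real tables.
surface = {"modeling_realm": "atmos", "frequency": "mon",
           "dimensions": "longitude latitude time"}
example = {
    "Amon": {"ps": dict(surface)},
    "Emon": {"ps": dict(surface)},
    "CFmon": {"ta": {"modeling_realm": "atmos", "frequency": "mon",
                     "dimensions": "longitude latitude alevel time"}},
}
mapping = build_mapping(example)
conflicts = find_conflicts(mapping)
print(conflicts)
```

Any entry returned by find_conflicts would then need one of the manual resolutions listed under "Conflicts" (dropping an exact duplicate, renaming by pressure level count, or moving the variable to another table).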

This is only a first straw-man idea of how this could be approached. Certain features (the branded name in particular) likely need some work, and there are a few things we have discussed that I haven't added: (a) standard_name status for new variables and (b) removal of out_name.
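
For the changes made when writing the tables (step 6), a rough sketch of the per-variable conversion might look like the following. The key names "provenance" and "validation", the nesting of the provenance record, the set of keys treated as validation information, and the placeholder uid are assumptions rather than the layout actually used in the branch.

```python
import json

# Assumed set of keys treated as validation information.
VALIDATION_KEYS = ("valid_min", "valid_max", "ok_min_mean_abs", "ok_max_mean_abs")


def convert_entry(variable_name, cmip6_entry, mip_table, uid):
    """Convert one CMIP6-style variable entry to the prototype layout."""
    # Copy everything except the validation keys, which move to a subsection.
    entry = {k: v for k, v in cmip6_entry.items() if k not in VALIDATION_KEYS}
    # Dimensions become a list of strings instead of a space-separated string.
    entry["dimensions"] = cmip6_entry["dimensions"].split()
    # out_name is overwritten with the variable name.
    entry["out_name"] = variable_name
    entry["validation"] = {
        k: cmip6_entry[k] for k in VALIDATION_KEYS if k in cmip6_entry
    }
    # Hypothetical provenance layout: source table, variable name and uid.
    entry["provenance"] = {
        "CMIP6": {"mip_table": mip_table, "variable_name": variable_name, "uid": uid}
    }
    return entry


# Illustrative entry; the values are not copied from the real Amon table.
tas = {
    "frequency": "mon",
    "modeling_realm": "atmos",
    "units": "K",
    "dimensions": "longitude latitude time height2m",
    "out_name": "tas",
    "valid_min": "",
    "valid_max": "",
}
converted = convert_entry("tas", tas, "Amon", "placeholder-uid")
print(json.dumps(converted, indent=4))
```

Keeping scalar coordinates such as height2m in the same list is the simplest option; splitting them into a separate field, as suggested above, would only need one extra filter in convert_entry.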

matthew-mizielinski commented 2 years ago

@durack1, @taylor13, comments on the above would be welcome

durack1 commented 1 year ago

It would be great to fold in the CMIP5 (e.g. CMIP5_Amon) and CMIP3 (e.g. IPCC_table_A1) provenance as well, e.g. CMIP3 tas: https://github.com/PCMDI/cmip3-cmor-tables/blob/master/Tables/IPCC_table_A1#L915-L934

durack1 commented 5 months ago

@matthew-mizielinski @wolfiex @taylor13 is there anything relevant to the 6.6 milestone that we should note, or should this be closed?