PCMDI / mip-cmor-tables

JSON tables for CMOR3 to create Model Intercomparison Project (MIP) datasets
Creative Commons Attribution 4.0 International

Registering missing institution_id entries for obs4MIPs #41

Open durack1 opened 7 months ago

durack1 commented 7 months ago

Just linking across repos following https://github.com/PCMDI/input4MIPs_CVs/issues/8.

We need to register DLR-BIRA, ESPRI-IPSL, GloH2O, INCOIS-NIO-IPSL, NOAA-ESRL-PSD, UCI-CHRS, and UCSD-SIO.

This is a matched issue with https://github.com/PCMDI/obs4MIPs_CVs/issues/1

wolfiex commented 7 months ago

We need to separate the consortiums from the institutions. Consortiums will not have an ROR number.

taylor13 commented 7 months ago

Having separate tables for consortiums and institutions may make some QC checks more complicated. If, for example, we impose a directory structure that includes "institution", but a consortium is responsible, we want the consortium to appear instead of the institution as part of the directory structure. If we maintain separate CVs for consortiums and institutions, software checking that all the elements of a directory structure are included in a CV would have to check both the institution CV and the consortium CV.
Could we include the consortia in the institutions CV with the following changes?

  1. "consortium_members":[] (We might require this only if the "institution" is really a consortium; for a simple institution, it could be set to "NONE".)
  2. "in_consortium":[] (We might require this only for institutions that are in one or more consortiums; it could be set to "NONE" if not in any.)
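
To make the suggestion above concrete, here is a minimal Python sketch of a single institutions CV that also holds consortia. The two field names come from the list above; the dictionary layout, the function names, and the three sample entries are assumptions made for illustration and are not the actual contents of any CV file.

```python
# Hypothetical merged CV. Field names follow the proposal above;
# the layout and the sample entries are illustrative assumptions.
INSTITUTION_CV = {
    "EC-Earth-Consortium": {
        "consortium_members": ["SMHI"],
        "in_consortium": "NONE",
    },
    "SMHI": {
        "consortium_members": "NONE",
        "in_consortium": ["EC-Earth-Consortium"],
    },
    "PCMDI": {
        "consortium_members": "NONE",
        "in_consortium": "NONE",
    },
}


def is_consortium(institution_id, cv=INSTITUTION_CV):
    """Return True if the CV entry lists at least one consortium member."""
    members = cv[institution_id]["consortium_members"]
    return members != "NONE" and len(members) > 0


def check_drs_institution(path_element, cv=INSTITUTION_CV):
    """QC check: the 'institution' directory element must be in the one CV."""
    return path_element in cv


def find_inconsistencies(cv=INSTITUTION_CV):
    """List (consortium, member) pairs where the member does not point back."""
    problems = []
    for name, entry in cv.items():
        members = entry["consortium_members"]
        if members == "NONE":
            continue
        for member in members:
            parents = cv.get(member, {}).get("in_consortium", "NONE")
            if parents == "NONE" or name not in parents:
                problems.append((name, member))
    return problems
```

With one table, a directory-structure check needs a single lookup whether the responsible body is an institution or a consortium, and `find_inconsistencies` shows how the two fields could be cross-checked against each other.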

wolfiex commented 5 months ago

These can now be added via an issue template as of https://github.com/PCMDI/mip-cmor-tables/pull/49

wolfiex commented 5 months ago

Just as an idea for discussion:

Two-institution consortiums are really partnerships, of which we have many. Does it make sense to allow multiple institutions to be listed per entry instead?

What are the funding implications of doing so?

Similarly, would we want to differentiate between datasets submitted before and after a new entity joins a consortium?

durack1 commented 5 months ago

@wolfiex @taylor13 This is starting to get complicated; I suggest we intentionally try to simplify things as much as we can. I had been thinking that we need an "institution" (bricks and mortar, with a postal address) that would then be eligible for an ROR, and rather than have consortium "institutions," we'd catch these consortiums as part of the source_id. My thinking follows https://github.com/PCMDI/input4MIPs_CVs/issues/9: UExeter is the lead (Thomas' host) institution, but the dataset is a team/consortium effort identified by a source_id that may have multiple institutions listed, with the first entry being the dataset institution_id.

taylor13 commented 5 months ago

Is this what you are proposing?

  1. Each source_id appearing in the CV would include a sub-entry ("institution_id") listing all the institution(s) that might be responsible for datasets produced by the source.
  2. In each file, a global attribute ("institution_id") would record only the subset of CV institutions that were actually responsible for a given dataset. Only one institution would be listed except in the case of a consortium or partnership.
  3. The DRS would use the first institution listed in the institution_id global attribute in creating a dataset I.D. and in creating directory structures.

I don't see any problems with this, but it will decrease visibility of the consortium name, which will appear nowhere. Will EC-Earth be o.k. with this?
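
The three steps above could be sketched in Python as follows. This is only an illustration under stated assumptions: the source_id `example-source-1-0`, the pairing of institutions under it, the function names, and the choice of a space-separated string for a multi-valued `institution_id` global attribute are all hypothetical and not taken from any CV or from CMOR.

```python
# Hypothetical source_id CV. The source_id name and its institution list
# are made up for illustration (step 1 of the proposal).
SOURCE_ID_CV = {
    "example-source-1-0": {"institution_id": ["UExeter", "PCMDI"]},
}


def responsible_institutions(source_id, institution_attr, cv=SOURCE_ID_CV):
    """Step 2: parse the file's institution_id attribute and check that it
    is a subset of the institutions registered for the source_id.

    Assumption: multiple institutions are written as one space-separated
    string in the global attribute.
    """
    allowed = cv[source_id]["institution_id"]
    listed = institution_attr.split()
    if not listed:
        raise ValueError("institution_id attribute is empty")
    unknown = [inst for inst in listed if inst not in allowed]
    if unknown:
        raise ValueError(
            f"{unknown} not registered for source_id {source_id!r}"
        )
    return listed


def drs_institution(source_id, institution_attr, cv=SOURCE_ID_CV):
    """Step 3: the DRS uses the first institution listed in the attribute."""
    return responsible_institutions(source_id, institution_attr, cv)[0]
```

For example, `drs_institution("example-source-1-0", "UExeter PCMDI")` would select `UExeter` for the dataset I.D. and directory structure. Nothing in this scheme carries a consortium name, which is the visibility concern raised here.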

durack1 commented 5 months ago

@taylor13 I think I need to calibrate with @wolfiex and better understand how information is partitioned between the MIP_consortiums.json and MIP_institutions.json files - airplane wifi is too spotty.

For the CMIP6 EC-Earth3* examples, every one of these had a listed institution_id = EC-Earth-Consortium, which includes ~30 institutions identified by a name/acronym and country, with a mailing address of SMHI, one of the 30 listed centres (see CMIP6_institution_id.html). This is a particularly good example of why my simplified system would not work well, whereas it could work in the case of the volcanic forcing. It might be useful to defer this issue to a discussion, as there is much to calibrate on, it seems, and working through all existing examples is the only way to definitively come up with a path that maps across all existing (and hopefully future) configs.

wolfiex commented 5 months ago

The way I understand it:

The work (submission, data, modelling) is still done by a single person at an institution, but it is commissioned or funded as part of something greater. It is therefore recorded as coming from a consortium.

In the same way, we can have ESPRI-IPSL, INCOIS-NIO-IPSL, and direct submissions by IPSL.

(To be corrected later)