@matthew-mizielinski @taylor13 let's centralize discussions here. As I noted, I already have code that pulls info from the CMIP6 (or 5, 3) indexes and will return information such as that found in durack1/CMIPOcean/CMIP_ESGF.json
I suggest not purging any registered source_ids or institution_ids at this time, but I think that, if practical, we should:
`amip`, `abrupt_4xCO2`, `1pctCO2`, and `piControl` (or `esm-piControl`). They would be designated "CMIP, DECK" if in addition they have contributed results from `historical` (or `esm-hist` or `historical-cmip5`). Otherwise they would continue to be designated "registered".
Regarding the update of "cohort" classification, we can be guided by the initially suggested policy on cohort designations (https://goo.gl/zDHUk7; 9 January 2018):
The only choices permitted under the “Model Cohort” category are the following: DECK, CMIP6, CMIP5, CMIP3, CMIP2, CMIP1, “CMIP6-fringe”, and “Registered”. A “Model Cohort” limits a search to models that meet certain MIP criteria (for example, completion of 4 DECK experiments plus the historical simulation is usually required to be included in the “CMIP6” cohort). The CMIP panel will record and update the “Model Cohorts” that each source_id (i.e., model) belongs to in the reference source_id CV found at WCRP-CMIP/CMIP6_CVs/CMIP6_source_id.json. Only models that qualify for at least one “Model Cohort” shall be considered for inclusion in a search result. The following define the cohorts:
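The designation rule proposed earlier in this comment (four DECK experiments, plus a historical run for the "CMIP, DECK" label) could be sketched roughly as below. This is only an illustration: the function name is hypothetical, the experiment names are copied as written in this thread, and returning a plain "DECK" label for models that have the four experiments but no historical run is an assumption, since that part of the proposal is not visible here.

```python
# Experiment names as written in this thread (not checked against the experiment_id CV).
DECK_REQUIRED = ("amip", "abrupt_4xCO2", "1pctCO2")
PICONTROL_ALTERNATIVES = ("piControl", "esm-piControl")
HISTORICAL_ALTERNATIVES = ("historical", "esm-hist", "historical-cmip5")


def designate_cohort(experiments):
    """Return a cohort label for a model, given the experiments it has contributed.

    Hypothetical helper: "DECK" for the DECK-only case is an assumption.
    """
    done = set(experiments)
    has_deck = all(name in done for name in DECK_REQUIRED) and any(
        name in done for name in PICONTROL_ALTERNATIVES
    )
    if not has_deck:
        # Incomplete DECK: the model stays as it was.
        return "registered"
    if any(name in done for name in HISTORICAL_ALTERNATIVES):
        return "CMIP, DECK"
    return "DECK"
```

For example, `designate_cohort(["amip", "historical"])` leaves the model as "registered" because the DECK set is incomplete.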
I agree with Karl for 1. and 2.
Yes, I agree the Model Cohort could provide information of value to users. The reasons for possibly removing it as a search facet are:
Perhaps Sasha might say if any of the above is based on my misunderstanding ESGF.
We can achieve it if it's worth the effort. (1) is easier to do than (2). It could take weeks at LLNL for scripts to complete for all our 5M replica records. I'm open to dropping the facet.
Sorry, I must have too much else on my mind... There is a simple command to update all records that match a query. So each site just needs to re-run a query/update operation periodically. If we have new records published in the correct cohort, we can drop the need to make the corrections.
So I take it 2 is easier than 1?
Other way around: (2) involves herding cats. I should also mention that we need to check the performance implications of doing updates in bulk, which complicates things.
Got it. Executing (2) is technically trivial; getting folks to execute it could be difficult. On the other hand (1) requires some effort by PCMDI to write scripts: 1) to periodically check the ESG database and update the source_id CV so that it reflects the true "cohort" status for each model, and then 2) to transfer the updates from the CV to ESG and correct the ESG archive's database index.
(Again, @sashakames, I've probably not understood, so please correct, as needed, the above.)
I was thinking that 1.2 (the ESGF index update phase) would not be too challenging for me to implement. On the query part of 1.1: doesn't ChrisM's "Big Table" already have this (the experiments for each model)? If so, we could leverage that, though I wouldn't consider performing the queries too challenging either, if need be.
To clarify the concern: the bulk updates might time out if there are 100000s of records to process in a single bulk operation. If this is problematic we would need to play with the granularity of the update (e.g. do one experiment at a time).
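As a rough illustration of the "one experiment at a time" granularity, the sketch below splits a cohort change into one query/update pair per experiment and collects the pairs that time out so they can be retried or split further. Everything here is hypothetical: `submit` stands in for whatever mechanism actually applies an update to the index, and the field name `model_cohort` is an assumption, not a confirmed ESGF name.

```python
def per_experiment_updates(source_id, experiments, cohort):
    """Yield one (query, update) pair per experiment instead of one bulk update."""
    for experiment_id in experiments:
        query = {"source_id": source_id, "experiment_id": experiment_id}
        update = {"model_cohort": cohort}  # assumed field name
        yield query, update


def run_updates(pairs, submit):
    """Apply each update via the caller-supplied `submit`; return queries that timed out."""
    timed_out = []
    for query, update in pairs:
        try:
            submit(query, update)
        except TimeoutError:
            # Keep going; the caller can retry these at a finer granularity.
            timed_out.append(query)
    return timed_out
```

If a single experiment still times out, the same pattern could be applied to a finer key (for example per variant or per table), at the cost of more requests.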
Ideally once a model has changed cohort, we ask them to update their publisher config to have the cohort value set correctly, then we don't need to correct them again until the next change. And same goes for replica publishing.
A specific case that needs to be accounted for is https://github.com/WCRP-CMIP/CMIP6_CVs/issues/512
From WIP meeting discussion:
As part of #1066, models that have no published data on ESGF have been left as `"cohort" = ["Registered"]`, whereas models that have data have been updated to `"cohort" = ["Published"]`.
It would be possible to contact the modeling groups of the non-published models, though I'm not sure we'd want to deregister any specific model.
All models that do not currently have data available anywhere on ESGF now have the entry `"cohort": ["Registered"]`.
All models that have data have an updated entry as `"Published"`, as per #1066.
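A minimal sketch of the kind of update described above, assuming the source_id CV is a JSON object with a top-level `"source_id"` key mapping each model name to an entry that carries a `"cohort"` list. The function name and the example model names are hypothetical, and this is not the script actually used for #1066.

```python
import copy


def set_cohorts(cv, published_source_ids):
    """Return a copy of the CV with each model's cohort set from the published list."""
    published = set(published_source_ids)
    updated = copy.deepcopy(cv)  # leave the caller's CV untouched
    for source_id, entry in updated["source_id"].items():
        entry["cohort"] = ["Published"] if source_id in published else ["Registered"]
    return updated
```

Usage would look like `new_cv = set_cohorts(cv, {"MODEL-A"})`, followed by writing `new_cv` back to the JSON file.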
An email was sent out today requesting an update for the 28 models that currently have no data published on ESGF. The request was for data to be published, or for deregistration to occur - once we have intel from these contacts, we can amend as required and close out this issue
@matthew-mizielinski I am closing this as a dupe (somewhat) of #1050, which includes the table of 28 models that are registered but have no data published on ESGF; these are now down to 14 in the updated table below. The process of identifying these, and either deregistering them or awaiting an update for imminent publication, is already underway and noted in #1076, #1078, #1079, #1083, and #1086, and in the NorESM2* deregistrations (see #1079/#1084).
Updated 220701 - last merged PR #1126
issue/PR | status | source_id | MIPs | LLNL files | ESGF datasets | contact status |
---|---|---|---|---|---|---|
#1076 | awaiting publication | EC-Earth3-GrIS | CMIP ISMIP6 PMIP | - | none | |
#1076 | awaiting publication | EC-Earth3-HR | CMIP DCPP HighResMIP | - | none | |
#1083 | awaiting update | GFDL-GLOBAL-LBL | RFMIP | - | none | |
#1078 | awaiting publication | IPSL-CM6A-ATM-ICO-HR | HighResMIP | - | none | |
#1078 | awaiting publication | IPSL-CM6A-ATM-ICO-LR | HighResMIP | - | none | |
#1078 | awaiting publication | IPSL-CM6A-ATM-ICO-MR | HighResMIP | - | none | |
#1078 | awaiting publication | IPSL-CM6A-ATM-ICO-VHR | HighResMIP | - | none | |
#1078 | awaiting publication | IPSL-CM6A-ATM-LR-REPROBUS | AerChemMIP | - | none | |
#1078 | awaiting publication | IPSL-CM6A-MR025 | CMIP | - | none | |
#1078 | awaiting publication | IPSL-CM6A-MR1 | CMIP | - | none | |
#1105 | awaiting publication | CAM-MPAS-HR | HighResMIP | - | none | Ruby, Bryce, Koichi emailed |
#1105 | awaiting publication | CAM-MPAS-LR | HighResMIP | - | none | Ruby, Bryce, Koichi emailed |
#1116 #1117 | awaiting publication | AWI-ESM-2-1-LR | CMIP PMIP | - | none | Tido, Christian, Gerrit, Martin, Paul and Christopher emailed |
#1093 | deregistered | NICAM16-9D-L78 | CFMIP CMIP | - | none | |
#1087 | deregistered | NorESM2-HH | CMIP HighResMIP | - | none | |
#1100 #1103 | deregistered | BNU-ESM-1-1 | C4MIP CDRMIP CFMIP CMIP GMMIP GeoMIP OMIP RFMIP ScenarioMIP | - | none | Duoying emailed |
#1102 #1106 | deregistered | CESM2-SE | CMIP HighResMIP | - | none | Gokhan & Gary emailed |
#1101 #1104 | deregistered | CNRM-ESM2-1-HR | CMIP OMIP ScenarioMIP | - | none | David, Gaelle, Laurent and Marie-Pierre emailed |
#1111 #1112 | deregistered | EMAC-2-53-Vol | CMIP VolMIP | - | none | 5 addresses emailed |
#1111 #1112 | deregistered | EMAC-2-54-AerChem | AerChemMIP CMIP | - | none | 5 addresses emailed |
#1122 #1123 | deregistered | VRESM-1-0 | CMIP DAMIP HighResMIP PMIP ScenarioMIP | - | none | Francois/Pedro emailed - deregister 220630 |
#1086 #1124 | deregistered | UofT-CCSM4 | CMIP PMIP | - | none | dchandan pinged - deregister 220630 |
#1120 #1125 | deregistered | BESM-2-9 | CMIP DCPP ScenarioMIP | - | none | Andre & Paulo emailed - deregister 220630 |
#1121 #1126 | deregistered | CSIRO-Mk3L-1-3 | CMIP PMIP | - | none | Steve emailed - deregister 220630 |
@matthew-mizielinski I realised that closing this wasn't the best idea, as we need somewhere to keep track of the remaining unresolved/deregistrations, so will reopen and update the table above as required. 12 remaining questions to answer.
@matthew-mizielinski et al, all models with no data and no intention to publish data imminently have now been deregistered, so I can close out this issue, with the remaining license updates to be dealt with by #1113
Following discussion on #512 I've scraped together data from the ESGF search pages (list of source ids) and the source id list within the CVs to pull out the following table of models where no data appears to be available at the time of writing (July 2021).
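The comparison itself is a set difference between the source ids registered in the CV and those that show up in search results. The sketch below assumes a Solr-style facet response in which `facet_counts.facet_fields.source_id` is a flat list alternating name and count, and a CV with a top-level `"source_id"` key. Both structures and both function names are assumptions for illustration, not the code actually used to build the table.

```python
def source_ids_with_data(facet_response):
    """Extract source ids with a non-zero dataset count from a facet response."""
    flat = facet_response["facet_counts"]["facet_fields"]["source_id"]
    names, counts = flat[0::2], flat[1::2]  # [name, count, name, count, ...]
    return {name for name, count in zip(names, counts) if count > 0}


def models_without_data(cv, published):
    """Return registered source ids that do not appear among the published ones."""
    return sorted(set(cv["source_id"]) - set(published))
```

Combining the two, `models_without_data(cv, source_ids_with_data(response))` gives the list of candidates to follow up with.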
This includes a number of institutions where no data has been published for their models and one institution without any models.
There are a total of 28 models in the table below with a further 4 registered in the last 12-18 months.
I'm not currently advocating purging all of these, but I think it is worth discussing how to handle this.
There are also the following recent additions (2020 and 2021 release years)