PCMDI / input4MIPs_CVs

Controlled Vocabularies (CVs) for use in input4MIPs
Creative Commons Attribution 4.0 International

Generate version history json file - for use by ES-DOCS #2

Open durack1 opened 6 years ago

durack1 commented 6 years ago

@eguil @davidhassell this issue has been generated following the email correspondence regarding https://es-doc.org/cmip6-ensembles-conformance/

It will be useful to iterate over the format of the json version info within this issue

durack1 commented 6 years ago

@davidhassell @eguil here is a first pass at a version lookup file - please review and let me know if fields should be ordered differently to make it easier to use.

PCMDI/input4MIPs-cmor-tables/Versions/6.2.0.json

Once we have finalised the format, I can regenerate the versions back in time. Any future release will have a new 6.x.y.json file generated

durack1 commented 6 years ago

@davidhassell @eguil, I'm wondering whether sorting these by target_mip keys, with the institution_id values as a second level will make these easier for you to use?

durack1 commented 6 years ago

@davidhassell @eguil it would be great to get your feedback soon on the format, as I am anticipating significant changes as new datasets are generated and published, and without feedback you're going to be stuck using the existing format of the json info

durack1 commented 6 years ago

@davidhassell @eguil I have made a change to the format, please take a look at the files now building in PCMDI/input4MIPs-cmor-tables/Versions

MartinaSt commented 6 years ago

@davidhassell @eguil This JSON file with the versions is very helpful. It would provide the user with even more information, if you add a short reason for deprecation. Currently, I can only mention that the data is deprecated and point the user to the current version and the version google doc (see e.g.: https://doi.org/10.22033/ESGF/input4MIPs.1120 ).

durack1 commented 6 years ago

@MartinaSt thanks for the feedback. If these files are useful to you, please feel free to suggest changes/augmentations/amendments to the format and content, so that they are as easy as possible for you to use. My plan is that once the format has been finalized, I will generate versions extending all the way back to the original release v6.0.0 (20th December 2016), as noted in the google doc

I think including the DOI for each published/DOI-minted dataset would also be a useful addition

MartinaSt commented 6 years ago

@durack1 Thanks, Paul. Having the change of the DRS and my matching in mind, it would be great if you could add:

Could you avoid '*' notations, e.g. '2017-05-18 (-AIR-*)' and replace these by all individual versions? The current notation is difficult to parse.

In the currentVersionNotes you have split the note into multiple list entries. It would be good to have a single note per data version. Example from the 6.2.1. JSON: "currentVersionNotes":[ "latest AIR datasets are 2017-08-30 (except", " SO2), and SO2 aircraft emission files 2017-10-05", ", which deprecate 2017-05-18"

It would be great if you could make these changes. Is this information sufficient or do you need more information from me?
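
As a minimal sketch of the second request, the split note from the 6.2.1 example can be collapsed into a single string. The helper name `merge_note` is hypothetical and only used for illustration; the fragments are copied from the example quoted above.

```python
# Fragments copied from the 6.2.1 "currentVersionNotes" example quoted above.
fragments = [
    "latest AIR datasets are 2017-08-30 (except",
    " SO2), and SO2 aircraft emission files 2017-10-05",
    ", which deprecate 2017-05-18",
]


def merge_note(parts):
    """Join the list entries of one note into a single string (hypothetical helper)."""
    return "".join(parts)


note = merge_note(fragments)
print(note)
```

A plain join is enough here because the fragments already carry their own leading spaces and punctuation.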

Adding the DOIs is an excellent idea. It would be easiest if we had the DRS of the data collection on the DOI granularity directly in the JSON, e.g. %(mip_era)s.%(activity)s.%(institution)s.%(source_id)s [CMIP6.input4MIPs.PCMDI.PCMDI-AMIP-1-1-2]. (new DRS after republication)
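
For illustration, the template above can be filled in with Python's `%`-style mapping substitution. The function name `drs_id` is an assumption made for this sketch; the template and the example values are the ones given in this comment.

```python
# DRS template at DOI granularity, as written in the comment above.
DRS_TEMPLATE = "%(mip_era)s.%(activity)s.%(institution)s.%(source_id)s"


def drs_id(mip_era, activity, institution, source_id):
    """Build the DOI-granularity DRS id from its four components (hypothetical helper)."""
    return DRS_TEMPLATE % {
        "mip_era": mip_era,
        "activity": activity,
        "institution": institution,
        "source_id": source_id,
    }


example = drs_id("CMIP6", "input4MIPs", "PCMDI", "PCMDI-AMIP-1-1-2")
print(example)
```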

durack1 commented 6 years ago

@MartinaSt I hadn't thought about your use of this, glad it will be useful for you. Can you take a pass at editing the current Versions/6.2.1.json version of the file to the format that you want? If I have an example of the changes that you want implemented, it'll be easier for me to propagate the changes across all datasets in the collection.

MartinaSt commented 6 years ago

@durack1 The ideal structure from the citation point of view would be with examples ImperialCollege and PNNL-JGCRI in the new DRS:

{
    "input4MIPs_version":{
        "data":{
            "CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-0":{
                "institution_id":"ImperialCollege",
                "source_id":"ImperialCollege-1-0",
                "mip_table":["C4MIP","OMIP"],
                "data_type":"atmosphericState",
                "version":"1.0",
                "VersionInfo":"deprecated",
                "VersionNotes":"...to be added: reason for deprecation...",
                "doi":"10.22033/ESGF/input4MIPs.1162"
            },
            "CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-1":{
                "institution_id":"ImperialCollege",
                "source_id":"ImperialCollege-1-1",
                "mip_table":["C4MIP","OMIP"],
                "data_type":"atmosphericState",
                "version":"1.1",
                "VersionInfo":"current",
                "VersionNotes":"...",
                "doi":"10.22033/ESGF/input4MIPs.1601"
            },
            "CMIP6.input4MIPs.ImperialCollege.ImperialCollege-2-0":{
                "institution_id":"ImperialCollege",
                "source_id":"ImperialCollege-2-0",
                "mip_table":["C4MIP","OMIP"],
                "data_type":"atmosphericState",
                "version":"2.0",
                "VersionInfo":"current",
                "VersionNotes":"...",
                "doi":"10.22033/ESGF/input4MIPs.1602"
            },
            "CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-05-18":{
                "institution_id":"PNNL-JGCRI",
                "source_id":"CEDS-2017-05-18",
                "mip_table":["CMIP"],
                "data_type":"emissions",
                "version":"2017-05-18",
                "VersionInfo":"current",
                "VersionNotes":"...",
                "doi":"10.22033/ESGF/input4MIPs.1241"
            },
            "CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-05-18-supplemental-data":{
                "institution_id":"PNNL-JGCRI",
                "source_id":"CEDS-2017-05-18-supplemental-data",
                "mip_table":["CMIP"],
                "data_type":"emissions",
                "version":"2017-05-18-supplemental-data",
                "VersionInfo":"current",
                "VersionNotes":"...",
                "doi":"10.22033/ESGF/input4MIPs.1242"
            },
            "CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-08-30":{
                "institution_id":"PNNL-JGCRI",
                "source_id":"CEDS-2017-08-30",
                "mip_table":["CMIP"],
                "data_type":"emissions",
                "version":"2017-08-30",
                "VersionInfo":"current",
                "VersionNotes":"latest AIR datasets are 2017-08-30 (except SO2), and SO2 aircraft emission files 2017-10-05, which deprecate 2017-05-18",
                "doi":"10.22033/ESGF/input4MIPs.1604"
            },
            "CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-08-30-supplemental-data":{
                "institution_id":"PNNL-JGCRI",
                "source_id":"CEDS-2017-08-30-supplemental-data",
                "mip_table":["CMIP"],
                "data_type":"emissions",
                "version":"2017-08-30-supplemental-data",
                "VersionInfo":"current",
                "VersionNotes":"latest AIR datasets are 2017-08-30 (except SO2), and SO2 aircraft emission files 2017-10-05, which deprecate 2017-05-18",
                "doi":"10.22033/ESGF/input4MIPs.1605"
            },
            "CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-10-05":{
                "institution_id":"PNNL-JGCRI",
                "source_id":"CEDS-2017-10-05",
                "mip_table":["CMIP"],
                "data_type":"emissions",
                "version":"2017-10-05",
                "VersionInfo":"current",
                "VersionNotes":"latest AIR datasets are 2017-08-30 (except SO2), and SO2 aircraft emission files 2017-10-05, which deprecate 2017-05-18",
                "doi":""
            },
            "CMIP6.input4MIPs.PNNL-JGCRI.CEDS-v2016-06-18":{
                "institution_id":"PNNL-JGCRI",
                "source_id":"CEDS-v2016-06-18",
                "mip_table":["CMIP"],
                "data_type":"emissions",
                "version":"2016-06-18",
                "VersionInfo":"deprecated",
                "VersionNotes":"...to be added: reason for deprecation...",
                "doi":"10.22033/ESGF/input4MIPs.1123"
            },
            "CMIP6.input4MIPs.PNNL-JGCRI.CEDS-v2016-06-18-sectorDimV2":{
                "institution_id":"PNNL-JGCRI",
                "source_id":"CEDS-v2016-06-18-sectorDimV2",
                "mip_table":["CMIP"],
                "data_type":"emissions",
                "version":"2016-06-18-sectorDimV2",
                "VersionInfo":"deprecated",
                "VersionNotes":"...to be added: reason for deprecation...",
                "doi":"10.22033/ESGF/input4MIPs.1126"
            },
            "CMIP6.input4MIPs.PNNL-JGCRI.CEDS-v2016-07-26":{
                "institution_id":"PNNL-JGCRI",
                "source_id":"CEDS-v2016-07-26",
                "mip_table":["CMIP"],
                "data_type":"emissions",
                "version":"2016-07-26",
                "VersionInfo":"deprecated",
                "VersionNotes":"...to be added: reason for deprecation...",
                "doi":"10.22033/ESGF/input4MIPs.1116"
            },
            "CMIP6.input4MIPs.PNNL-JGCRI.CEDS-v2016-07-26-sectorDim":{
                "institution_id":"PNNL-JGCRI",
                "source_id":"CEDS-v2016-07-26-sectorDim",
                "mip_table":["CMIP"],
                "data_type":"emissions",
                "version":"2016-07-26-sectorDim",
                "VersionInfo":"deprecated",
                "VersionNotes":"...to be added: reason for deprecation...",
                "doi":"10.22033/ESGF/input4MIPs.1114"
            },
            "CMIP6.input4MIPs.PNNL-JGCRI.CEDS-v2016-07-26-sectorDim-supplemental-data":{
                "institution_id":"PNNL-JGCRI",
                "source_id":"CEDS-v2016-07-26-sectorDim-supplemental-data",
                "mip_table":["CMIP"],
                "data_type":"emissions",
                "version":"2016-07-26-sectorDim-supplemental-data",
                "VersionInfo":"deprecated",
                "VersionNotes":"...to be added: reason for deprecation...",
                "doi":"10.22033/ESGF/input4MIPs.1124"
            }
        },
        "version":"6.2.1_ms",
        "version_release":"2017-11-01"
    }
}

DOI information is accessible as JSON using the above DRS_ids via: https://cera-www.dkrz.de/WDCC/meta/CMIP6/.json As we currently have several different DRS in use, an example of the citation JSON is available via: https://cera-www.dkrz.de/WDCC/meta/CMIP6/input4MIPs.PNNL-JGCRI.emissions.CMIP.CEDS-2017-08-30.json Note that in the non-DOI case you will find the URL of the landing page in the JSON instead of the DOI.
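
A minimal sketch of how a consumer might read the proposed structure, assuming the entries sit under `input4MIPs_version` and `data` as in the example above. The function name `by_status` is hypothetical, and the sample keeps only two of the entries and two of their fields.

```python
import json

# Two entries from the proposed structure, reduced to the fields used here.
sample = json.loads("""
{
  "input4MIPs_version": {
    "data": {
      "CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-0": {
        "VersionInfo": "deprecated",
        "doi": "10.22033/ESGF/input4MIPs.1162"
      },
      "CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-1": {
        "VersionInfo": "current",
        "doi": "10.22033/ESGF/input4MIPs.1601"
      }
    }
  }
}
""")


def by_status(doc):
    """Group DRS ids by their VersionInfo value (hypothetical helper)."""
    groups = {}
    for drs, entry in doc["input4MIPs_version"]["data"].items():
        groups.setdefault(entry["VersionInfo"], []).append(drs)
    return groups


print(by_status(sample))
```

Keying the entries by DRS id at the DOI granularity means a citation service can look up a single entry directly instead of scanning the whole file.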

durack1 commented 6 years ago

@MartinaSt thanks for this, the citation was not the target for the existing format so I'll have to consider merging the two.

@davidhassell @eguil it would be really useful for you to chime in, as once a format has been settled you'll have to deal with it any way you can

agstephens commented 6 years ago

At what level of the DRS hierarchy are we planning to publish DOIs? If we want to auto-generate version history files then it is important to know which level of the directory structure they apply to.

esdoc-system-user commented 6 years ago

  1. Be consistent with key naming convention, i.e. either lower_case_underscore (à la Python) or camelCase (à la native JSON).

  2. The data field should be an array not an object.
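
As a sketch of what these two suggestions could look like in practice (not a settled format): keys are normalised to lower_case_underscore and the `data` object is flattened into an array. The helper names and the `drs_id` key are assumptions introduced for this example only.

```python
import re


def to_snake(key):
    """Convert a camelCase or PascalCase key to lower_case_underscore."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", key).lower()


def data_as_array(data):
    """Turn {drs_id: entry} into a list of entries that carry their own id.

    The "drs_id" key name is hypothetical; the thread does not define one.
    """
    return [
        {"drs_id": drs, **{to_snake(k): v for k, v in entry.items()}}
        for drs, entry in data.items()
    ]


data = {
    "CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-0": {
        "institution_id": "ImperialCollege",
        "VersionInfo": "deprecated",
    }
}
records = data_as_array(data)
print(records)
```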

MartinaSt commented 6 years ago

@agstephens You can see the citation granularity, which is in use for input4MIPs, in my example. I have used the DRS_id on the citation granularity as key. E.g.: old DRS: input4MIPs.PNNL-JGCRI.emissions.CMIP.CEDS-2017-08-30 new DRS: CMIP6.input4MIPs.PNNL-JGCRI.CEDS-2017-08-30
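
For illustration, the mapping between the two DRS forms in this example can be sketched as below. The field order assumed for the old DRS (activity, institution, data type, target MIP, source_id) is inferred from the single example above and is not confirmed elsewhere; `old_to_new_drs` is a hypothetical name.

```python
def old_to_new_drs(old_id, mip_era="CMIP6"):
    """Map an old-style DRS id to the new DOI-granularity form.

    Assumes the old id has exactly five dot-separated fields in the order
    activity.institution.data_type.target_mip.source_id (inferred, not confirmed).
    """
    activity, institution, _data_type, _target_mip, source_id = old_id.split(".")
    return ".".join([mip_era, activity, institution, source_id])


new_id = old_to_new_drs("input4MIPs.PNNL-JGCRI.emissions.CMIP.CEDS-2017-08-30")
print(new_id)
```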

durack1 commented 6 years ago

@esdoc-system-user your comment above "data field should be an array not an object", can you further explain? The current file version/format can be viewed here

davidhassell commented 6 years ago

It may be some use to summarize how ES-DOC will be storing dataset descriptions. The properties we might collect are (summarized from the CIM definition)

Apart from name, all properties are optional.

Currently, the name, availability and description are captured in the ES-DOC CMIP6 experiments spreadsheet, which is rendered in the ES-DOC viewer (https://search.es-doc.org , e.g. the descriptions of pre-industrial aerosols for esm-piControl may be seen here)

durack1 commented 6 years ago

@davidhassell thanks for this, I believe these properties are exactly what I was hoping to gather, so will consider these along with the requirements outlined by @MartinaSt above https://github.com/PCMDI/input4MIPs_CVs/issues/2 and propose a new format before preparing the 6.0.0 -> 6.2.3 version json files

MartinaSt commented 5 years ago

@durack1, independent of the format for the version information, the current information is outdated (version 6.2.3, November 2017). When do you plan an update?

durack1 commented 5 years ago

Hi folks, as discussed at the WIP call this morning we need to work on the input4MIPs dataset version history so that this information can be provided for model simulations to be accurately documented (which combination of the numerous forcing datasets available). It would be useful for @davidhassell @charliepascoe to engage on this so that we can generate an easy to use format that can be updated live as additional datasets are updated and contributed to the project

@eguil @momipsl @taylor13 @MartinaSt

durack1 commented 5 years ago

It will be necessary to assign the input4MIPs collection version (currently 6.2.14, see here) to each valid dataset. When a dataset is deprecated its collection version remains static, and the new dataset version gets the new collection tag. For example, if a new volcanic forcing dataset (v4) is released, the input4MIPs collection is incremented to 6.2.15: in the 6.2.14.json file the v3 file has collection version 6.2.14, in the 6.2.15.json file the v3 file continues to have collection version 6.2.14, whereas v4 has 6.2.15
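
A minimal sketch of that tagging rule, under the assumption that each version file can be reduced to a mapping from dataset to collection version. The dataset labels `volcanic-v3` and `volcanic-v4` and the function name `next_collection` are hypothetical.

```python
def next_collection(previous, datasets_now, new_tag):
    """Build the mapping for a new collection release.

    previous     -- {dataset: collection_version} from the prior version file
    datasets_now -- every dataset known at the new release
    new_tag      -- collection version of the new release
    Datasets already present keep their original tag; only new ones get new_tag.
    """
    current = dict(previous)
    for name in datasets_now:
        current.setdefault(name, new_tag)
    return current


v6_2_14 = {"volcanic-v3": "6.2.14"}
v6_2_15 = next_collection(v6_2_14, ["volcanic-v3", "volcanic-v4"], "6.2.15")
print(v6_2_15)
```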

MartinaSt commented 5 years ago

@durack1 thanks for coming back to this issue. Please keep the version information that data providers included in the dataset names, and/or add the ESGF version under which the dataset was published. Otherwise I will lose the connection to the dataset version in the citation.

durack1 commented 5 years ago

@eguil @davidhassell @momipsl this is the conversation that we can hopefully spend some time finalizing tomorrow - the format that @MartinaSt suggested is above https://github.com/PCMDI/input4MIPs_CVs/issues/2

davidhassell commented 5 years ago

Hi @durack1 and all, Thanks for taking the time to talk through this a couple of days ago.

To summarize, these are the attributes in the JSON files that I think we can use for ES-DOC:

I understand that all of these items are readily available. Of course any extra attributes that are needed, e.g. for citations are all fine and will not affect ES-DOC.

This could be, more or less, a mingling of Martina's and Paul's JSON examples:

{
    "data":{
        "CMIP6.input4MIPs.ImperialCollege.ImperialCollege-1-0":{
            "institution_id":"ImperialCollege",
            "source_id":"ImperialCollege-1-0",
            "mip_table":["C4MIP","OMIP"],
            "data_type":"atmosphericState",
            "version":"1.0",
            "VersionInfo":"deprecated",
            "VersionNotes":"...to be added: reason for deprecation...",
            "doi":"10.22033/ESGF/input4MIPs.1162",
            "Title":"Compiled Historical Record of Atmospheric delta13CO2 version 1.1",
            "id":"input4MIPs.CMIP6.C4MIP.ImperialCollege.ImperialCollege-1-1.atmos.yr.delta13co2-in-air.gm",
            "currentVersion":["1.1", "2.0"],
            "deprecatedVersion":["1.0"]
        },

Thanks, David

MartinaSt commented 5 years ago

@durack1, @davidhassell: Following today's discussion in the input4MIPs meeting, I propose that we add an attribute "VersionLink", which links to a document (PDF) with a detailed description of the issue with the dataset version.

durack1 commented 5 years ago

@MartinaSt just circling back around on this. Is there an API to call to query the DOIs issued by the DKRZ citation service? As the archive has grown so much now, I'm reluctant to try and hand-spin this versioning information.

I'm looking into harvesting all the metadata attributes from the ESGF project so I can populate all the fields comprehensively

MartinaSt commented 4 years ago

@durack1 : I'd like to come back to this version documentation issue. As the errata are not accessible for the data citation (an access by DRS CV is required) the revised version information is the only possibility to access+display version/errata information on the DOI landing page.

What are your plans with this version documentation? Any idea about a schedule?

durack1 commented 4 years ago

@MartinaSt thanks for bringing this back up. I started realizing that doing this in a manual way will cause all manner of problems, so instead, using the ESGF API, we should be able to reconstruct these versions using date ranges. I am hoping to get some help with this, and I know that it's going to be required by ES-DOC (@davidhassell) to make it easy for modelling groups to document their simulations
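
One way the date-range idea could be sketched, assuming that the trailing `vYYYYMMDD` component of an ESGF dataset id reflects when that dataset version was published (an assumption, not something the API guarantees here). The sample ids are illustrative and the helper names are hypothetical.

```python
from datetime import date, datetime


def stamp(dataset_id):
    """Parse the trailing vYYYYMMDD component of a dataset id into a date."""
    return datetime.strptime(dataset_id.rsplit(".", 1)[1], "v%Y%m%d").date()


def available_at(dataset_ids, release_date):
    """Return the dataset ids whose version stamp is on or before release_date."""
    return [d for d in dataset_ids if stamp(d) <= release_date]


ids = [
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.ocean.mon.tos.gn.v20170419",
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.mon.tos.gn.v20171031",
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.mon.tos.gn.v20180427",
]
snapshot = available_at(ids, date(2017, 11, 1))
print(snapshot)
```

Running this once per collection release date would give a per-release snapshot of which dataset versions existed at that time.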

durack1 commented 4 years ago

@mauzey1 this was the issue I was hoping to get some help with. Your experience extracting info out of the ESGF API for CMIP6 would be perfect experience to do something similar with input4MIPs

mauzey1 commented 4 years ago

@durack1 I have been investigating how to build JSON files like https://github.com/PCMDI/input4MIPs-cmor-tables/blob/master/Versions/6.2.3.json using current data from search on CoG. I am getting this data by reading all versions in the input4MIPs database. Here is an example of the data I have been collecting.

        "PCMDI": {
            "SSTsAndSeaIce": {
                "1.1.0": {
                    "deprecated": "True"
                },
                "1.1.1": {
                    "deprecated": "False"
                },
                "1.1.2": {
                    "deprecated": "False"
                },
                "1.1.3": {
                    "deprecated": "False"
                },
                "1.1.4": {
                    "deprecated": "True"
                },
                "1.1.5": {
                    "deprecated": "True"
                },
                "1.1.6": {
                    "deprecated": "False"
                }
            }
        }

There seem to be inconsistencies between this data and the values stored in the current JSON files. https://github.com/PCMDI/input4MIPs-cmor-tables/blob/07d9bca11ad76b72bfa8e8ee6514e88d986d60c0/Versions/6.2.3.json#L27-L40 In the search results, version 1.1.1 shows up as not being deprecated.

durack1 commented 4 years ago

@mauzey1 this is great, thanks for starting to look at this!

Forget about the information contained in the Versions/ subdir, these were hand spun many many months ago and are now completely out of date.

As a backstory, we have the versions defined in a google doc, see here and my hope is that we can harvest the information from the archive on each of the dates (and versions) specified and collate a versions status file following the template above https://github.com/PCMDI/input4MIPs_CVs/issues/2, which we'll need to finalize once we can ascertain what is available from the API

mauzey1 commented 4 years ago

@durack1 Do we want entries to be organized by institution_id and source_id, and contain every DRS id of the datasets in each entry? For example, there are two ids for the datasets with the institution_id of ImperialCollege and source_id of ImperialCollege-1-0.

    "ImperialCollege": {
        "ImperialCollege-1-0": {
            "input4MIPs.CMIP6.C4MIP.ImperialCollege.ImperialCollege-1-0.atmos.yr.Delta14co2-in-air.gz": {
                "deprecated": "True"
            },
            "input4MIPs.CMIP6.C4MIP.ImperialCollege.ImperialCollege-1-0.atmos.yr.delta13co2-in-air.gm": {
                "deprecated": "True"
            }
        }
    }

Here is an example of ids from institution_id MRI and source_id MRI-JRA55-do-1-4-0, which has 21 ids.

    "MRI": {
        "MRI-JRA55-do-1-4-0": {
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.3hr.prra.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.3hr.prsn.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.3hr.rlds.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.3hr.rsds.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.3hrPt.huss.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.3hrPt.psl.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.3hrPt.tas.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.3hrPt.ts.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.3hrPt.uas.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.3hrPt.vas.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.fx.areacella.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.fx.sftof.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.land.day.friver.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.landIce.day.licalvf.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.ocean.day.tos.gn": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.ocean.fx.areacello.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.ocean.monC.sos.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.ocean.yrC.uos.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.ocean.yrC.vos.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.seaIce.3hrPt.siconca.gr": {
                "deprecated": "False"
            },
            "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.seaIce.day.siconc.gn": {
                "deprecated": "False"
            }
        }
    }
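
A minimal sketch of how such a nested layout could be built from a flat mapping of dataset id to deprecation flag. It assumes the id components are ordered as in the examples above, so that institution_id and source_id are the fourth and fifth dot-separated fields; the function name `nest` is hypothetical.

```python
def nest(statuses):
    """Group {dataset_id: deprecated_flag} by institution_id, then source_id."""
    tree = {}
    for dataset_id, deprecated in statuses.items():
        parts = dataset_id.split(".")
        institution_id, source_id = parts[3], parts[4]
        by_source = tree.setdefault(institution_id, {}).setdefault(source_id, {})
        # Flags are stored as strings to match the examples above.
        by_source[dataset_id] = {"deprecated": str(deprecated)}
    return tree


flat = {
    "input4MIPs.CMIP6.C4MIP.ImperialCollege.ImperialCollege-1-0.atmos.yr.Delta14co2-in-air.gz": True,
    "input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0.atmos.3hr.prra.gr": False,
}
tree = nest(flat)
print(tree)
```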

durack1 commented 4 years ago

@MartinaSt, just trying to figure out how granular we should be? In most instances we'll want to be file specific (as some files have changed their version without a dataset collection), but you know the citation granularity, so trying to figure out the level of detail needs some iteration. @mauzey1 has managed to extract the above from the project already.

@davidhassell @taylor13 ping

MartinaSt commented 4 years ago

@durack1, my use for this version information is to inform users about errors on the DOI granularity, which is for @mauzey1's example: 'input4MIPs.CMIP6.OMIP.MRI.MRI-JRA55-do-1-4-0'. Thus what I need is:

Apart from that, you had the idea to include the doi in the version information as well. If we still want to do that, that would define the granularity for the version information.

durack1 commented 4 years ago

@mauzey1 it'd be great to get back to this and get it done. What did you need from me to get this finalized?

mauzey1 commented 4 years ago

@durack1 input4MIPs_report.txt Here is the latest table I have made from input4MIPs. Below is an excerpt of the table.

...
        "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2": {
            "institutionId": "PCMDI",
            "sourceId": "PCMDI-AMIP-1-1-2",
            "mipTable": "CMIP",
            "datatype": "SSTsAndSeaIce",
            "version": "1.1.2",
            "id": {
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.ocean.fx.areacello.gn.v20170419": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.ocean.fx.sftof.gn.v20170419": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.ocean.mon.tos.gn.v20170419": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.ocean.mon.tosbcs.gn.v20170419": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.seaIce.mon.siconc.gn.v20170419": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-2.seaIce.mon.siconcbcs.gn.v20170419": "latest"
            },
            "doi": "10.22033/ESGF/input4MIPs.1161",
            "title": "PCMDI AMIP SST and sea-ice boundary conditions version 1.1.2"
        },
        "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3": {
            "institutionId": "PCMDI",
            "sourceId": "PCMDI-AMIP-1-1-3",
            "mipTable": "CMIP",
            "datatype": "SSTsAndSeaIce",
            "version": "1.1.3",
            "id": {
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.fx.areacello.gn.v20171031": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.fx.sftof.gn.v20171031": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.mon.tos.gn.v20171031": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.mon.tosbcs.gn.v20171031": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.seaIce.mon.siconc.gn.v20171031": "latest",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.seaIce.mon.siconcbcs.gn.v20171031": "latest"
            },
            "doi": "10.22033/ESGF/input4MIPs.1735",
            "title": "PCMDI AMIP SST and sea-ice boundary conditions version 1.1.3"
        },
        "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4": {
            "institutionId": "PCMDI",
            "sourceId": "PCMDI-AMIP-1-1-4",
            "mipTable": "CMIP",
            "datatype": "SSTsAndSeaIce",
            "version": "1.1.4",
            "id": {
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.fx.areacello.gn.v20180427": "deprecated",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.fx.sftof.gn.v20180427": "deprecated",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.mon.tos.gn.v20180427": "deprecated",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.mon.tosbcs.gn.v20180427": "deprecated",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.seaIce.mon.siconc.gn.v20180427": "deprecated",
                "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.seaIce.mon.siconcbcs.gn.v20180427": "deprecated"
            },
            "doi": "10.22033/ESGF/input4MIPs.2204",
            "title": "PCMDI AMIP SST and sea-ice boundary conditions version 1.1.4"
        },
...

Each entry has the activity_id, mip_era, target_mip_list, institution_id, source_id, source_version, and dataset_category of a group of datasets. Each group of activity_id, mip_era, target_mip, institution_id, and source_id is used to get a DOI from https://cera-www.dkrz.de/WDCC/ui/cerasearch/cerarest/exportcmip6. Each entry also has the list of dataset IDs along with their dataset_status (deprecated/latest/None).
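
As a rough sketch of the grouping step only (this is not the linked script, and it makes no call to the DKRZ service): the first five dot-separated fields of a dataset id give the entry key used in the excerpt above, and the statuses of all ids under one key can be collected. The helper names `doi_key` and `summarise` are hypothetical.

```python
def doi_key(dataset_id):
    """Return the first five id fields, i.e. the entry key at DOI granularity."""
    return ".".join(dataset_id.split(".")[:5])


def summarise(id_status):
    """Collect the distinct statuses found under each DOI-granularity key."""
    groups = {}
    for dataset_id, status in id_status.items():
        # str() so that a None status can be sorted together with strings.
        groups.setdefault(doi_key(dataset_id), set()).add(str(status))
    return {key: sorted(values) for key, values in groups.items()}


id_status = {
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.fx.areacello.gn.v20171031": "latest",
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.mon.tos.gn.v20171031": "latest",
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.fx.areacello.gn.v20180427": "deprecated",
}
summary = summarise(id_status)
print(summary)
```

An entry whose summary holds more than one status would be a case where only some datasets under a DOI were deprecated.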

This table was made with this Python script: https://github.com/mauzey1/esgf-utils/blob/d1e4215fd36ffa67f3a46bba7a2cd324ce5121b2/update-reports/input4MIPs_report.py

durack1 commented 4 years ago

@MartinaSt @davidhassell we should really circle around on this so we can finalize the forcing versioning json and you guys can start using it. How does the above format https://github.com/PCMDI/input4MIPs_CVs/issues/2 look?

MartinaSt commented 4 years ago

@durack1, thanks for moving this forward.

I use this information to document version and error information on the DOI landing pages. Therefore I need information on (see above):

durack1 commented 4 years ago

@MartinaSt, it seems the above includes such information, so e.g.

version/versionInfo: "id": { "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.fx.areacello.gn.v20171031": "latest", ... "id": { "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.fx.areacello.gn.v20180427": "deprecated",

Regarding the versionNotes, what format would you want for this, a drop-down selection, or free-form text? Just wondering the use case and whether a char limit etc is required.

@davidhassell please chime in now, as once this format is set it's not likely to be revisited and may require ES-DOCs to harvest information separate to this version json data

MartinaSt commented 4 years ago

@durack1 Sorry, I did not scroll to the right to see the deprecated information. Regarding the errata information: The content is up to the data creators, so free text. The important part for me is that I can show reliable and meaningful errata information for deprecated data, or the reason for deprecation, on the DOI landing page.

durack1 commented 4 years ago

@MartinaSt no problem, so how about if we have a situation where two datasets are latest? With the PCMDI-AMIP-X-Y-Z data, the 1.1.3 (from memory) was the official CMIP6 release, whereas normally a 6 monthly update is released, which deprecates the previous version (but not 1.1.3 which will always be available as "latest"). Is such logic a problem?

Are there any other considerations we need to factor in whilst finalizing the format?

MartinaSt commented 4 years ago

@durack1 yes, the errata might be related to some but not all datasets. The "versionNote" is directly related to the "deprecated/latest/None" information and thus including it breaks the proposed json.

Maybe we can assume that in such a case there is only one reason for the deprecation of some (but not all) datasets, and include it in the upper level alongside "version"?

durack1 commented 4 years ago

@MartinaSt well how about we proceed this way, we'll work to generate the 6.2.37 version of the json, review this and once we have a finalized format generate all the previous versions back to initial 6.0.0 (20th December 2016) release, sound good?

MartinaSt commented 4 years ago

@durack1 Any suggestion to get this finalized is a good one! So from my view: Go ahead!

Just as two comments: I will wait for the final format before I do any code changes and I will do the change if the version includes the errata information for the users (otherwise the deprecation flag does not add much information to the already available version, which is part of the DRS). It's a matter of spending my time most efficiently on the different projects I am involved in...

durack1 commented 4 years ago

@MartinaSt completely understood, and agree (spending valuable time appropriately).

@mauzey1 is addressing a number of high priority requirements, and this is in the queue after these, so I'd hope we can get the latest version json finalized first, then we can double check it contains everything in the format required and then roll back to the start.

There are a couple of new datasets that have started to appear for review, so 6.2.37 will be incrementing over the coming months

durack1 commented 3 years ago

@MartinaSt @mauzey1 is working on this as a second priority to the CMIP publication page. He has already generated the attached, and so we'll need to tweak this format to get to the finish line input4MIPs_report.json.txt

MartinaSt commented 3 years ago

@durack1 @mauzey1 Ok, now I am on the right page. Thanks for the JSON and your effort. It looks good except for:

Sorry to be persistent, but the most important information for me is the errata information, or in other words the reason for deprecation, in the JSON for the "deprecated" cases. Is it possible to add such information to the JSON?

durack1 commented 3 years ago

@MartinaSt yes sorry for my loose comments. Yeah that should be possible, so if a dataset is "deprecated", we could add an additional field such as deprecationNotes with a brief description. Do you have any suggestion regarding character counts etc, or you have no limitations?
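
A minimal sketch of how such a field might be attached, assuming the report layout from the excerpt above (an `id` mapping of dataset id to status per entry) and a separately maintained mapping of free-text notes. The function name and the note text are hypothetical placeholders.

```python
def add_deprecation_notes(report, notes):
    """Return a copy of report with deprecationNotes on entries that have deprecated ids.

    report -- {entry_key: {"id": {dataset_id: status}, ...}}
    notes  -- {entry_key: free-text reason}, maintained by hand (assumption)
    """
    out = {}
    for key, entry in report.items():
        entry = dict(entry)  # shallow copy, so the input is left untouched
        if "deprecated" in entry["id"].values():
            entry["deprecationNotes"] = notes.get(key, "")
        out[key] = entry
    return out


report = {
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3": {
        "id": {"input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-3.ocean.mon.tos.gn.v20171031": "latest"},
    },
    "input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4": {
        "id": {"input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4.ocean.mon.tos.gn.v20180427": "deprecated"},
    },
}
# Placeholder text only; the real reason would come from the data provider.
notes = {"input4MIPs.CMIP6.CMIP.PCMDI.PCMDI-AMIP-1-1-4": "placeholder reason"}
annotated = add_deprecation_notes(report, notes)
print(annotated)
```

Placing the note at the entry level, alongside "version", follows the suggestion earlier in the thread for cases where only some datasets under a DOI are deprecated.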

MartinaSt commented 3 years ago

Thanks @durack1! I am not aware of a hard character limitation on my end. But brief is certainly good.

mauzey1 commented 3 years ago

@durack1 The ESGF database only provides whether or not a dataset was deprecated; it does not provide any notes about why it was deprecated. How will we get this information? Would we just contact the people who published the datasets for the reason for deprecation, and manually add it to the rest of the information?

durack1 commented 3 years ago

@mauzey1 thanks for circling around on this. I have this information, so let me know where this should be put, so we can integrate it

mauzey1 commented 3 years ago

@durack1 Is there a github repo where you could store that information? It would make that info more accessible and easy to update.