WCRP-CMIP / CMIP6Plus_CVs

Controlled Vocabularies (CVs) for use in CMIP6Plus
Creative Commons Attribution 4.0 International

Data Citation aspects #21

Open MartinaSt opened 9 months ago

MartinaSt commented 9 months ago

I have had a look at all CVs. They are an improvement over CMIP6 and make several aspects more explicit. Data citation could also be better integrated, which would reduce the additional information that has to be provided for data citation to author and contributor details. Some ideas:

  1. Define data citation granularities in a CV similar to the DRS CV with templates of the citation landing pages, e.g. in a data_citation.json:

    {
        "data_citation": {
            "fine_granularity": {
                "url_machine_example": "http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.CMIP.MOHC.HadGEM3-GC31-LL.1pctCO2.json",
                "url_machine_template": "http://cera-www.dkrz.de/WDCC/meta/CMIP6/<mip_era>.<activity_id>.<institution_id>.<source_id>.<experiment_id>.json",
                "url_human_example": "http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6Plus.CMIP.MOHC.HadGEM3-GC31-LL.1pctCO2",
                "url_human_template": "http://cera-www.dkrz.de/WDCC/meta/CMIP6/<mip_era>.<activity_id>.<institution_id>.<source_id>.<experiment_id>"
            },
            "coarse_granularity": {
                "url_machine_example": "http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6Plus.CMIP.MOHC.HadGEM3-GC31-LL.json",
                "url_machine_template": "http://cera-www.dkrz.de/WDCC/meta/CMIP6/<mip_era>.<activity_id>.<institution_id>.<source_id>.json",
                "url_human_example": "http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.CMIP.MOHC.HadGEM3-GC31-LL",
                "url_human_template": "http://cera-www.dkrz.de/WDCC/meta/CMIP6/<mip_era>.<activity_id>.<institution_id>.<source_id>"
            },
            "data_citation_guidelines": "http://bit.ly/2gBCuqM"
        },
        "Header": {
            "CV_collection_modified": "2022-09-05",
            "CV_collection_version": "6.3.0.0",
            "author": "Matt Mizielinski <matthew.mizielinski@metoffice.gov.uk>",
            "checksum": "md5: ebda6eafcf0aba1ed108d6051ef27662",
            "institution_id": "MOHC",
            "previous_commit": "To be added",
            "specs_doc": "v6.3.0 (link TBC)"
        }
    }
  2. Citation URLs are currently in the ESGF index but not accessible from the netCDF file. We should re-discuss whether citation_url should become one of the global attributes, defined similarly to further_info_url and possibly with a template. That would synchronize the information in the ESGF index with that in the netCDF file header. The information currently available in the ESGF index for every dataset is, e.g., citation_url = http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.CMIP.MOHC.HadGEM3-GC31-LL.1pctCO2.r1i1p1f3.AERmon.abs550aer.gn.v20190620.json

  3. A reference to the data citation guidelines should be included in the terms of use (CMIP6_license.json) to synchronize the information on the DOI landing pages with the information in the netCDF file headers (example DOI landing page: http://cera-www.dkrz.de/WDCC/meta/CMIP6/CMIP6.CMIP.MOHC.HadGEM3-GC31-LL )

  4. Define a title template to be used by CMOR and data citation (issue #7).

  5. Add funding details to source_id (issue #6).
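To illustrate idea 1, here is a minimal Python sketch of how a citation landing-page URL could be built from a template like the ones proposed for data_citation.json. The function name `fill_template` and the `facets` dictionary are hypothetical and only serve the illustration; appending ".json" for the machine-readable variant follows the template pair shown in the proposal.

```python
import re

# Human-readable template copied from the proposed data_citation.json
# (fine granularity). The machine-readable variant appends ".json".
FINE_TEMPLATE = (
    "http://cera-www.dkrz.de/WDCC/meta/CMIP6/"
    "<mip_era>.<activity_id>.<institution_id>.<source_id>.<experiment_id>"
)


def fill_template(template: str, facets: dict[str, str]) -> str:
    """Replace each <name> placeholder with facets[name].

    Raises KeyError if the template needs a facet that was not supplied.
    (Hypothetical helper, not part of CMOR or the CV tooling.)
    """
    return re.sub(r"<(\w+)>", lambda m: facets[m.group(1)], template)


# Facet values taken from the example URLs in the proposal.
facets = {
    "mip_era": "CMIP6",
    "activity_id": "CMIP",
    "institution_id": "MOHC",
    "source_id": "HadGEM3-GC31-LL",
    "experiment_id": "1pctCO2",
}

human_url = fill_template(FINE_TEMPLATE, facets)
machine_url = human_url + ".json"
print(human_url)
print(machine_url)
```

The same helper would work for the coarse granularity by dropping `<experiment_id>` from the template, which is one reason to keep the templates in a CV rather than hard-coding them.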

Remark: The citation_urls are persistent. They resolve even if a DOI has never been issued, which might occur when the metadata remains incomplete (missing authors) or the dataset has never been published in ESGF. Therefore I would use these citation_urls rather than DOIs.

As it is unclear if and how data citation will be supported beyond CMIP6Plus, we have to decide how much effort we spend on the integration of the citation into the CVs at this stage.

durack1 commented 9 months ago

@MartinaSt thanks for this, and agreed: figuring out what else we can pull into CVs/project management is a very good idea, i.e. pulling content out of the netCDF global attributes and into an iterable registry. Everything you have noted above is worth considering, with project_data_citation.json as something that could be optional per project, if it is supported. I also agree that pulling information out of the ESGF index and into the CVs makes sense. It could be possible to have a GitHub Action/cron job that does this automatically, if we have a stable project ESGF index to interrogate.
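As a rough sketch of what such an automated job might do with a dataset-level citation_url from the ESGF index, the Python snippet below truncates the dataset identifier to the first few DRS facets to obtain the fine- and coarse-granularity URLs. It assumes that no facet value contains a "." and that the first four or five facets correspond to the two granularities described in the issue; the function name is hypothetical, and querying the ESGF index itself is left out.

```python
def citation_url_at_granularity(dataset_citation_url: str, n_facets: int) -> str:
    """Keep only the first n_facets dot-separated facets of the dataset ID.

    Assumption: facet values never contain ".", so splitting on "." is safe.
    """
    base, _, filename = dataset_citation_url.rpartition("/")
    dataset_id = filename.removesuffix(".json")  # requires Python 3.9+
    facets = dataset_id.split(".")
    if not 0 < n_facets <= len(facets):
        raise ValueError(f"n_facets must be between 1 and {len(facets)}")
    return f"{base}/{'.'.join(facets[:n_facets])}.json"


# Dataset-level citation_url quoted earlier in this issue.
dataset_url = (
    "http://cera-www.dkrz.de/WDCC/meta/CMIP6/"
    "CMIP6.CMIP.MOHC.HadGEM3-GC31-LL.1pctCO2.r1i1p1f3.AERmon.abs550aer.gn.v20190620.json"
)

fine_url = citation_url_at_granularity(dataset_url, 5)    # ...<experiment_id>.json
coarse_url = citation_url_at_granularity(dataset_url, 4)  # ...<source_id>.json
print(fine_url)
print(coarse_url)
```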

Great ideas, we'll circle back on this once we've completed the drop-in and received all the feedback.