CatalogueOfLife / backend

Complete backend of COL ChecklistBank
Apache License 2.0

How to craft DOIs for project, releases & sources #990

Closed mdoering closed 3 years ago

mdoering commented 3 years ago

For COL and other projects we want to issue GBIF DataCite DOIs for

  1. each release
  2. each source within a release
  3. a single DOI for the overarching project similar to the Zenodo DOI versioning scheme

Tasks:

Influence the metadata model work to provide all information needed to build the datacite document.

mdoering commented 3 years ago

There are only 14 BibTeX entry types, mostly targeting classic publications. See also Wikipedia on BibTeX.

https://libguides.nps.edu/citation/ieee-bibtex has examples for blogs, a digital book in a series, and a database, the last using the electronic type, which is not one of the 14 standard types:

@electronic{nsa_ipac_2012,
  author = "{NASA/IPAC Extragalactic Database}",
  organization = "Object name IRAS F00400+4059",
  url = "http://nedwww.ipac.caltech.edu/",
  note = "Accessed Dec. 10, 2012"
}

biblatex provides the type online, electronic is an alias.

mdoering commented 3 years ago

The IOC Bird List issues DOIs for every release (2 per year): https://www.worldbirdnames.org/ioc-lists/crossref/ E.g. https://doi.org/10.14344/IOC.ML.11.1

BibTeX

curl -LH "Accept: application/x-bibtex" https://doi.org/10.14344/IOC.ML.11.1
@misc{1,
    doi = {10.14344/ioc.ml.11.1},
    url = {https://doi.org/10.14344%2Fioc.ml.11.1},
    publisher = {World Bird Names International Ornithologists Union},
    title = {{IOC} World Bird List 11.1}
}

CSL JSON

curl -LH "Accept: application/json" https://doi.org/10.14344/IOC.ML.11.1
curl -LH "Accept: application/vnd.citationstyles.csl+json" https://doi.org/10.14344/IOC.ML.11.1
{
    "DOI": "10.14344/ioc.ml.11.1",
    "URL": "http://dx.doi.org/10.14344/IOC.ML.11.1",
    "container-title": "IOC World Bird List Datasets",
    "content-domain": {
        "crossmark-restriction": false,
        "domain": []
    },
    "created": {
        "date-parts": [
            [
                2021,
                1,
                29
            ]
        ],
        "date-time": "2021-01-29T14:55:51Z",
        "timestamp": 1611932151000
    },
    "deposited": {
        "date-parts": [
            [
                2021,
                1,
                29
            ]
        ],
        "date-time": "2021-01-29T14:55:52Z",
        "timestamp": 1611932152000
    },
    "indexed": {
        "date-parts": [
            [
                2021,
                2,
                22
            ]
        ],
        "date-time": "2021-02-22T02:44:00Z",
        "timestamp": 1613961840710
    },
    "institution": {
        "acronym": [
            "IOC"
        ],
        "name": "World Bird Names",
        "place": [
            "-"
        ]
    },
    "is-referenced-by-count": 1,
    "issued": {
        "date-parts": [
            [
                null
            ]
        ]
    },
    "member": "5466",
    "original-title": [],
    "prefix": "10.14344",
    "publisher": "World Bird Names International Ornithologists Union",
    "reference-count": 0,
    "references-count": 0,
    "relation": {},
    "score": 1.0,
    "short-title": [],
    "source": "Crossref",
    "subtitle": [],
    "title": "IOC World Bird List 11.1",
    "type": "dataset"
}

RIS

curl -LH "Accept: application/x-research-info-systems" https://doi.org/10.14344/IOC.ML.11.1
TY  - DATA
DO  - 10.14344/ioc.ml.11.1
UR  - http://dx.doi.org/10.14344/IOC.ML.11.1
TI  - IOC World Bird List 11.1
T2  - IOC World Bird List Datasets
PB  - World Bird Names International Ornithologists Union
ER  - 

RDF/Turtle

curl -LH "Accept: text/turtle" https://doi.org/10.14344/IOC.ML.11.1
<http://dx.doi.org/10.14344/IOC.ML.11.1>
      <http://prismstandard.org/namespaces/basic/2.1/doi>
              "10.14344/ioc.ml.11.1" ;
      <http://purl.org/dc/terms/date>
              ""^^<http://www.w3.org/2001/XMLSchema#gYear> ;
      <http://purl.org/dc/terms/identifier>
              "10.14344/ioc.ml.11.1" ;
      <http://purl.org/dc/terms/publisher>
              "World Bird Names International Ornithologists Union" ;
      <http://purl.org/dc/terms/title>
              "IOC World Bird List 11.1" ;
      <http://purl.org/ontology/bibo/doi>
              "10.14344/ioc.ml.11.1" ;
      <http://www.w3.org/2002/07/owl#sameAs>
              <doi:10.14344/ioc.ml.11.1> , <http://dx.doi.org/10.14344/ioc.ml.11.1> , <info:doi/10.14344/ioc.ml.11.1> .

RDF/XML

curl -LH "Accept: application/rdf+xml" https://doi.org/10.14344/IOC.ML.11.1
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://purl.org/dc/terms/"
    xmlns:j.1="http://prismstandard.org/namespaces/basic/2.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:j.2="http://purl.org/ontology/bibo/">
  <rdf:Description rdf:about="http://dx.doi.org/10.14344/IOC.ML.11.1">
    <j.0:publisher>World Bird Names International Ornithologists Union</j.0:publisher>
    <j.0:title>IOC World Bird List 11.1</j.0:title>
    <j.2:doi>10.14344/ioc.ml.11.1</j.2:doi>
    <j.1:doi>10.14344/ioc.ml.11.1</j.1:doi>
    <j.0:date rdf:datatype="http://www.w3.org/2001/XMLSchema#gYear"
    ></j.0:date>
    <owl:sameAs rdf:resource="info:doi/10.14344/ioc.ml.11.1"/>
    <owl:sameAs rdf:resource="doi:10.14344/ioc.ml.11.1"/>
    <j.0:identifier>10.14344/ioc.ml.11.1</j.0:identifier>
    <owl:sameAs rdf:resource="http://dx.doi.org/10.14344/ioc.ml.11.1"/>
  </rdf:Description>
</rdf:RDF>
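
The curl calls above all use the same mechanism: the DOI resolver picks a representation based on the Accept header. A minimal Python sketch of the same content negotiation follows. The media types are the ones used in the curl calls; the short format keys and the function names `doi_request` and `fetch` are hypothetical and only serve this illustration.

```python
from urllib.request import Request, urlopen

# Media types copied from the curl calls above; the short keys are
# hypothetical labels chosen for this sketch.
ACCEPT_TYPES = {
    "bibtex": "application/x-bibtex",
    "csl-json": "application/vnd.citationstyles.csl+json",
    "ris": "application/x-research-info-systems",
    "turtle": "text/turtle",
    "rdf-xml": "application/rdf+xml",
}


def doi_request(doi: str, fmt: str) -> Request:
    """Build (but do not send) a content-negotiation request for a DOI."""
    if fmt not in ACCEPT_TYPES:
        raise ValueError(f"unknown format: {fmt}")
    return Request(f"https://doi.org/{doi}", headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(doi: str, fmt: str) -> str:
    """Send the request; urlopen follows redirects, like curl -L."""
    with urlopen(doi_request(doi, fmt)) as response:
        return response.read().decode("utf-8")
```

Calling `fetch("10.14344/IOC.ML.11.1", "bibtex")` would be the equivalent of the first curl command; what comes back depends on the registration agency behind the DOI.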

mdoering commented 3 years ago

See also https://github.com/gbif/gbif-doi/issues/20 for considering a dedicated COL DOI prefix.

mdoering commented 3 years ago

The primary URL a COL release points to cannot be the main COL portal, as this changes with every release. We need a stable URL for a specific version. Ultimately this can only be ChecklistBank.

But I would propose to create a stable URL in the main portal, similar to how we dealt with the previous annual releases, that can either forward or redirect directly to ChecklistBank. Stable URLs would then only need to be managed for the portal, which gives us the freedom to change the much more complex CLB application as we see fit. We could then also easily host tombstone pages for monthly releases that have been removed from ChecklistBank, and maybe also better integrate release notes such as https://preview.catalogueoflife.org/2021/04/05/release

We could even consider to continue with the previous scheme for annual releases: https://www.catalogueoflife.org/annual-checklist/2021

And do something similar for the monthly ones: https://www.catalogueoflife.org/monthly-checklist/2021-04

But maybe it is better to use a new structure like one of the following, which does not distinguish between annual and monthly releases, since the only difference between them is the longevity of the data: https://www.catalogueoflife.org/release/2021-04 https://www.catalogueoflife.org/checklist/2021-04

mdoering commented 3 years ago

Moved this to a separate discussion under https://github.com/CatalogueOfLife/portal/issues/139

mdoering commented 3 years ago

ISSN strongly recommends creating a title DOI to represent the journal, which in our case would represent the COL series and thus be the equivalent of the overarching Zenodo concept DOI, item 3 in the issue description at the top.

Such a DOI would use the ISSN number as part of the DOI, i.e. https://doi.org/10.15468/issn.2405-8858 for the current GBIF DOI prefix and the COL ISSN
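
A small sketch of how such a title DOI could be derived, assuming the `issn.` suffix pattern shown in the example URL. The function names are hypothetical; the check digit test is the standard ISSN modulus-11 rule and is only there to catch typos before a DOI is built.

```python
import re

GBIF_PREFIX = "10.15468"  # current GBIF DOI prefix, as mentioned above


def issn_is_valid(issn: str) -> bool:
    """Check the NNNN-NNNC layout and the modulus-11 check digit of an ISSN."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{3}[0-9X]", issn):
        return False
    digits = issn.replace("-", "")
    # The first seven digits are weighted 8, 7, ..., 2.
    total = sum(int(d) * w for d, w in zip(digits[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[7] == expected


def title_doi(issn: str, prefix: str = GBIF_PREFIX) -> str:
    """Build a title DOI of the form <prefix>/issn.<ISSN>."""
    if not issn_is_valid(issn):
        raise ValueError(f"not a valid ISSN: {issn}")
    return f"{prefix}/issn.{issn}"


print(title_doi("2405-8858"))
```

With a dedicated COL prefix, only the `prefix` argument would change.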

mdoering commented 3 years ago

Checklist of the New Zealand flora from Landcare also uses an ISSN.

mdoering commented 3 years ago

Some rules for issuing DOIs for new releases:

dremsen commented 3 years ago

A consideration of who is formally recognized via DOI resolution and who is not, based on the ISG meeting of 4/27/21 (Pyle, Ower, Döring, Remsen).

Contributors to the COL monthly and annual checklists fall into a wide range of categories or roles, and considerable effort has been made to identify and define them. Key use cases behind this are:

  1. To motivate data contributors to provide updated content to the COL because it improves their professional visibility.
  2. To improve the quality of the Catalogue of Life.

These finer contributor divisions may ultimately be integrated into an updated COLDP metadata format, providing an improved degree of visibility. Ultimately, however, all of these roles must be assigned to one of two primary divisions:

  1. those who are formally recognized via the resolution of a DOI with all of the technical advantages of linkages through larger network graph services that enable automated linking and metrics
  2. those who are not part of a formal metadata resolution system and rely on human eyes to read a set of "credits" on a COL page/metadata document.

An important task for the COL Team is to determine which categories of contributors fall into each of these two divisions. While there may be a temptation to simply include everyone in the first division, it isn't so simple. Pyle likens the division to the way contributors to traditional journal publications are split into Authors versus those mentioned in the Acknowledgments. Given the limitations of the DataCite resolution format, assignment to the enriched DOI resolution pathway relegates all contributors to a single content creator / author class. Giving all contributors equal weight may demotivate the GSD content contributors who perform the majority of the data assembly and curation on the contributing datasets.

Actions required are to:

  1. consider all the roles of COL contributors
  2. determine which contributors (by role) should be linked to DOI resolution and receive all the benefits of wider service tracking
  3. determine which contributors should be less formally acknowledged

dremsen commented 3 years ago

A consideration on assignment of DOIs to Catalogue of Life data sources based on ISG meeting 4/27/21 (Pyle, Ower, Döring, Remsen)

The Catalogue of Life proposal to assign DOIs to each release and to each source (contributing dataset) within a release presents some options for consideration. Discussion is needed to determine the optimal criteria for minting a new DOI for a COL contributing dataset. A current proposal is to assign a new set of DOIs each month to

  1. all contributing datasets. A dataset, in this case, refers to the instance of a curated dataset that is part of a release edition. It is NOT a reference to the source of the dataset in ChecklistBank.
  2. the full COL checklist release.

These two DOI categories would include complementary reference metadata with:

  1. Resolution of dataset DOI providing a reference to the COL edition to which it contributes.
  2. Resolution of the COL checklist DOI providing a list of references to all the contributing data sources.

Thus, if the COL is composed of 100 contributing data sectors, there would be 101 DOIs minted each month, or 1212 DOIs minted per year. This profusion of DOIs could be a point of confusion for some users, particularly when many data sources remain unchanged for months or years. The only real difference in the resolved metadata, in many cases, will be simple edition increments. Does this justify the approach?

Another option we explore is to only mint identifiers for data sources when they have been updated (changed). For example, assume the data source for FishBase was changed in April of 2018, the first update since October of 2012. We would propose:

  1. The DOI for the March 2018 release of the COL would reference the DOI of the older, October 2012 FishBase Release.
  2. The DOI for the October 2012 FishBase data source would be updated by appending an IsReferencedBy (or similar) property citing its contribution to March 2018 COL. The dataset DOI would list all COL editions to which it has contributed.
  3. The DOI for the April 2018 COL monthly checklist would reference the newly minted DOI for the April 2018 FishBase release. It may even explicitly reference the deprecation of the former FishBase version (IsObsoletedBy property). It will also list all of the other contributing datasets by their last updated DOI.
  4. The DOI for the April 2018 FishBase release would list the April 2018 COL monthly release. In subsequent months it will also cite those COL editions to which it has contributed until it, too, is replaced.
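
The second option can be sketched as a toy in-memory model: a source DOI is minted only when the source version differs from the one in the previous release, and the relations from the list above are recorded as triples. This is an illustration under assumptions, not the ChecklistBank implementation: the class `DoiRegistry`, the prefix `10.99999`, and the DOI suffix patterns are all hypothetical, and `References` is used here as the inverse of `IsReferencedBy`.

```python
class DoiRegistry:
    """Toy model of 'mint a source DOI only when the source has changed'."""

    def __init__(self, prefix="10.99999"):  # placeholder prefix, not a real one
        self.prefix = prefix
        self.current = {}    # source name -> (version, DOI) from the latest release
        self.relations = []  # (subject DOI, relationType, object DOI)
        self.minted = 0      # total number of DOIs created so far

    def _mint(self, suffix):
        self.minted += 1
        return f"{self.prefix}/{suffix}"

    def publish(self, release, sources):
        """Register one COL release; sources maps source name -> version."""
        release_doi = self._mint(f"col.{release}")
        for name, version in sources.items():
            previous = self.current.get(name)
            if previous is None or previous[0] != version:
                # The source changed (or is new): mint a DOI for this version.
                new_doi = self._mint(f"{name}.{version}")
                if previous is not None:
                    self.relations.append((previous[1], "IsObsoletedBy", new_doi))
                self.current[name] = (version, new_doi)
            source_doi = self.current[name][1]
            self.relations.append((source_doi, "IsReferencedBy", release_doi))
            self.relations.append((release_doi, "References", source_doi))
        return release_doi


registry = DoiRegistry()
registry.publish("2018-03", {"fishbase": "2012-10"})
registry.publish("2018-04", {"fishbase": "2018-04"})
print(registry.minted)  # two releases plus two FishBase versions
```

Under this model, 100 sources that stay unchanged for a year produce 112 DOIs (100 source DOIs plus 12 release DOIs) instead of the 1212 of the mint-everything-monthly approach.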

mdoering commented 3 years ago

Actions required are to:

consider all the roles of COL contributors

Please see https://github.com/CatalogueOfLife/backend/issues/1001 for a dedicated issue on contributor roles and reopen it if needed. Personally I think the way we deal with official roles in the updated model now works very well.

determine which contributors (by role) should be linked to DOI resolution and receive all the benefits of wider service tracking

This also defines who is listed in the citation string, i.e. who is part of the official authorship of COL. The global team has recommended including each and every one, but given the arguments laid out by @dremsen above we should consider including only authors whose work actually helped to change the COL in that very release. An option to consider would be to always include the core team, but only those GSD authors with updated content.

determine which contributors should be less formally acknowledged

That would be all other GSD authors then I suppose.
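
The option discussed here (core team always cited, GSD authors cited only when their source was updated in the release, all other GSD authors acknowledged) could look roughly like this. It is only a sketch: the function name, the input shapes and the placeholder names other than the FishBase editors are assumptions made for illustration.

```python
def split_contributors(core_team, gsd_authors, updated_sources):
    """Return (cited, acknowledged) name lists for one release.

    core_team: list of names that are always cited
    gsd_authors: dict mapping source name -> list of author names
    updated_sources: set of source names whose content changed in this release
    """
    cited = list(dict.fromkeys(core_team))  # keep order, drop duplicates
    for source, names in gsd_authors.items():
        if source in updated_sources:
            for name in names:
                if name not in cited:
                    cited.append(name)
    acknowledged = []
    for source, names in gsd_authors.items():
        if source not in updated_sources:
            for name in names:
                # Someone already cited is not listed a second time.
                if name not in cited and name not in acknowledged:
                    acknowledged.append(name)
    return cited, acknowledged


cited, acknowledged = split_contributors(
    core_team=["Core Member A", "Core Member B"],  # placeholder names
    gsd_authors={
        "FishBase": ["Froese R.", "Pauly D."],
        "Unchanged Source": ["GSD Author C"],       # placeholder source and name
    },
    updated_sources={"FishBase"},
)
print(cited)
print(acknowledged)
```

The two passes make sure that an author who contributes to both an updated and an unchanged source ends up in the citation rather than only in the acknowledgments.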

mdoering commented 3 years ago

DataCite offers many resource relationTypes in Table 9, page 46. We should consider using:

If we were to adopt redundant DOIs each month, there is an option to relate the DOIs pointing to the same source version with IsIdenticalTo, but I think we should avoid creating redundant DOIs in the first place.

Note that for serials with an existing ISSN, DataCite recommends:

SeriesInformation: Information about a repeating series, such as volume, issue, number.

For use with grey literature. If providing an ISSN, use RelatedIdentifier, relatedIdentifierType=ISSN. For dataset series describe the relationships with isPartOf or HasPart.

<relatedIdentifier relatedIdentifierType="ISSN" relationType="IsPartOf">0077-5606 </relatedIdentifier>
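
For illustration, the same element can be generated for the COL ISSN with Python's standard library. This is a sketch only: the helper name `related_identifier` is hypothetical, the DataCite XML namespace and the surrounding `relatedIdentifiers` wrapper are left out, and surrounding whitespace in the value is stripped.

```python
import xml.etree.ElementTree as ET


def related_identifier(value: str, identifier_type: str, relation_type: str) -> str:
    """Serialize a single DataCite-style relatedIdentifier element (no namespace)."""
    element = ET.Element(
        "relatedIdentifier",
        {"relatedIdentifierType": identifier_type, "relationType": relation_type},
    )
    element.text = value.strip()
    return ET.tostring(element, encoding="unicode")


# COL ISSN as given in the citation example of this thread.
print(related_identifier("2405-8858", "ISSN", "IsPartOf"))
```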

mdoering commented 3 years ago

One other important question we discussed on the ISG call is what citation metadata a source DOI would return. Currently we follow the chapter-in-a-book model, with FishBase as the example:

Froese R., Pauly D. (eds.) (2021). FishBase: FishBase (version Feb 2018). In: Catalogue of Life, et al. (2021). Species 2000 & ITIS Catalogue of Life, 2021-04-05. Digital resource at www.catalogueoflife.org. Species 2000: Naturalis, Leiden, the Netherlands. ISSN 2405-8858.

Here In: Catalogue of Life, et al. (2021) is referring to a specific version of COL. If the FishBase DOI did not change between 2012 and 2018 we could either

mdoering commented 3 years ago

All of the above relations except Cites and IsPreviousVersionOf are implemented.