Materials-Consortia / OPTIMADE

Specification of a common REST API for access to materials databases
https://optimade.org/specification
Creative Commons Attribution 4.0 International

Changing DOI to Handles in reference #353

Open tachyontraveler opened 3 years ago

tachyontraveler commented 3 years ago

General description of the Issue

Currently, the references section assumes that the persistent identifier for an entry is always a DOI. Since DOIs are just a subset of Handle identifiers, wouldn't it be better to use a keyword such as handle, identifier, or persistent_id instead of the current keyword, doi?

Reason to support Handles

DOIs are significantly more expensive to implement than Handles for large databases. The lack of support for Handles makes the reference data less universal and biases it toward a paywalled implementation of an otherwise cheaper-to-implement system.

Proposed change

From

{
  "data": {
    "type": "references",
    "id": "Dijkstra1968",
    "attributes": {
      "authors": [
        {
          "name": "Edsger Dijkstra"
        }
      ],
      "doi": "10.1145/362929.362947"
    }
  }
}

To something like,

{
  "data": {
    "type": "references",
    "id": "Dijkstra1968",
    "attributes": {
      "authors": [
        {
          "name": "Edsger Dijkstra"
        }
      ],
      "persistent_id": "DOI:10.1145/362929.362947"
    }
  }
}
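A consumer of the proposed field would need to split the scheme prefix from the identifier itself. The following is a minimal sketch of that step, assuming the "SCHEME:value" form shown above; the function name parse_persistent_id and the choice to lowercase the scheme are illustrative assumptions, not part of the OPTIMADE specification.

```python
from typing import NamedTuple


class PersistentId(NamedTuple):
    scheme: str  # e.g. "doi", lowercased (an assumption of this sketch)
    value: str   # the identifier without its scheme prefix


def parse_persistent_id(raw: str) -> PersistentId:
    """Split a hypothetical 'SCHEME:value' string at the first colon."""
    scheme, sep, value = raw.partition(":")
    if not sep or not scheme or not value:
        raise ValueError(f"expected 'SCHEME:value', got {raw!r}")
    return PersistentId(scheme.lower(), value)


pid = parse_persistent_id("DOI:10.1145/362929.362947")
print(pid.scheme, pid.value)
```

Splitting at the first colon only keeps any later colons inside the identifier value intact.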

Related external links:

  1. Wiki article on Handle system
  2. DOI and Handles from doi.org
  3. Comparison of Persistent Identifier systems

ml-evs commented 3 years ago

I strongly support adding the idea of linking out to persistent identifiers into the API; I would go one step further and suggest that we add some kind of persistent URL field to the general entry type.

Do you know of an existing standard for the kind of field you suggest (with the prefix DOI: as part of the data)? Or will we have to define our own? Would there be any disadvantage to just providing a resolvable HTTP URL for each pURL, i.e. "http://doi.org/10.1145/362929.362947" or "https://hdl.handle.net/20.1000/100"?
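One way to weigh the URL-only option is to check whether a client can still recover the identifier type and value from a resolvable URL. A rough sketch, assuming a small host-to-type mapping (RESOLVER_HOSTS and identifier_from_url are made-up names for illustration, and the sketch does not decode percent-encoded characters):

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumed mapping from resolver host to identifier type; not defined by OPTIMADE.
RESOLVER_HOSTS = {
    "doi.org": "doi",
    "hdl.handle.net": "handle",
}


def identifier_from_url(url: str) -> Optional[Tuple[str, str]]:
    """Return (type, id) for a known resolver URL, or None for any other URL."""
    parts = urlsplit(url)
    kind = RESOLVER_HOSTS.get((parts.hostname or "").lower())
    value = parts.path.lstrip("/")
    if kind is None or not value:
        return None
    return kind, value


print(identifier_from_url("http://doi.org/10.1145/362929.362947"))
print(identifier_from_url("https://hdl.handle.net/20.1000/100"))
```

Under these assumptions the URL form loses no information for the two resolvers named here, but every client has to maintain its own list of resolver hosts.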

CasperWA commented 3 years ago

As the references' properties/fields are currently BibTeX one-to-one (see description of the references endpoint and entry type), this will introduce a shift away from that. I am fine with refining the references data model a bit more than "just use BibTeX", but I think it would demand a bit more discussion in general. As a solution, may I suggest adding it as an extra field instead of a replacement?

ml-evs commented 3 years ago

> As the references' properties/fields are currently BibTeX one-to-one (see description of the references endpoint and entry type), this will introduce a shift away from that.

This isn't true for the url/doi fields we added, which are not part of bibtex (we link to a PDF from 1988!)

ml-evs commented 3 years ago

One easy alternative would be to bulk up our definitions to encompass BibJSON (http://okfnlabs.org/bibjson/), itself based on BibTeX (I think this was discussed a while ago). BibJSON in fact includes an identifier field:

    "identifier": [{"type":"doi","id":"10.1186/1758-2946-3-47"}]

CasperWA commented 3 years ago

> > As the references' properties/fields are currently BibTeX one-to-one (see description of the references endpoint and entry type), this will introduce a shift away from that.

> This isn't true for the url/doi fields we added, which are not part of bibtex (we link to a PDF from 1988!)

You're right. Sorry. I just checked the actual specification for BibTeX 👍 Edit: Although doi is listed here as a recognized third-party key.

merkys commented 3 years ago

While supporting Handles would be nice, I oppose removal of the doi field. This would be a major change in the specification, and thus incompatible with OPTIMADE v1. If needed, one or more separate fields for Handles could be supported.

By the way, the current specification can already support Handles. They can be supplied in URL form in the url field, using a generic proxy server (prefixing the Handle with https://hdl.handle.net).
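As a small illustration of this workaround, the sketch below builds a proxy URL from a Handle and stores it in the existing url field. The helper name handle_to_url is hypothetical, the Handle 20.1000/100 is the example value used earlier in this thread, and percent-encoding everything except "/" is a cautious assumption rather than a rule taken from the Handle documentation.

```python
from urllib.parse import quote

HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(handle: str) -> str:
    """Prefix a Handle with the generic proxy, percent-encoding unsafe characters."""
    return HANDLE_PROXY + quote(handle, safe="/")


# Only the url field is shown; the other reference attributes are unchanged.
attributes = {"url": handle_to_url("20.1000/100")}
print(attributes["url"])
```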

What might be a shortcoming of the current OPTIMADE specification is that it restricts one reference to one URL (DOIs are supposed to be unique per digital object, thus they are probably OK). However, the specification could easily be extended by allowing a list value for the url field.
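If url were allowed to hold either a single string or a list, clients would have to accept both shapes. A minimal sketch of such a reader follows; the helper urls_of is hypothetical, and the list form is the extension suggested here, not something the current specification allows.

```python
from typing import Any, Dict, List


def urls_of(attributes: Dict[str, Any]) -> List[str]:
    """Return the url field as a list, whether it is absent, a string, or a list."""
    value = attributes.get("url")
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


print(urls_of({"url": "https://hdl.handle.net/20.1000/100"}))
print(urls_of({"url": ["https://hdl.handle.net/20.1000/100",
                       "http://doi.org/10.1145/362929.362947"]}))
```

A reader written this way keeps working with existing single-URL responses, but a server that starts emitting lists could still break clients that expect a string.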

ztrautt commented 3 years ago

This is an important consideration for the OPTIMADE specification. While DOI and Handle are popular, there are a handful of other kinds of persistent identifiers. Here is a document of interest: http://doi.org/10.5334/dsj-2017-009

Although it is specific to schema.org JSON-LD representations, the following may be a useful discussion as you consider how to support diverse persistent identifiers in the OPTIMADE specification: https://github.com/ESIPFed/science-on-schema.org/blob/master/decisions/13-schemaorg-identifier-as-PropertyValue.md

tachyontraveler commented 3 years ago

@ml-evs I'm not familiar with any existing standard for defining persistent identifiers, but your suggestion of defining it like "identifier": [{"type":"doi","id":"10.1186/1758-2946-3-47"}] sounds good to me. It is also in a similar format to what's mentioned in the second link in @ztrautt's comment.
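To show how a client might read such a list, here is a short sketch that collects identifiers of one type. The helper ids_of_type is hypothetical, and the "handle" type label and its value are assumptions added for illustration; only the "doi" entry comes from the BibJSON example quoted above.

```python
from typing import Any, Dict, List


def ids_of_type(attributes: Dict[str, Any], id_type: str) -> List[str]:
    """Collect the ids of every identifier entry whose type matches id_type."""
    return [
        entry["id"]
        for entry in attributes.get("identifier", [])
        if entry.get("type") == id_type and "id" in entry
    ]


attributes = {
    "identifier": [
        {"type": "doi", "id": "10.1186/1758-2946-3-47"},
        {"type": "handle", "id": "20.1000/100"},  # assumed type label and value
    ]
}
print(ids_of_type(attributes, "doi"))
```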

@merkys, @CasperWA I understand the concern about backward compatibility when replacing doi, and agree to keep the doi field for now - at least until the next big release of OPTIMADE (v2?). If others are also on board, we can move forward by adding a new list-type field, identifier, as Matt and Zach suggested.
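The additive approach could look roughly like the sketch below: the existing doi field is left untouched and an identifier list is derived from it. The function add_identifier_field and the exact shape of the identifier entries are assumptions for illustration; nothing here has been adopted in the specification.

```python
import copy
from typing import Any, Dict


def add_identifier_field(reference: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a reference entry with an identifier list derived from doi."""
    upgraded = copy.deepcopy(reference)
    attributes = upgraded["data"]["attributes"]
    identifiers = attributes.setdefault("identifier", [])
    doi = attributes.get("doi")
    entry = {"type": "doi", "id": doi}
    # Keep doi as-is for v1 clients; only mirror it into the new list once.
    if doi and entry not in identifiers:
        identifiers.append(entry)
    return upgraded


reference = {
    "data": {
        "type": "references",
        "id": "Dijkstra1968",
        "attributes": {
            "authors": [{"name": "Edsger Dijkstra"}],
            "doi": "10.1145/362929.362947",
        },
    }
}
upgraded = add_identifier_field(reference)
print(upgraded["data"]["attributes"]["identifier"])
```

Because doi stays in place, a v1 client that ignores unknown fields would see no change, while newer clients could read Handles and other identifier types from the list.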