crosscite / content-negotiation

DOI content negotiation
https://data.crosscite.org
MIT License

Crossref DOI Content Negotiation: use CSL JSON's author.literal field for consortiums #92

Closed dhimmel closed 4 years ago

dhimmel commented 4 years ago

I'm reporting an issue with the CSL JSON returned by DOI Content Negotiation for Crossref DOIs. This issue was originally reported by a Manubot user at https://github.com/manubot/manubot/issues/158.

The following command requests CSL JSON metadata for https://doi.org/10.1038/ng.3834:

curl --silent --location \
  --header "Accept: application/vnd.citationstyles.csl+json" \
  https://doi.org/10.1038/ng.3834 \
  | python -m json.tool

One oddity, which is not the issue I'm reporting: on the Nature website "GTEx Consortium" is listed as the 10th author, but it is second in the Crossref metadata. I assume this is a publisher error, but wanted to point it out to avoid confusion.

In the JSON returned by the DOI Content Negotiation command, you'll find the following:

    "author": [
        {
            "given": "Colby",
            "family": "Chiang",
            "sequence": "first",
            "affiliation": []
        },
        {
            "name": "GTEx Consortium",
            "sequence": "first",
            "affiliation": []
        },

The mistake is that "name": "GTEx Consortium" should be "literal": "GTEx Consortium", since literal is the field defined by the CSL JSON specification.

Also note that sequence and affiliation are not part of the CSL specification. Crossref DOIs return a large number of fields that are not part of CSL JSON. This is a longstanding but separate issue.
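
Until this is fixed upstream, a client could normalize the author entries itself. The following is a minimal sketch, not Manubot's or Crossref's actual code: the function name normalize_author and the CSL_NAME_KEYS set are hypothetical, and the key set covers only a subset of the name fields that CSL JSON allows.

```python
# Hypothetical client-side workaround, not code from any of the services discussed.
# Assumption: this subset of CSL name fields is enough for the records at hand.
CSL_NAME_KEYS = {
    "family",
    "given",
    "literal",
    "suffix",
    "dropping-particle",
    "non-dropping-particle",
}


def normalize_author(author):
    """Return a copy of one author entry that uses only CSL name fields."""
    fixed = dict(author)
    # Crossref content negotiation puts organization names under "name";
    # CSL JSON expects them under "literal".
    if "name" in fixed and "literal" not in fixed:
        fixed["literal"] = fixed.pop("name")
    # Drop keys such as "sequence" and "affiliation" that are not CSL name fields.
    return {key: value for key, value in fixed.items() if key in CSL_NAME_KEYS}


# The two author entries shown in the JSON excerpt above.
authors = [
    {"given": "Colby", "family": "Chiang", "sequence": "first", "affiliation": []},
    {"name": "GTEx Consortium", "sequence": "first", "affiliation": []},
]
cleaned = [normalize_author(author) for author in authors]
print(cleaned)
```

With the two entries above, the consortium ends up with only a literal key and the person with only given and family.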

So where is the source code where name needs to be replaced with literal? Is this the right repo for reporting issues with CSL JSON returned by DOI Content Negotiation for Crossref DOIs?

mfenner commented 4 years ago

@dhimmel The Crossref content-negotiation in your issue uses a different library, not the library in this repository, so please reach out to them.

You can do content-negotiation with Crossref DOIs in the DataCite content negotiation, but then have to address the service directly:

curl -LH "Accept: application/vnd.citationstyles.csl+json" https://data.crosscite.org/10.1038/ng.3834
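
For readers who prefer Python over curl, the same request can be sketched with the standard library. This is an illustrative equivalent, not part of the service: the function names build_csl_request and fetch_csl_json are hypothetical, and the DOI is placed in the URL path as-is, which assumes it contains no characters that need escaping.

```python
import json
import urllib.request

CSL_JSON = "application/vnd.citationstyles.csl+json"


def build_csl_request(doi, base_url="https://data.crosscite.org"):
    """Build (but do not send) a content negotiation request for CSL JSON."""
    # Assumption: the DOI needs no percent-encoding in the URL path.
    url = f"{base_url}/{doi}"
    return urllib.request.Request(url, headers={"Accept": CSL_JSON})


def fetch_csl_json(doi, base_url="https://data.crosscite.org", timeout=30):
    """Send the request and parse the JSON body (requires network access)."""
    request = build_csl_request(doi, base_url)
    # urlopen follows HTTP redirects by default, similar to curl -L.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_csl_request("10.1038/ng.3834")
print(request.full_url)
print(request.get_header("Accept"))
# To actually retrieve the metadata: fetch_csl_json("10.1038/ng.3834")
```

Passing base_url="https://doi.org" would target the DOI resolver instead, as in the original curl command in this issue.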

This DOI is a bit tricky, and required a fix in our content negotiation service. The XML is at https://api.crossref.org/works/10.1038/ng.3834/transform/application/vnd.crossref.unixsd+xml and you can see how GTEx is added as an author. In our fix we add GTEx at the end, as the author order is not obvious from the XML; we also looked at how authors are ordered on the Nature Genetics website.

dhimmel commented 4 years ago

Thanks for the help and updating your content negotiation service. A few more questions to help me fully understand.

The Crossref content-negotiation in your issue uses a different library, not the library in this repository, so please reach out to them.

Got it. Do you happen to know the appropriate GitHub/GitLab issue tracker for Crossref content-negotiation feedback?

I'm also a bit confused regarding the relationship between Crosscite, Crossref, and DataCite. Is this repo the source for https://citation.crosscite.org/docs.html or https://support.datacite.org/docs/datacite-content-resolver?

You can do content-negotiation with Crossref DOIs in the DataCite content negotiation, but then have to address the service directly:

That is great! We'll look into switching Manubot to use this method for all DOIs, because the DataCite CN seems much better at producing valid CSL JSON. Are there any downsides you're aware of compared to using https://doi.org/10.1038/ng.3834? Is service downtime different, etcetera?

The XML is at https://api.crossref.org/works/10.1038/ng.3834/transform/application/vnd.crossref.unixsd+xml and you can see how GTEx is added as an author

Okay so the ordering is incorrect in the Crossref metadata because Nature Genetics deposited the "GTEx Consortium" organization first under contributors, when on their website "GTEx Consortium" is in the middle.

dhimmel commented 4 years ago

Ah this is a cool feature ("link-based content type requests"): https://data.crosscite.org/application/vnd.citationstyles.csl+json/10.1038/ng.3834

Looks like this does not work for https://doi.org/application/vnd.citationstyles.csl+json/10.1038/ng.3834.
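
The link-based form just places the media type in the URL path ahead of the DOI. A small sketch of building such a URL follows; the helper name link_based_url is hypothetical, and as noted above this pattern is only known to work against data.crosscite.org.

```python
def link_based_url(
    doi,
    media_type="application/vnd.citationstyles.csl+json",
    base_url="https://data.crosscite.org",
):
    """Build a link-based content type URL: <base_url>/<media type>/<DOI>."""
    # Assumption: neither the media type nor the DOI needs percent-encoding.
    return f"{base_url}/{media_type}/{doi}"


url = link_based_url("10.1038/ng.3834")
print(url)
```

This form is convenient in contexts where a request header cannot be set, such as a plain hyperlink.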

dhimmel commented 4 years ago

For https://doi.org/10.1186/s13742-015-0103-4, there still seems to be a problem with converting the consortium name to CSL JSON. The Crossref XML contains:

<contributors>
<organization contributor_role="author" sequence="first">The Genome Denmark Consortium</organization>

But then DOI Content Negotiation using data.crosscite.org returns the following author list, which omits the consortium entirely:

  "author": [
    {
      "family": "Liu",
      "given": "Siyang"
    },
    {
      "family": "Huang",
      "given": "Shujia"
    },
    {
      "family": "Rao",
      "given": "Junhua"
    },
    {
      "family": "Ye",
      "given": "Weijian"
    },
    {
      "family": "Krogh",
      "given": "Anders"
    },
    {
      "family": "Wang",
      "given": "Jun"
    }
  ],

@mfenner any ideas?


Here's another DOI where the consortium conversion is working properly: https://doi.org/10.1101/664623. The XML contains:

<organization contributor_role="author" sequence="additional">the Genome in a Bottle Consortium</organization>

And the CSL JSON ends up with:

    {
      "literal": "the Genome in a Bottle Consortium"
    }
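
To make the expected behavior concrete, here is a rough sketch of the conversion I would expect for both DOIs. It is not the service's actual implementation. The XML fragment is hand-written: the organization element is copied from the first example, while the person_name element with given_name and surname children is my assumption about the Crossref schema. Real Crossref XML is namespaced, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration; the person_name layout is an assumption.
SAMPLE_XML = """
<contributors>
  <organization contributor_role="author" sequence="first">The Genome Denmark Consortium</organization>
  <person_name contributor_role="author" sequence="additional">
    <given_name>Siyang</given_name>
    <surname>Liu</surname>
  </person_name>
</contributors>
"""


def contributors_to_csl_authors(xml_text):
    """Convert a <contributors> fragment to CSL JSON author entries, in document order."""
    root = ET.fromstring(xml_text)
    authors = []
    for element in root:
        if element.get("contributor_role") != "author":
            continue
        if element.tag == "organization":
            # Organizations have no family/given split, so use the CSL "literal" field.
            authors.append({"literal": (element.text or "").strip()})
        elif element.tag == "person_name":
            authors.append(
                {
                    "family": element.findtext("surname", default="").strip(),
                    "given": element.findtext("given_name", default="").strip(),
                }
            )
    return authors


authors = contributors_to_csl_authors(SAMPLE_XML)
print(authors)
```

The point is that an organization contributor yields a literal entry regardless of whether its sequence attribute is "first" or "additional".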

mfenner commented 4 years ago

@dhimmel link-based content negotiation is a bit of a hack and doesn't work with https://doi.org.