JabRef / jabref

Graphical Java application for managing BibTeX and biblatex (.bib) databases
https://devdocs.jabref.org
MIT License

Record creation using DOI sometimes results in wrong year #7943

Closed ggieling closed 3 years ago

ggieling commented 3 years ago
JabRef version on

@article Data imported via a DOI reference can produce the wrong publication year

Steps to reproduce the behavior:

  1. click to create a new entry (library is in biblatex mode)
  2. in the select entry type dialog paste a DOI reference (e.g. 10.1111/bph.15016)
  3. check the date field
The official citation is: British Journal of Pharmacology 2021, 178(1), 6-30.

This is the January 2021 issue of the journal. The document was first published electronically on 7 Feb 2020. The date imported via the DOI reference is 2020-03. Because this field reports the year 2020, the citation is shown as: British Journal of Pharmacology 2020, 178(1), 6-30.

If I download the .bib data from the publisher's website, I receive the result below:

    @article{https://doi.org/10.1111/bph.15016,
      author = {Sommer, Natascha and Ghofrani, Hossein A. and Pak, Oleg and Bonnet, Sebastien and Provencher, Steve and Sitbon, Olivier and Rosenkranz, Stephan and Hoeper, Marius M. and Kiely, David G.},
      title = {Current and future treatments of pulmonary arterial hypertension},
      journal = {British Journal of Pharmacology},
      volume = {178},
      number = {1},
      pages = {6-30},
      doi = {https://doi.org/10.1111/bph.15016},
      url = {https://bpspubs.onlinelibrary.wiley.com/doi/abs/10.1111/bph.15016},
      eprint = {https://bpspubs.onlinelibrary.wiley.com/doi/pdf/10.1111/bph.15016},
      abstract = {Therapeutic options for pulmonary arterial hypertension (PAH) have increased over the last decades. The advent of pharmacological therapies targeting the prostacyclin, endothelin, and NO pathways has significantly improved outcomes. However, for the vast majority of patients, PAH remains a life-limiting illness with no prospect of cure. PAH is characterised by pulmonary vascular remodelling. Current research focusses on targeting the underlying pathways of aberrant proliferation, migration, and apoptosis. Despite success in preclinical models, using a plethora of novel approaches targeting cellular GPCRs, ion channels, metabolism, epigenetics, growth factor receptors, transcription factors, and inflammation, successful transfer to human disease with positive outcomes in clinical trials is limited. This review provides an overview of novel targets addressed by clinical trials and gives an outlook on novel preclinical perspectives in PAH. LINKED ARTICLES This article is part of a themed issue on Risk factors, comorbidities, and comedications in cardioprotection. To view the other articles in this section visit http://onlinelibrary.wiley.com/doi/10.1111/bph.v178.1/issuetoc},
      year = {2021}
    }

The above publication is from Wiley. To check whether the issue is publisher-specific, I did a similar test with ACS:

- Source: doi:10.1021/ml100273k
- Official citation: ACS Med. Chem. Lett. 2011, 2(3), 243–247
- Published online: 29 December 2010
- Published in issue: 10 March 2011
- Imported date: 2010-12
- Citation in JabRef: ACS Med. Chem. Lett. 2010, 2(3), 243–247

Again, downloading the citation as a .bib file from the publisher produces the correct publication year:

    @article{doi:10.1021/ml100273k,
      author = {Woo, L. W. Lawrence and Bubert, Christian and Purohit, Atul and Potter, Barry V. L.},
      title = {Hybrid Dual Aromatase-Steroid Sulfatase Inhibitors with Exquisite Picomolar Inhibitory Activity},
      journal = {ACS Medicinal Chemistry Letters},
      volume = {2},
      number = {3},
      pages = {243-247},
      year = {2011},
      doi = {10.1021/ml100273k},
      note ={PMID: 24900302},
      URL = { https://doi.org/10.1021/ml100273k },
      eprint = { https://doi.org/10.1021/ml100273k }
    }

The problem could be at the publisher level, at CrossRef, or in JabRef, but since it happens for multiple publishers I expect the issue to lie in CrossRef supplying wrong data or JabRef requesting/filtering wrong data. To rule out the publishers for which I have seen this issue, I have reported it to them as well.

tobiasdiez commented 3 years ago

This seems to be a problem with the metadata provided by crossref, which JabRef uses under the hood. This particular paper can be found at: https://search.crossref.org/?from_ui=&q=10.1111%2Fbph.15016#. There you find "published Jan 2021", which would be correct. However, if you click "Actions" and then "Cite", all of the export options report March 2020 as the publication date. In fact, the JSON output at https://api.crossref.org/v1/works/10.1111/bph.15016 reports

    "journal-issue": {
      "issue": "1",
      "published-print": {
        "date-parts": [
          [
            2021,
            1
          ]
        ]
      }
    },

but also

    "issued": {
      "date-parts": [
        [
          2020,
          3,
          23
        ]
      ]
    },
    "published": {
      "date-parts": [
        [
          2020,
          3,
          23
        ]
      ]
    }

So I would suggest you contact crossref directly: https://www.crossref.org/contact/
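
To compare the dates in such a record side by side, they can be collected into one table. The following is a minimal Python sketch, not anything JabRef does. It assumes the record has the shape of the excerpts quoted above. The helper names are made up for illustration, and the top-level `published-print` and `published-online` fields are an assumption, since they do not appear in the excerpts. In the real API response the record sits under a `message` key, which is not handled here.

```python
from typing import Any, Dict, List

# Date fields that may appear at the top level of a Crossref work record.
# "published-print" and "published-online" are assumed here; only "issued"
# and "published" are shown in the excerpt above.
DATE_FIELDS = ("issued", "published", "published-print", "published-online")


def first_date_parts(node: Dict[str, Any]) -> List[int]:
    """Return the first [year, month, day] list of a date node, or []."""
    parts = node.get("date-parts") or [[]]
    return [p for p in parts[0] if p is not None]


def collect_dates(work: Dict[str, Any]) -> Dict[str, List[int]]:
    """Map each date field present in the record to its date parts."""
    found: Dict[str, List[int]] = {}
    for field in DATE_FIELDS:
        if field in work:
            found[field] = first_date_parts(work[field])
    # The print date of the issue is nested inside "journal-issue".
    issue = work.get("journal-issue", {})
    if "published-print" in issue:
        found["journal-issue.published-print"] = first_date_parts(
            issue["published-print"]
        )
    return found


# Hand-copied excerpt of the record for 10.1111/bph.15016 quoted above.
work = {
    "journal-issue": {"issue": "1", "published-print": {"date-parts": [[2021, 1]]}},
    "issued": {"date-parts": [[2020, 3, 23]]},
    "published": {"date-parts": [[2020, 3, 23]]},
}

for name, parts in collect_dates(work).items():
    print(name, parts)
# issued [2020, 3, 23]
# published [2020, 3, 23]
# journal-issue.published-print [2021, 1]
```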

ggieling commented 3 years ago

Tobias

Thank you for your response. I contacted CrossRef and, from what I read in their answer, the system should be queried via their API and not via Actions > Cite. The answer at least suggests that their API allows querying the separate dates.

See their answer below, I hope it helps in solving the issue.


All DOI metadata is provided to us directly by the publishers. Crossref does not update, edit, or correct publisher-provided metadata.

Our metadata schema allows publishers to supply both an 'online' and a 'print' publication date. Only one or the other is strictly required, but we encourage them to include both if both are applicable.

The metadata that ACS supplied to Crossref for 10.1021/ml100273k includes both dates:

online: 12 29 2010 (month day year); print: 03 10 2011

Likewise, the metadata that Wiley supplied for 10.1111/bph.15016 also includes both dates:

online: 03 23 2020 (month day year); print: 01 2021 (month year)

So, nothing is inaccurate about the metadata records themselves. It sounds like the way that JabRef is querying, ingesting, or processing the metadata from us is preferentially supplying the online date and/or ignoring the print date.

Our API, with the full metadata records of both items, is freely open and available. See http://api.crossref.org. So, there's nothing preventing them from accessing both print and online dates.

Metadata Search (http://search.crossref.org) is not an API. It's only intended for manual, human use. The data in actions > cite comes from the Content Negotiation service https://citation.crosscite.org/docs.html that is a shared project among Crossref, DataCite, and mEDRA.

Content Negotiation is more limited in what metadata it retrieves for formatted citations. It's not intended to be a comprehensive representation of the full metadata record supplied by the publisher, just basic bibliographic citation data, and that will retrieve the online date if both print and online are present. I'm not sure whether that was intentional or inadvertent, but I will pass your feedback along to our developer teams that the print date might be more useful than online.
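
A client that reads the full record could apply its own preference when both dates are present. The sketch below is one possible policy in Python, under stated assumptions: the preference order and all function names are hypothetical, and the sample record is hand-built from the two ACS dates quoted in this answer, not fetched from the API.

```python
from typing import Any, Dict, Optional, Sequence

# Hypothetical preference order: print date first, then online, then "issued".
PRINT_FIRST = (
    "published-print",
    "journal-issue/published-print",
    "published-online",
    "issued",
)


def lookup(work: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a '/'-separated key path through nested dicts."""
    node: Any = work
    for key in path.split("/"):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def year_of(node: Optional[Any]) -> Optional[int]:
    """Extract the year from a {'date-parts': [[y, m, d]]} node."""
    if not isinstance(node, dict):
        return None
    parts = node.get("date-parts") or []
    if parts and parts[0] and parts[0][0] is not None:
        return int(parts[0][0])
    return None


def preferred_year(
    work: Dict[str, Any], order: Sequence[str] = PRINT_FIRST
) -> Optional[int]:
    """Return the year of the first date field in `order` that is present."""
    for path in order:
        year = year_of(lookup(work, path))
        if year is not None:
            return year
    return None


# Hand-built from the dates quoted above for 10.1021/ml100273k.
acs = {
    "published-online": {"date-parts": [[2010, 12, 29]]},
    "published-print": {"date-parts": [[2011, 3, 10]]},
}

print(preferred_year(acs))                                # 2011
print(preferred_year(acs, order=("published-online",)))   # 2010
```

Keeping the order as a parameter leaves the choice between print and online dates to the caller, since neither is wrong in the metadata itself.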


tobiasdiez commented 3 years ago

Thanks for contacting them and reporting back.

We essentially just call the crossref API and ask for the bibtex representation (indirectly via the DOI content negotiation API they refer to in their response). For the article in question, this would be https://api.crossref.org/works/10.1021/ml100273k/transform/application/x-bibtex. As you can see, the year is 2010. So the "print" date is simply ignored by this API endpoint. The crossref JSON API would return the print publication date, but there is no similar API that works for all DOIs (i.e. also including DataCite and mEDRA).
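
For Crossref DOIs only, a workaround could fetch the BibTeX from the endpoint above and overwrite its year with one taken from the JSON record. The Python sketch below illustrates that idea and is not JabRef's implementation. The function names are hypothetical, the regular expression assumes a simple `year = ...` field, and it would not help for DataCite or mEDRA DOIs.

```python
import re
import urllib.parse
import urllib.request

API = "https://api.crossref.org/works/"


def json_url(doi: str) -> str:
    """URL of the full JSON record for a DOI."""
    return API + urllib.parse.quote(doi)


def bibtex_url(doi: str) -> str:
    """URL of the BibTeX transform quoted in the comment above."""
    return json_url(doi) + "/transform/application/x-bibtex"


def fetch(url: str) -> str:
    """Download a URL as text. Needs network access; not called on import."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


# Matches `year = 2010`, `year = {2010}` or `year = "2010"` (an assumption
# about how the year field is written in the returned BibTeX).
YEAR_FIELD = re.compile(r'(\byear\s*=\s*[{"]?)(\d{4})([}"]?)', re.IGNORECASE)


def replace_year(bibtex: str, year: int) -> str:
    """Overwrite the first year field of a BibTeX entry."""
    return YEAR_FIELD.sub(
        lambda m: m.group(1) + str(year) + m.group(3), bibtex, count=1
    )


print(bibtex_url("10.1021/ml100273k"))
# https://api.crossref.org/works/10.1021/ml100273k/transform/application/x-bibtex
```

With network access, `replace_year(fetch(bibtex_url(doi)), 2011)` would then return the entry with the print year, where `2011` stands for whatever year was read from the JSON record.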

In short, we have to wait until they fix it:

that will retrieve the online date if both print and online are present. I'm not sure whether that was intentional or inadvertent, but I will pass your feedback along to our developer teams that the print date might be more useful than online.