gnott closed this issue 7 years ago
Example is https://github.com/elifesciences/elife-crossref-feed/issues/88 If there is a comment, then convert any citation to unstructured_citation?
I think the main criterion is that if there is a DOI (unless it is a data citation), retain it as a structured reference, irrespective of what you can glean or have to ignore from the JATS XML.
Then, if there are items such as comments and URLs, add them in unstructured parts as you have in tests, e.g.:
```xml
<citation key="12">
  <volume_title>PyMol</volume_title>
  <author>DeLano</author>
  <cYear>2002</cYear>
  <article_title>The PyMol Molecular Graphics System</article_title>
  <unstructured_citation>DeLano W. 2002. The PyMol Molecular Graphics System.
Schrödinger LLC. PyMol. Version 1.7.4.
https://www.pymol.org/.</unstructured_citation>
</citation>
```
I did not realise you could mix and match structured information with unstructured information like that!
This is the response from Crossref:
If you include both structured data and an unstructured citation, and the structured data is thorough enough to be parsed, then the unstructured citation will be totally ignored. If the structured data is missing some crucial element (e.g. no journal title), then the system will process the unstructured citation instead.
However, our system can only account for the quantity of structured citation data, not its quality. The problematic scenario is one where you've included enough structured citation data so that the structured citation is parsed, but it's poor quality or inaccurate metadata, (e.g. if you've spelled the journal title or author's name incorrectly; or included an incorrect page number, etc.) so the system will not be able to find a DOI match for that citation. We won't go on to try the unstructured citation in that case.
So, what that boils down to is: it's best to send both structured and unstructured citations unless you find that the metadata in your structured citations tends to be inaccurate or poorly formatted in such a way that it's preventing citation matches with the cited articles' DOIs. In that case, sending just the unstructured citations is preferable.
My response to Crossref:
That's really helpful. All our references are crosschecked against the Crossref API for a DOI, so we "should" be picking up any crossref DOIs and supplying them in our metadata.
We also crosscheck PubMed API.
The issue is where DOIs are not registered by Crossref, or the content type is not a journal and the metadata might not be properly checked via your API.
Is your system checking just Crossref DOIs or other providers too?
I think the metadata in our structured citations tends to be pretty good, but of course it is improved a lot by using the PubMed and Crossref APIs - chicken and egg scenario!
A note for a possible todo in the code is to create a configuration setting for unstructured_citation format.
If it is set to "hybrid" or to True (if we call it hybrid unstructured citation), then the Crossref deposit will include both the individual citation tags and the unstructured_citation tag (if applicable).
If the configuration value is set to False, then it will only include the unstructured_citation tag (when applicable) and not the other citation tags.
That will make the output flexible and configurable for other publishers and depending on the best practice for citation formats in the Crossref schema.
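A rough sketch of how that setting might behave, to make the two modes concrete. The function name `build_citation`, the `hybrid` flag and the dict-based input are hypothetical and not taken from the existing code. It also assumes that when a citation has no unstructured text, the structured tags are kept even with the setting off, which is only one reading of "when applicable".

```python
import xml.etree.ElementTree as ET


def build_citation(key, structured, unstructured=None, hybrid=True):
    """Build a Crossref-style <citation> element (illustrative only).

    structured: dict mapping tag names (e.g. "author", "cYear") to text.
    unstructured: free-text citation string, or None if not applicable.
    hybrid: hypothetical config value; True emits both kinds of tags.
    """
    citation = ET.Element("citation", {"key": str(key)})

    # Assumption: with hybrid off, structured tags are dropped only when
    # there is unstructured text to send in their place.
    include_structured = hybrid or not unstructured
    if include_structured:
        for tag, text in structured.items():
            ET.SubElement(citation, tag).text = text

    if unstructured:
        ET.SubElement(citation, "unstructured_citation").text = unstructured

    return citation


if __name__ == "__main__":
    example = build_citation(
        12,
        {"volume_title": "PyMol", "author": "DeLano", "cYear": "2002"},
        "DeLano W. 2002. The PyMol Molecular Graphics System.",
        hybrid=True,
    )
    print(ET.tostring(example, encoding="unicode"))
```

Keeping the switch as a single argument means other publishers could flip it from their own configuration without touching the tag-building logic.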
So Crossref only check their internal DOI system:
Our system only checks for citation matches among Crossref DOIs. We can match non-journal content (books, conferences, etc.), though journal articles do make up the bulk of our metadata records.
Let's discuss what approach we take on our next call.
M
I think since we will include structured tags and unstructured_citation tags together for each citation, we can probably close this for now. We can change the logic later if we find the approach to be unsatisfactory.
The current logic while adding support for the `<unstructured_citation>` tag is based on eLife citations which explicitly mention the `publication-type`, e.g.

Some additional logic that could be added around this could be based on the citation's values themselves. For example, if the citation has a `uri` or a `comment`, then include an `<unstructured_citation>` tag, since these particular details are not allowed elsewhere in the Crossref schema.

@Melissa37 do you have any thoughts you could add with respect to this and non-eLife examples?
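The value-based check suggested above could look something like this minimal sketch. It assumes each parsed citation is available as a plain dict with optional `uri` and `comment` keys. That shape and the function name `needs_unstructured_citation` are hypothetical, not the current data model.

```python
# Fields that, per the discussion above, have no home elsewhere in the
# Crossref citation schema and so would trigger an unstructured citation.
UNSTRUCTURED_ONLY_FIELDS = ("uri", "comment")


def needs_unstructured_citation(citation):
    """Return True if the citation carries a non-empty uri or comment.

    citation: dict of parsed reference values (hypothetical shape).
    """
    return any(citation.get(field) for field in UNSTRUCTURED_ONLY_FIELDS)


if __name__ == "__main__":
    software_ref = {"author": "DeLano", "uri": "https://www.pymol.org/"}
    journal_ref = {"author": "DeLano", "comment": ""}
    print(needs_unstructured_citation(software_ref))
    print(needs_unstructured_citation(journal_ref))
```

Because the check looks only at the values present, it would apply to non-eLife citations that do not declare a `publication-type`.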