elifesciences / elife-tools

Python library for parsing eLife article XML data.
MIT License

Dataset uri from more types of pub-id tags #290

Closed: gnott closed this issue 5 years ago

gnott commented 5 years ago

Additional examples of <pub-id> tags with a uri value in their xlink:href attribute were added to the XML kitchen sink file in commit https://github.com/elifesciences/XML-mapping/commit/319af014484ece7da2ead9902d73babcb3721b80. The JATS parser does not currently extract these uri values.

Specifically, the example tags to support in the backmatter datasets section include:

<pub-id assigning-authority="NCBI" pub-id-type="accession"
    xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48760">GSE48760</pub-id>

and

<pub-id assigning-authority="other" pub-id-type="archive" 
    xlink:href="https://osf.io/kvu5j/">kvu5j</pub-id>
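Reading the uri from tags like these depends on namespace handling, because xlink:href is a namespaced attribute. Below is a minimal sketch using Python's standard-library xml.etree.ElementTree. It assumes the XML declares the standard XLink namespace (http://www.w3.org/1999/xlink) and wraps the tags in an <element-citation> element. This is an illustration rather than elife-tools' own parsing code, and the function name `pub_id_uris` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard XLink namespace; ElementTree stores namespaced attributes
# under "{namespace-uri}localname" keys, not under the "xlink:" prefix.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# The two example tags from this issue, inside an assumed wrapper element.
SAMPLE = """<element-citation xmlns:xlink="http://www.w3.org/1999/xlink">
  <pub-id assigning-authority="NCBI" pub-id-type="accession"
      xlink:href="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE48760">GSE48760</pub-id>
  <pub-id assigning-authority="other" pub-id-type="archive"
      xlink:href="https://osf.io/kvu5j/">kvu5j</pub-id>
</element-citation>"""


def pub_id_uris(xml_text):
    """Hypothetical helper: return (pub-id-type, text, xlink:href) per <pub-id>."""
    root = ET.fromstring(xml_text)
    results = []
    for tag in root.iter("pub-id"):
        results.append((tag.get("pub-id-type"), tag.text, tag.get(XLINK_HREF)))
    return results


for row in pub_id_uris(SAMPLE):
    print(row)
```

Looking the attribute up as `tag.get("xlink:href")` would return None here. That is one plausible way for uri values to be silently missed.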

The suggested improvement for now is to look for <pub-id> tags whose pub-id-type attribute is "accession" or "archive". If such a tag is found and the dataset does not yet have a uri value, set the uri to the value of the tag's xlink:href attribute.
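The proposed rule can be sketched as a small fallback step. The sketch makes several assumptions. A parsed dataset is represented as a plain dict, and its citation is an ElementTree element. An existing uri, for example one already taken from an <ext-link>, is never overwritten. When several matching tags exist, the first one wins, although the issue does not specify this. The name `fill_uri_from_pub_id` is hypothetical and is not part of the elife-tools API.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# pub-id-type values proposed in this issue as sources of a dataset uri.
URI_PUB_ID_TYPES = {"accession", "archive"}


def fill_uri_from_pub_id(dataset, citation):
    """Hypothetical: set dataset["uri"] from a <pub-id> xlink:href if unset.

    Mutates and returns `dataset` (assumed to be a plain dict).
    """
    if dataset.get("uri"):
        # A uri is already present; leave it unchanged.
        return dataset
    for pub_id in citation.iter("pub-id"):
        href = pub_id.get(XLINK_HREF)
        if pub_id.get("pub-id-type") in URI_PUB_ID_TYPES and href:
            dataset["uri"] = href
            break  # assumption: use the first matching tag only
    return dataset


CITATION = ET.fromstring(
    '<element-citation xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<pub-id assigning-authority="other" pub-id-type="archive" '
    'xlink:href="https://osf.io/kvu5j/">kvu5j</pub-id>'
    "</element-citation>"
)

print(fill_uri_from_pub_id({}, CITATION)["uri"])  # https://osf.io/kvu5j/
# Placeholder existing uri, to show it is not overwritten.
print(fill_uri_from_pub_id({"uri": "https://example.org/existing"}, CITATION)["uri"])
```

Because the check for an existing uri comes first, this step can run after any existing uri extraction without changing datasets that already have one.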

The uri value is currently taken only from <ext-link> tags, and <pub-id> tags are currently considered only when they specify a doi value for the dataset.