adsabs / ADSIngestParser

Curation parser library
MIT License

JATS parser needs to capture native authors where available #121

Closed by seasidesparrow 2 months ago

seasidesparrow commented 2 months ago

Describe the bug: Publishers increasingly provide author names in both Western and native-language forms, and we need to capture both for search purposes.

To Reproduce: See the file /proj/ads/fulltext/sources/downloads/cache/APS_HARVEST/harvest.aps.org/v2/journals/articles/10.1103/PhysRevB/109/214421/fulltext.xml. JATS parsing currently captures only Yang Jiahao and Wu Jianda, but native representations of both names are available: 家豪 and 建达. These should go into Contrib.name.native_lang for each author.

Additional context: Relatedly, adsmanparse.translator will need to output these in the %n tag.

seasidesparrow commented 2 months ago

This is the structure we need to consider from the example above:

<contrib-group>
<contrib contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="true">https://orcid.org/0000-0001-7670-2218</contrib-id>
<name><surname>Yang</surname><given-names>Jiahao</given-names></name>
<name-alternatives><string-name name-style="eastern" xml:lang="chi">杨家豪</string-name></name-alternatives>
<xref ref-type="aff" rid="a1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author"><contrib-id contrib-id-type="orcid" authenticated="true">https://orcid.org/0000-0002-3571-3348</contrib-id>
<name><surname>Wu</surname><given-names>Jianda</given-names></name>
<name-alternatives><string-name name-style="eastern" xml:lang="chi">吴建达</string-name></name-alternatives>
<xref ref-type="aff" rid="a1 a2 a3"><sup>1,2,3</sup></xref>
<xref ref-type="author-notes" rid="n1"><sup>*</sup></xref>
</contrib>
<aff id="a1"><label><sup>1</sup></label>Tsung-Dao Lee Institute, <institution-wrap><institution-id institution-id-type="ror">https://ror.org/0220qvk04</institution-id><institution>Shanghai Jiao Tong University</institution></institution-wrap>, Shanghai 201210, China</aff>
<aff id="a2"><label><sup>2</sup></label>School of Physics and Astronomy, <institution-wrap><institution-id institution-id-type="ror">https://ror.org/0220qvk04</institution-id><institution>Shanghai Jiao Tong University</institution></institution-wrap>, Shanghai 200240, China</aff>
<aff id="a3"><label><sup>3</sup></label>Shanghai Branch, <institution-wrap><institution-id institution-id-type="ror">https://ror.org/04c4dkn09</institution-id><institution>Hefei National Laboratory</institution></institution-wrap>, Shanghai 201315, China</aff>
</contrib-group>
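
For illustration, a minimal sketch of how the native name could be pulled from this structure using Python's standard-library ElementTree. The function name `extract_authors` and the output keys are hypothetical; only `native_lang` is taken from the field named in this issue, and none of this reflects the actual ADSIngestParser code. The sample is trimmed from the example above and uses the native strings as quoted in the issue text.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed sample; native strings are the ones quoted in the issue text.
SAMPLE = """<contrib-group>
<contrib contrib-type="author">
<name><surname>Yang</surname><given-names>Jiahao</given-names></name>
<name-alternatives><string-name name-style="eastern" xml:lang="chi">家豪</string-name></name-alternatives>
</contrib>
<contrib contrib-type="author">
<name><surname>Wu</surname><given-names>Jianda</given-names></name>
<name-alternatives><string-name name-style="eastern" xml:lang="chi">建达</string-name></name-alternatives>
</contrib>
</contrib-group>"""


def extract_authors(xml_text):
    """Return one dict per author, with the native name when present."""
    root = ET.fromstring(xml_text)
    authors = []
    for contrib in root.iter("contrib"):
        if contrib.get("contrib-type") != "author":
            continue
        name = contrib.find("name")
        record = {
            "surname": name.findtext("surname") if name is not None else None,
            "given": name.findtext("given-names") if name is not None else None,
            "native_lang": None,       # hypothetical key mirroring Contrib.name.native_lang
            "native_lang_code": None,  # hypothetical key holding the xml:lang value
        }
        alt = contrib.find("name-alternatives/string-name")
        if alt is not None:
            # itertext() also collects text nested in child elements, if any.
            text = "".join(alt.itertext()).strip()
            record["native_lang"] = text or None
            record["native_lang_code"] = alt.get(XML_LANG)
        authors.append(record)
    return authors


authors = extract_authors(SAMPLE)
```

Authors without a `<name-alternatives>` element simply keep `native_lang` as `None`, so existing records would be unaffected under this approach.
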
seasidesparrow commented 2 months ago

Example from A&A:

<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">0000-0001-5392-2701</contrib-id>
<name-alternatives>
<name><surname>Zhang</surname><given-names>J.-Y.</given-names></name>
<string-name content-type="native">?| ?~J䶮</string-name>
</name-alternatives>
<xref rid="AFF1" ref-type="aff">1</xref>
<xref rid="AFF2" ref-type="aff">2</xref>
</contrib>
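
The A&A layout differs from the APS one in two ways: the `<name>` element sits inside `<name-alternatives>` rather than beside it, and the native string is marked with `content-type="native"` instead of `xml:lang`. A rough sketch of a lookup that tolerates both layouts follows. The function name `parse_contrib_name` and the output keys are hypothetical, and `NATIVE-NAME` is a placeholder standing in for the native string, not data from the article.

```python
import xml.etree.ElementTree as ET

# Trimmed A&A-style sample; NATIVE-NAME is a placeholder, not real data.
SAMPLE = """<contrib contrib-type="author" corresp="yes">
  <name-alternatives>
    <name><surname>Zhang</surname><given-names>J.-Y.</given-names></name>
    <string-name content-type="native">NATIVE-NAME</string-name>
  </name-alternatives>
</contrib>"""


def parse_contrib_name(contrib):
    """Handle <name> either directly under <contrib> or inside <name-alternatives>."""
    name = contrib.find("name")
    if name is None:
        name = contrib.find("name-alternatives/name")

    native = None
    # Take the first non-empty <string-name> under <name-alternatives>.
    for string_name in contrib.iterfind("name-alternatives/string-name"):
        text = "".join(string_name.itertext()).strip()
        if text:
            native = text
            break

    return {
        "surname": name.findtext("surname") if name is not None else None,
        "given": name.findtext("given-names") if name is not None else None,
        "native_lang": native,  # hypothetical key mirroring Contrib.name.native_lang
    }


result = parse_contrib_name(ET.fromstring(SAMPLE))
```

This sketch treats any `<string-name>` under `<name-alternatives>` as the native form. A stricter version might also check `content-type` or `xml:lang` first, depending on what other publishers put in that element.
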