Open seasidesparrow opened 4 months ago
The crossref parser actually parses Crossref XML data that has passed through the Habanero content negotiation method, so it needs to be able to read data in the UNIXREF-XML query return format, documented here: https://www.crossref.org/schema/unixref1.1.xsd.
Describe the bug
For at least some Crossref records, the affiliation information for each author is returned in a structure tagged as <affiliations><institution><institution_name> (see e.g. 10.1364/AO.505607). Currently, the crossref parser is looking for the tag <affiliation> (not <affiliations>) and extracting the contents with .get_text(). This misses the structure above entirely.

To Reproduce
Steps to reproduce the behavior: harvest the Crossref XML from their API, and parse it with adsingestp.parsers.crossref. Authors 1 and 5 will have ORCIDs, but there will not be any additional affiliation information.
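To illustrate the two shapes described above, here is a minimal sketch that reads both the flat <affiliation> tag and the nested <affiliations><institution><institution_name> structure. It uses the standard library's xml.etree.ElementTree rather than the parser's own .get_text() call, and it is not the adsingestp implementation. The sample XML, the author names, and the helper get_affiliations are all hypothetical; real UNIXREF records are larger and may carry namespaces that this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down contributor block (not taken from a real record).
SAMPLE = """
<contributors>
  <person_name sequence="first" contributor_role="author">
    <given_name>Ada</given_name>
    <surname>Example</surname>
    <affiliations>
      <institution>
        <institution_name>Example University</institution_name>
      </institution>
      <institution>
        <institution_name>Example Observatory</institution_name>
      </institution>
    </affiliations>
  </person_name>
  <person_name sequence="additional" contributor_role="author">
    <given_name>Bo</given_name>
    <surname>Sample</surname>
    <affiliation>Sample Institute</affiliation>
  </person_name>
</contributors>
"""


def get_affiliations(person):
    """Return the affiliation strings found under one <person_name> element.

    Hypothetical helper: checks the flat <affiliation> tag first, then the
    nested <affiliations>/<institution>/<institution_name> structure.
    """
    found = []
    # Flat form: <affiliation>text</affiliation> directly under the author.
    for aff in person.findall("affiliation"):
        found.append("".join(aff.itertext()).strip())
    # Nested form: one <institution_name> per <institution>.
    for name in person.findall("affiliations/institution/institution_name"):
        found.append("".join(name.itertext()).strip())
    # Drop empty strings left by blank tags.
    return [text for text in found if text]


root = ET.fromstring(SAMPLE)
affils_by_surname = {
    person.findtext("surname"): get_affiliations(person)
    for person in root.findall("person_name")
}
```

Checking both tags keeps records that still use the flat form working, while the nested lookup picks up cases like the one reported here.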