adsabs / ADSIngestParser

Curation parser library
MIT License

Affiliations not captured from crossref data #96

Open seasidesparrow opened 4 months ago

seasidesparrow commented 4 months ago

Describe the bug

For at least some Crossref records, each author's affiliation information is returned in a nested structure tagged as <affiliations><institution><institution_name> (see, e.g., 10.1364/AO.505607). The Crossref parser currently looks for the tag <affiliation> (singular, not <affiliations>) and extracts its contents with .get_text(), so it misses the nested structure entirely.
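For illustration, here is a minimal sketch of an extraction that accepts both forms. It is not the parser's actual code: it uses the standard library's xml.etree.ElementTree rather than the .get_text() call mentioned above, the helper names local_name and extract_affiliations are hypothetical, and the sample XML is invented for the example rather than taken from the record cited above.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def extract_affiliations(person):
    """Return the affiliation strings found under one <person_name> element.

    Hypothetical helper: handles the flat <affiliation> form as well as the
    nested <affiliations><institution><institution_name> form.
    """
    affs = []
    for child in person:
        name = local_name(child.tag)
        if name == "affiliation":
            # Flat form: the text sits directly inside <affiliation>.
            text = "".join(child.itertext()).strip()
            if text:
                affs.append(text)
        elif name == "affiliations":
            # Nested form: collect every <institution_name> below it.
            for node in child.iter():
                if local_name(node.tag) == "institution_name":
                    text = "".join(node.itertext()).strip()
                    if text:
                        affs.append(text)
    return affs


# Invented sample data covering both forms (not a real Crossref record).
SAMPLE = """
<contributors>
  <person_name sequence="first" contributor_role="author">
    <given_name>Ada</given_name>
    <surname>Example</surname>
    <affiliations>
      <institution>
        <institution_name>Example University</institution_name>
      </institution>
    </affiliations>
  </person_name>
  <person_name sequence="additional" contributor_role="author">
    <given_name>Ben</given_name>
    <surname>Sample</surname>
    <affiliation>Sample Institute</affiliation>
  </person_name>
</contributors>
""".strip()

root = ET.fromstring(SAMPLE)
people = [el for el in root.iter() if local_name(el.tag) == "person_name"]
result = [extract_affiliations(p) for p in people]
print(result)
```

Matching on the local tag name is a deliberate choice in this sketch, so that it behaves the same whether or not the elements carry an XML namespace.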

To Reproduce

Steps to reproduce the behavior: harvest the Crossref XML for the record above from their API, and parse it with adsingestp.parsers.crossref. Authors 1 and 5 will have ORCIDs, but no affiliation information will appear in the parsed output.


seasidesparrow commented 4 months ago

The Crossref parser actually parses Crossref XML data that has passed through the Habanero Content Negotiation method, so it needs to be able to read data in the UNIXREF-XML query return format, documented here: https://www.crossref.org/schema/unixref1.1.xsd