adsabs / ADSIngestParser

Curation parser library
MIT License

Crossref parser will fail on article bundles #61

Open seasidesparrow opened 1 year ago

Describe the bug
run.py logs an error when downloaded article bundles are parsed with the crossref parser.

To Reproduce
Steps to reproduce the behavior: try parsing the file /proj/ads/abstracts/data/PHYU/ufn_2022_065_01.xml

Additional context
The crossref parser expects two things from the text content: 1) that each record is wrapped in a <crossref> tag (crossref.py L381), and 2) that there is only one record per bundle (crossref.py L372). To address the issue, we need a crossref parser that (a) doesn't require there to be only one <doi_record>, and (b) doesn't require records to be embedded in a <crossref> tag.
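
As a rough illustration of requirements (a) and (b), here is a minimal sketch using only the Python standard library. It is not the existing crossref.py code: the function names `iter_doi_records` and `record_payload` are hypothetical, and the sketch assumes a bundle is a single well-formed XML document containing one or more <doi_record> elements, possibly namespaced.

```python
import xml.etree.ElementTree as ET
from typing import Iterator


def local_name(tag: str) -> str:
    """Strip an ElementTree-style '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def iter_doi_records(xml_text: str) -> Iterator[ET.Element]:
    """Yield every <doi_record> element found in the text (hypothetical helper).

    Works whether the document root is a single <doi_record> or a
    container holding many of them, which covers requirement (a).
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # iter() also visits the root element itself
        if isinstance(elem.tag, str) and local_name(elem.tag) == "doi_record":
            yield elem


def record_payload(doi_record: ET.Element) -> ET.Element:
    """Return the element holding the record metadata (hypothetical helper).

    If the record has a direct <crossref> child, return that child;
    otherwise fall back to the <doi_record> itself, which covers
    requirement (b).
    """
    for child in doi_record:
        if isinstance(child.tag, str) and local_name(child.tag) == "crossref":
            return child
    return doi_record
```

A caller could then loop over `iter_doi_records(text)` and hand each `record_payload(...)` result to the existing per-record parsing logic, so that a bundle produces one output record per <doi_record> rather than an error.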