adsabs / ADSIngestParser

Curation parser library
MIT License

Removes any empty affils after normalization #130

Closed seasidesparrow closed 3 months ago

seasidesparrow commented 3 months ago

The internal method JATSAffils._fix_affil(string) takes an input affil string and applies a series of regular-expression replacements to eliminate stray content, such as the trailing ", ," in "Institution, My Town, United States, ,". However, there are legitimate cases where the string passed to this method is nothing but stray content, like ", ,". This can happen when email addresses are given as ext-links appended to a particular affiliation string: the tags themselves may contain an email address and its xref, but the tags may be separated in the text by commas, so the string passed in can consist of the commas alone. This bugfix checks the resulting string before appending it to the output affiliation string; if it is blank or None, it is not appended.
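The check described above can be sketched roughly as follows. This is a minimal illustration, not the code in adsingestp/parsers/jats.py: the function names fix_affil and join_affils, the specific regular expressions, and the "; " separator are all assumptions made for the example.

```python
import re


def fix_affil(affil):
    """Hypothetical stand-in for the normalization step (not the library's code)."""
    # Collapse runs of whitespace into a single space.
    affil = re.sub(r"\s+", " ", affil)
    # Collapse repeated commas (with optional spaces between them) into one.
    affil = re.sub(r"(?:\s*,)+", ",", affil)
    # Trim leading/trailing commas, semicolons, and spaces.
    affil = affil.strip(" ,;")
    # Restore a single space after each remaining comma.
    affil = re.sub(r",\s*", ", ", affil)
    return affil


def join_affils(raw_affils, sep="; "):
    """Normalize each affiliation and skip any that end up blank or None."""
    cleaned = []
    for raw in raw_affils:
        fixed = fix_affil(raw) if raw is not None else None
        if fixed:  # False for both None and the empty string
            cleaned.append(fixed)
    return sep.join(cleaned)


if __name__ == "__main__":
    pieces = ["Institution, My Town, United States, ,", ", ,", "Other Univ"]
    print(join_affils(pieces))
```

With this guard in place, a piece such as ", ," normalizes to an empty string and is dropped, instead of leaving a dangling separator in the joined affiliation.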

modified:   adsingestp/parsers/jats.py
new file:   tests/stubdata/input/jats_iop_blank_affil_removed.xml
new file:   tests/stubdata/input/jats_iop_blank_affil_removed2.xml
new file:   tests/stubdata/output/jats_iop_blank_affil_removed.json
new file:   tests/stubdata/output/jats_iop_blank_affil_removed2.json
modified:   tests/test_jats.py