Describe the bug
If a JATS abstract contains multiple sections separated by paragraph (<p>) tags, the JATS parser captures only the first of them.
To Reproduce
Astronomy and Astrophysics abstracts may have multiple paragraph tags for "Context", "Aims", "Methods" and "Results". Try parsing abstracts/data/A+A/A+A670/abstracts/aa42959-21.xml
The abstract returned will be
"abstract": {
"textEnglish": "Context. V838 Monocerotis is a peculiar binary that underwent an immense stellar explosion in 2002, leaving behind an expanding cool supergiant and a hot B3V companion. Five years after the outburst, the B3V companion disappeared from view, and has not returned to its original state."
},
and is missing the "Aims", "Methods", and "Results" sections.
Additional context
Line 551 of parsers.jats is where the abstract is extracted. It uses "find('p')" for the paragraph tag instead of iterating over the results of "find_all('p')".
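A minimal sketch of the difference, assuming the parser works on a BeautifulSoup tree (suggested by the find/find_all calls, but not confirmed here). The sample XML and the function names first_paragraph_only and all_paragraphs are hypothetical and are not taken from parsers.jats or from aa42959-21.xml:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for an A&A structured abstract; not the real file content.
SAMPLE = """
<abstract>
  <p><italic>Context.</italic> First section text.</p>
  <p><italic>Aims.</italic> Second section text.</p>
  <p><italic>Methods.</italic> Third section text.</p>
  <p><italic>Results.</italic> Fourth section text.</p>
</abstract>
"""


def first_paragraph_only(xml: str) -> str:
    """Mimics the reported behaviour: find('p') returns only the first <p>."""
    abstract = BeautifulSoup(xml, "html.parser").find("abstract")
    return abstract.find("p").get_text(" ", strip=True)


def all_paragraphs(xml: str) -> str:
    """Possible fix: iterate over find_all('p') and join every paragraph."""
    abstract = BeautifulSoup(xml, "html.parser").find("abstract")
    parts = [p.get_text(" ", strip=True) for p in abstract.find_all("p")]
    # Skip empty paragraphs so the joined text has no doubled spaces.
    return " ".join(part for part in parts if part)


if __name__ == "__main__":
    print(first_paragraph_only(SAMPLE))
    print(all_paragraphs(SAMPLE))
```

Joining with a single space is only one option. The actual fix might prefer newlines or another separator, depending on how "textEnglish" is consumed downstream.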