Open csgrant00 opened 1 year ago
The bulleted points at the end of the article are marked with •, but without a preceding <p> or <br>. The Elsevier XML has these elements embedded in a <ce:list>/<ce:list-item> tree, so there are no explicit line breaks; the list elements specify paragraph breaks for each bullet.
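To illustrate the structure described above, here is a minimal sketch using only the standard library. The sample fragment is hypothetical (it is not copied from the affected article), the `ce` namespace URI is an assumption that should be checked against the real file, and `list_items_as_lines` is an illustrative helper, not part of ADSIngestParser.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the "ce" prefix; verify against the real XML.
CE_NS = {"ce": "http://www.elsevier.com/xml/common/dtd"}

# Hypothetical, trimmed-down fragment with one bullet per <ce:list-item>.
SAMPLE = """\
<ce:abstract-sec xmlns:ce="http://www.elsevier.com/xml/common/dtd">
  <ce:list>
    <ce:list-item><ce:label>•</ce:label><ce:para>First highlight.</ce:para></ce:list-item>
    <ce:list-item><ce:label>•</ce:label><ce:para>Second highlight.</ce:para></ce:list-item>
  </ce:list>
</ce:abstract-sec>
"""


def list_items_as_lines(xml_text):
    """Return one line of text per <ce:list-item>, joined with newlines."""
    root = ET.fromstring(xml_text)
    lines = []
    for item in root.iterfind(".//ce:list-item", CE_NS):
        # itertext() yields the label and paragraph text in document order.
        text = " ".join(part.strip() for part in item.itertext() if part.strip())
        lines.append(text)
    return "\n".join(lines)


print(list_items_as_lines(SAMPLE))
```

Each list item becomes its own line, which is the break that the flattened output is missing.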
Are paragraph tags allowed in abstracts, @csgrant00?
I think so, at least I think they should be. I'll try to check...
This can be addressed by updating this line of code to join the two pieces of text with a "\n" rather than a space: https://github.com/adsabs/ADSIngestParser/blob/33ae877f0dc86162182f363f218927f620bdf75b/adsingestp/parsers/elsevier.py#L161
This will also require either removing the call to self._clean_output at the end of this block, or an upstream change to the base parser here: https://github.com/adsabs/ADSIngestParser/blob/33ae877f0dc86162182f363f218927f620bdf75b/adsingestp/parsers/base.py#L43
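A rough sketch of why both changes are needed, assuming the cleaning step collapses every run of whitespace (newlines included) into a single space. That behavior is an assumption made for illustration, not a copy of the actual `_clean_output`; both functions below are hypothetical stand-ins.

```python
import re


def clean_collapse_all(text):
    # Assumed stand-in: flattens every whitespace run, so "\n" is lost.
    return re.sub(r"\s+", " ", text).strip()


def clean_keep_newlines(text):
    # Alternative: collapse spaces and tabs within each line, keep line breaks.
    lines = (re.sub(r"[ \t]+", " ", line).strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)


# Two pieces of text joined with "\n" instead of a space.
pieces = ["• First  highlight.", "• Second highlight. "]
joined = "\n".join(pieces)

print(repr(clean_collapse_all(joined)))
print(repr(clean_keep_newlines(joined)))
```

Under this assumption, joining with "\n" alone is not enough: a cleaner of the first kind would undo it, so either the call is skipped for this block or the cleaner is changed to preserve newlines.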
/proj/ads/abstracts/data/ELS/CONSYN.AST/2023/ELS.080423/2214-5524/S2214552423X00030/S2214552423000275/S2214552423000275.xml