Open Gru-gru opened 2 months ago
Yes indeed, thank you for reporting this. I'm adding notes to the bug report, as we need to explore a few options to fix this.
This is what arXiv provides on the web:
a few line breaks <br>
This is what arXiv provides on the API: lines appear to be wrapped at 80 characters at most, which introduces unwanted line breaks.
And this is why we are not merely replacing line feeds \n with HTML line breaks <br>: the result would look like this:
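To make the problem concrete, here is a minimal Python sketch of the naive replacement. The sample abstract and the helper name `naive_nl2br` are made up for illustration (they are not taken from the Episciences code base or from the actual arXiv record); the text is hard-wrapped the way the API output appears to be, with one blank line standing in for a break the authors intended.

```python
import html

# Hypothetical abstract, hard-wrapped below 80 columns as the API seems to do.
# Only the blank line represents a break the authors actually wanted.
api_abstract = (
    "We study a problem on graphs and show that it can be solved in polynomial\n"
    "time when the input is restricted to planar graphs.\n"
    "\n"
    "We also prove a matching lower bound."
)


def naive_nl2br(text: str) -> str:
    """Escape the text, then turn every line feed into an HTML line break."""
    return html.escape(text).replace("\n", "<br>\n")


rendered = naive_nl2br(api_abstract)
print(rendered)
```

Every line feed becomes a `<br>`, including the one after "polynomial" that comes only from the wrapping, so the sentence is broken in the middle when displayed.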
This is what the DataCite API provides:
curl -s https://api.datacite.org/dois/10.48550/arXiv.2311.10204 |jq|grep '"description"' |grep --color '\\\n'
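For reference, roughly the same check can be sketched in Python. This assumes the DataCite response keeps its descriptions under `data.attributes.descriptions`, each with a `description` string; that layout is an assumption here and should be checked against the real response. The function names are hypothetical. Note that once the JSON is decoded, the escaped `\n` that the `grep` above looks for becomes a real line feed character.

```python
import json
import urllib.request

DOI_URL = "https://api.datacite.org/dois/10.48550/arXiv.2311.10204"


def extract_descriptions(payload: dict) -> list:
    """Return every description string found in a DataCite-style payload.

    Assumption: descriptions live under data.attributes.descriptions.
    """
    attributes = payload.get("data", {}).get("attributes", {})
    return [d.get("description", "") for d in attributes.get("descriptions", [])]


def fetch_descriptions(url: str = DOI_URL) -> list:
    """Download the record (needs network access) and extract its descriptions."""
    with urllib.request.urlopen(url) as response:
        return extract_descriptions(json.load(response))


def count_line_feeds(text: str) -> int:
    """Count the line feeds in a decoded description."""
    return text.count("\n")
```

Calling `fetch_descriptions()` and passing each result to `count_line_feeds` would show how many line feeds DataCite keeps in the abstract, which can then be compared with the arXiv API output.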
We can try to:
Let's ignore HTML scraping.
Describe the bug
When a paper is imported from arXiv, the abstract displayed by Episciences can differ from the one shown on arXiv, because line breaks are lost. Concrete example: https://theoretics.episciences.org/14397 vs https://arxiv.org/abs/2311.10204
Expected behavior
Line breaks should not be ignored, so that the abstract is shown as the authors intended.