baskerville / plato

Document reader

Empty <span /> between paragraphs is treated like an empty paragraph #248

Closed: thataboy closed this issue 2 years ago

thataboy commented 2 years ago

For example:

<html>
<head>
<title>test</title>
</head>
<body>
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
</p>
<span id ="aaa" />
<p> Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</p>
</body>
</html>

The <span id ="aaa" /> causes extra spacing between the two paragraphs, as if it were <p>&nbsp;</p>.

I encountered this in a book which passed epubcheck.

Version 0.9.28 correctly ignores the span.

Attachment: test.html.txt
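
To make the expected behaviour concrete, here is a minimal sketch of filtering out elements that render nothing before block layout. It is not Plato's code: the `Node` class, `INLINE_TAGS`, `is_blank` and `block_children` are hypothetical names invented for this illustration, and the node model is an assumption.

```python
from dataclasses import dataclass, field

# ASCII whitespace as defined by HTML; deliberately excludes U+00A0 (&nbsp;).
HTML_SPACE = " \t\n\r\f"

# Hypothetical set of inline tags, for illustration only.
INLINE_TAGS = {"span", "a", "em", "strong", "b", "i"}


@dataclass
class Node:
    """Hypothetical DOM node; not Plato's actual data structure."""
    tag: str                       # e.g. "p", "span", or "#text"
    text: str = ""                 # only meaningful for "#text" nodes
    children: list = field(default_factory=list)


def is_blank(node):
    """Return True if the node would render nothing visible."""
    if node.tag == "#text":
        return node.text.strip(HTML_SPACE) == ""
    if node.tag in INLINE_TAGS:
        # An inline element is blank when it is empty or holds only blanks.
        return all(is_blank(child) for child in node.children)
    return False


def block_children(parent):
    """Return the children of `parent` that should produce layout boxes."""
    return [child for child in parent.children if not is_blank(child)]


# The structure from the report: <p>, empty <span id="aaa" />, <p>.
body = Node("body", children=[
    Node("p", children=[Node("#text", "Lorem ipsum")]),
    Node("#text", "\n"),
    Node("span"),
    Node("#text", "\n"),
    Node("p", children=[Node("#text", "Duis aute")]),
])

print([child.tag for child in block_children(body)])  # ['p', 'p']
```

A real reader would presumably still need to remember the span's `id` so that links to `#aaa` keep working. The sketch only shows that the span should not get a box of its own.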

thataboy commented 2 years ago

Follow up:

This is one of two edge cases I've stumbled on that cause extra spacing between blocks. The other occurs when a block contains white space at the end, e.g. (the white space here is the \n before </p>):

<p>
...some text here...
</p>

If, in addition, the last line of text fills the entire width of the block at layout time, the <p> will have extra spacing below it.

Here is my probably naive fix for both edge cases. It appears to have negligible impact on performance. Not sure if it's worth doing, nevertheless :)
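
The fix itself is not reproduced in this thread. Separately from it, the sketch below shows one plausible way the second edge case could arise. The mechanism is an assumption, not a description of Plato's layout engine, and `wrap` and its `trim` parameter are hypothetical. In this toy greedy line breaker, the trailing "\n" becomes a space token. When the last line is already full, that token forces a break and leaves an empty extra line.

```python
import re

# ASCII whitespace as defined by HTML.
HTML_SPACE = " \t\n\r\f"


def wrap(text, width, trim=True):
    """Greedily wrap `text` into lines of at most `width` characters.

    Each whitespace run is collapsed into a single space token. With
    trim=False, the white space before </p> survives as a final token.
    """
    if trim:
        text = text.strip(HTML_SPACE)
    tokens = re.findall(r"[^ \t\n\r\f]+|[ \t\n\r\f]+", text)
    lines, current = [], ""
    for token in tokens:
        piece = " " if token[0] in HTML_SPACE else token
        if current and len(current) + len(piece) > width:
            # The line is full: emit it and start a new one.
            lines.append(current)
            current = "" if piece == " " else piece
        else:
            current += piece
    lines.append(current)
    return lines


# The last line "aa bb" fills the width exactly, so the trailing "\n"
# spills onto an empty extra line.
print(wrap("aa bb\n", 5, trim=False))  # ['aa bb', '']

# Trimming the block's trailing white space first avoids the extra line.
print(wrap("aa bb\n", 5))              # ['aa bb']
```

With a wider block, e.g. `wrap("aa bb\n", 6, trim=False)`, the trailing space still fits on the last line and no extra line appears. That matches the report: the problem shows up only when the last line fills the whole width.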