erusev / parsedown

Better Markdown Parser in PHP
https://parsedown.org
MIT License

Unexpected paragraph in unordered list #474

Closed: ulab closed this issue 6 years ago

ulab commented 7 years ago

Using the following Markdown:

* abc
* def
* ghi

* jkl
* mno
* pqr

abc

I get paragraph tags in the "ghi" list element that I don't think should be there. Interestingly it does not get added to the 2nd list "pqr" if a paragraph follows.

<ul>
<li>abc</li>
<li>def</li>
<li>
<p>ghi</p>
</li>
<li>jkl</li>
<li>mno</li>
<li>pqr</li>
</ul>
<p>abc</p>

kminchev commented 7 years ago

According to the Demo, the original parser produces almost identical results.

ulab commented 7 years ago

I didn't expect a set of <p> there, but perhaps a 2nd list? With the original parser there's another set of <p> for the next list element. But I'm not sure if that's better?

aidantwoods commented 7 years ago

I think this borders more on expected behaviour? The markdown given is conventionally interpreted as a single list, where the third item contains some extra spacing.

The CommonMark reference parser produces <p> tags on all list items:

<ul>
<li>
<p>abc</p>
</li>
<li>
<p>def</p>
</li>
<li>
<p>ghi</p>
</li>
<li>
<p>jkl</p>
</li>
<li>
<p>mno</p>
</li>
<li>
<p>pqr</p>
</li>
</ul>
<p>abc</p>

Taking a look at the relevant part of the spec:

A list is loose if any of its constituent list items are separated by blank lines, or if any of its constituent list items directly contain two block-level elements with a blank line between them. Otherwise a list is tight. (The difference in HTML output is that paragraphs in a loose list are wrapped in <p> tags, while paragraphs in a tight list are not.)

So it looks like the break between

* ghi

* jkl

should cause the list to become "loose", and so <p> appears in all items.
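
As a rough illustration of the rule quoted above, here is a minimal Python sketch. It is not Parsedown's or the CommonMark reference parser's code: the function names `is_loose` and `render` are made up for this example, and it assumes a flat list where every item is a single line starting with `* `.

```python
def is_loose(lines):
    """Return True if a blank line separates two items of a flat '* ' list.

    Assumption: each item is one line starting with '* ', and the input
    contains only the list itself (no nested blocks, no following text).
    """
    seen_item = False
    pending_blank = False
    for line in lines:
        if not line.strip():
            # A blank line only matters once at least one item has been seen.
            pending_blank = seen_item
        elif line.startswith("* "):
            if pending_blank:
                # An item follows a blank line: the list is loose.
                return True
            seen_item = True
    return False


def render(lines):
    """Render the flat list, wrapping every item in <p> when the list is loose."""
    loose = is_loose(lines)
    out = ["<ul>"]
    for line in lines:
        if line.startswith("* "):
            text = line[2:]
            out.append(f"<li>\n<p>{text}</p>\n</li>" if loose else f"<li>{text}</li>")
    out.append("</ul>")
    return "\n".join(out)


# The list from this issue: one blank line between "ghi" and "jkl".
print(render(["* abc", "* def", "* ghi", "", "* jkl", "* mno", "* pqr"]))
```

The point of the sketch is that looseness is a property of the whole list, so a single blank line between two items changes the output of every item, not just the one next to the gap.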

Interestingly it does not get added to the 2nd list "pqr" if a paragraph follows.

I think this is expected too: the line break following "pqr" acts to "interrupt" the list. Since the interrupt is followed by some text that isn't sufficiently indented to become part of the list, the line break isn't considered to be part of the final item (so no <p>s are added).
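
A small Python sketch of that "interrupt" reasoning, under simplifying assumptions: items start with `* `, and a line counts as "sufficiently indented" when it starts with two spaces (the content offset of a `* ` marker). The helper name `split_list_and_rest` is hypothetical and not taken from any parser.

```python
def split_list_and_rest(lines):
    """Split lines into (list_lines, rest) at the point where the list ends.

    Assumption: a blank line followed by an unindented, non-item line ends
    the list, and that blank line is not counted as part of the last item.
    """
    end = 0              # index just past the last line belonging to the list
    prev_blank = False   # whether the previous line(s) were blank
    for i, line in enumerate(lines):
        if line.startswith("* "):
            end = i + 1
        elif not line.strip():
            prev_blank = True
            continue
        elif prev_blank and not line.startswith("  "):
            # Text after a blank line that is not indented: the list is over.
            break
        else:
            # Continuation text that still belongs to the current item.
            end = i + 1
        prev_blank = False
    return lines[:end], lines[end:]


list_part, rest = split_list_and_rest(["* mno", "* pqr", "", "abc"])
print(list_part)
print(rest)
```

Because the trailing blank line ends up in the remainder rather than in the list, nothing inside the list sees it, which matches the observation that "pqr" gets no `<p>` on its own.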

ulab commented 7 years ago

Thanks for providing the link to the CommonMark spec. Yes, that's behaviour I'd expect more than just one element being turned into a paragraph.

(It also showed me what I should do to separate my two lists correctly - by using a different marker.)
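
To illustrate the different-marker approach: in CommonMark, changing the bullet character (for example from `*` to `-`) starts a new list. The Python sketch below only models that one rule for flat, single-line items. `split_by_marker` is a hypothetical helper, and whether a given Parsedown version behaves the same way is not confirmed in this thread.

```python
def split_by_marker(lines):
    """Group flat list items into separate lists whenever the bullet changes.

    Assumption: items are single lines of the form '<marker> text' where the
    marker is one of '*', '-' or '+'. Blank and other lines are skipped.
    """
    lists = []
    current_marker = None
    for line in lines:
        if len(line) > 2 and line[0] in "*-+" and line[1] == " ":
            if line[0] != current_marker:
                # A different bullet character starts a new list.
                lists.append([])
                current_marker = line[0]
            lists[-1].append(line[2:])
    return lists


markdown = ["* abc", "* def", "* ghi", "", "- jkl", "- mno", "- pqr"]
for items in split_by_marker(markdown):
    print(items)
```

With the same marker throughout, the sketch returns a single list, which corresponds to the single `<ul>` reported at the top of this issue.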

aidantwoods commented 7 years ago

@ulab #475 addresses these inconsistencies between list items