Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License

HTML block handling nested in indented blocks doesn't work properly #1096

Open facelessuser opened 3 years ago

facelessuser commented 3 years ago

The following example shows that raw HTML containing blank lines is not handled properly when it is nested in an indented block. Instead, each part is treated as an incomplete HTML fragment.

import markdown

print(f"Markdown: {markdown.__version__}")

print("\n------ Results ------\n")

content = r'''
!!! note "Admonition"
    <div>
    Some text

    Some more text
    </div>
'''

print(markdown.markdown(content, extensions=['markdown.extensions.admonition']))

Output

Markdown: 3.3.3

------ Results ------

<div class="admonition note">
<p class="admonition-title">Admonition</p>
<p><div>
Some text</p>
<p>Some more text
</div></p>
</div>

facelessuser commented 3 years ago

Updated results using the latest released Markdown.

waylan commented 3 years ago

So, I based the current behavior on the rules, which state:

The only restrictions are that ... the start and end tags of the block should not be indented with tabs or spaces.

Of course, the reference implementation does not support admonitions, but it does support nested lists, whose content is likewise indented. And according to Babelmark, the reference implementation parses indented raw HTML blocks as raw HTML blocks. ☹️ Not what I was expecting. I really thought the reference implementation matched our behavior here. In fact, I even added tests for our behavior. 😠

In any event, so long as we are parsing raw HTML in a preprocessor, this is what we will get. The parser is very strict about requiring no indentation (even a single space is not allowed). We would need to switch to a block processor, which would strip the indentation in the appropriate cases (when nested) before parsing as raw HTML. In the early commits to the original PR I was using a block processor, but I reverted to a preprocessor because I was encountering too many obstacles with the way the block parser splits the source on blank lines.
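To make the two obstacles concrete, here is a rough standard-library sketch. It is not Python-Markdown's actual code: `BLOCK_TAG_RE`, `starts_raw_html`, and `split_on_blank_lines` are hypothetical names made up for illustration, and the tag list is deliberately tiny. It only shows why a strict column-0 check misses the nested `<div>`, how stripping the shared indentation first would let the check pass, and how splitting on blank lines separates the opening and closing tags.

```python
import re
import textwrap

# Hypothetical, simplified stand-in for a strict raw HTML check.
# The tag list is an assumption and far from complete.
BLOCK_TAG_RE = re.compile(r'^<(div|p|pre|table|blockquote)\b', re.IGNORECASE)


def starts_raw_html(text):
    """Return True only if a block-level tag starts at column 0."""
    return bool(BLOCK_TAG_RE.match(text))


def split_on_blank_lines(text):
    """Split text into chunks separated by blank lines."""
    return [chunk for chunk in re.split(r'\n\s*\n', text.strip('\n')) if chunk]


# The body of the admonition from the report, still indented.
nested = (
    "    <div>\n"
    "    Some text\n"
    "\n"
    "    Some more text\n"
    "    </div>\n"
)

# Indented: the opening tag is not at column 0, so the strict check fails.
print(starts_raw_html(nested))    # False

# Stripping the shared indentation first lets the same check succeed.
dedented = textwrap.dedent(nested)
print(starts_raw_html(dedented))  # True

# Splitting on blank lines puts <div> and </div> in different chunks,
# so neither chunk is a complete HTML block on its own.
chunks = split_on_blank_lines(dedented)
print(chunks)
```

A real block processor would also have to decide when the indentation belongs to a parent (list item, admonition) rather than to an indented code block, which this sketch ignores.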

facelessuser commented 3 years ago

Ugh, I see this is broken in lists as well. For some reason, I was assuming this to be an Admonition-specific issue 😦. And I guess this was the behavior of Python Markdown even before the rewrite. I just never stumbled on it until now.