Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License

Nested one-liners broken with md_in_html #1074

Open git1sal opened 3 years ago

git1sal commented 3 years ago

Version 3.3.3. I've found several anomalies in the behavior of nested markdown=1 blocks. Here's one that's probably diagnostic:

import markdown
markdown.markdown('<div class="outer" markdown="block"><div class="inner" markdown="block">*foo*</div></div>', extensions=["extra"])

returns:

'<p><div class="inner" markdown="block"><div class="outer" markdown="block">*foo*</div></p>\n</div>'

The inner <div> is now outside.

Other odd things in this example are the added <p> tags, which are unmatched, and a stray \n.

git1sal commented 3 years ago

Oh, I should have mentioned also that the actual content *foo* does not get markdownized in this test case.

waylan commented 3 years ago

I'm not able to replicate that behavior in any recent version. With versions 3.3.0 through 3.3.3 I get:

<div class="outer">
<div class="inner">
<p><em>foo</em></p>
</div>
</div>

However, I see that in #1069 (which is part of the unreleased version 3.3.4) we broke this and I am getting:

<div class="outer">
<p><div class="inner" markdown="block"><em>foo</em></p>
</div>
</div>

That does not turn things 'inside-out,' but it is clearly wrong, and it demonstrates again that our tests for md_in_html are incomplete.

By the way, there is a clear difference between the inputs:

<div class="outer" markdown="block">
<div class="inner" markdown="block">*foo*</div>
</div>

... which works fine and ...

<div class="outer" markdown="block"><div class="inner" markdown="block">*foo*</div></div>

... which is broken in the current HEAD of master. The latter input is also not something we have in our tests.

The issue is that the nested div does not start at the beginning of a line and therefore is not parsed as a block level element. Under normal circumstances, this would be the correct behavior. However, as it is the first and immediate child of the outer <div>, it should still be treated as a block level element. Interestingly, when working on #1069 it occurred to me that this could be an issue, but when the simple fix didn't break any tests, I dismissed the thought. I think we need a more sophisticated way to determine at_line_start. Sigh.
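
To make the distinction concrete, here is a rough sketch of what a relaxed line-start check might look like. It is not Python-Markdown's code and not the eventual fix: `is_block_start` is a hypothetical helper, and it assumes that no `>` appears inside attribute values.

```python
import re

# Hypothetical helper for illustration only; not Python-Markdown's parser.
# Matches an opening tag (not a closing one) that ends the given text.
_OPEN_TAG_AT_END = re.compile(r'<[A-Za-z][^<>]*>$')

def is_block_start(raw: str, pos: int) -> bool:
    """Return True if the tag starting at raw[pos] could count as block level."""
    line_start = raw.rfind('\n', 0, pos) + 1
    prefix = raw[line_start:pos]
    # Ordinary rule: only whitespace precedes the tag on its line.
    if prefix.strip() == '':
        return True
    # Relaxed rule (assumption): the tag directly follows an opening tag,
    # i.e. it is the first and immediate child of its parent.
    return _OPEN_TAG_AT_END.search(prefix) is not None

raw = '<div class="outer" markdown="block"><div class="inner" markdown="block">*foo*</div></div>'
print(is_block_start(raw, raw.index('<div class="inner"')))
```

With only the ordinary rule, the inner div in the one-line input is rejected because the outer opening tag precedes it on the same line. The relaxed rule accepts it while still rejecting a tag that follows plain text.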

git1sal commented 3 years ago

Thank you very much—inserting newlines resolved the problem for me.
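
For markup that is generated on a single line, the same newline workaround can be applied mechanically before the text reaches the parser. This is a minimal sketch only: `split_adjacent_divs` is a hypothetical helper, not part of Python-Markdown, and it assumes that only `<div>` tags need separating and that no `>` appears inside attribute values.

```python
import re

# Hypothetical helper (not part of Python-Markdown): move every <div> or
# </div> that directly follows another tag onto a new line.
_ADJACENT_DIV = re.compile(r'>(</?div\b)')

def split_adjacent_divs(text: str) -> str:
    return _ADJACENT_DIV.sub(r'>\n\1', text)

one_liner = '<div class="outer" markdown="block"><div class="inner" markdown="block">*foo*</div></div>'
print(split_adjacent_divs(one_liner))
```

The printed text is the three-line form of the input shown earlier in the thread, which can then be passed to markdown.markdown(..., extensions=["extra"]) as before.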

Thank you for the amazing work you are doing on this package!