Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License
3.71k stars, 856 forks

bug(md_in_html): “markdown="1"” isn’t removed in child elements of the “li” tag #1427

Closed: Kristinita closed this issue 7 months ago

Kristinita commented 7 months ago

1. Summary

If I use HTML tags with “markdown="1"” in child elements of the <li> tag, Python Markdown converts the Markdown to HTML but doesn’t remove markdown="1", an attribute that is not valid in production HTML.

This looks like a bug. I couldn’t find anything in the documentation for the “Markdown in HTML” extension explaining why my syntax would be incorrect.

2. MCVE

2.1. KiraDivInsideList.md

<div class="KiraCustomClass" markdown="1">**This is a Markdown**</div>

1. Kira first list item

    <div class="KiraCustomClass" markdown="1">**This is also Markdown**</div>

1. Kira second list item

[Screenshot: Python Markdown div inside li]

2.2. Command

python -m markdown -x markdown.extensions.md_in_html KiraDivInsideList.md

2.3. Behavior

2.3.1. Expected
<div class="KiraCustomClass">
<p><strong>This is a Markdown</strong></p>
</div>
<ol>
<li>
<p>Kira first list item</p>
<p><div class="KiraCustomClass"><strong>This is also Markdown</strong></div></p>
</li>
<li>
<p>Kira second list item</p>
</li>
</ol>
2.3.2. Actual

For the first <div>, Python Markdown removes markdown="1", but for my <div> inside <li>, markdown="1" remains.

<div class="KiraCustomClass">
<p><strong>This is a Markdown</strong></p>
</div>
<ol>
<li>
<p>Kira first list item</p>
- <p><div class="KiraCustomClass"><strong>This is also Markdown</strong></div></p>
+ <p><div class="KiraCustomClass" markdown="1"><strong>This is also Markdown</strong></div></p>
</li>
<li>
<p>Kira second list item</p>
</li>
</ol>
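To catch leftover attributes like this before the HTML reaches production, the generated output can be scanned for any element that still carries a `markdown` attribute. The following is a minimal sketch using only the standard library’s `html.parser`; the names `MarkdownAttrFinder`, `find_leftover_markdown_attrs` and `ACTUAL_OUTPUT` are hypothetical and not part of Python Markdown.

```python
from html.parser import HTMLParser


class MarkdownAttrFinder(HTMLParser):
    """Collect (tag, line) pairs for start tags that still carry a `markdown` attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.hits: list[tuple[str, int]] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; HTMLParser lowercases the names.
        if any(name == "markdown" for name, _ in attrs):
            line, _offset = self.getpos()  # position of the start of this tag
            self.hits.append((tag, line))

    # Self-closing tags (<div markdown="1"/>) go through handle_startendtag,
    # which by default calls handle_starttag, so no extra override is needed.


def find_leftover_markdown_attrs(html: str) -> list[tuple[str, int]]:
    """Return (tag, line_number) for every element that still has a markdown attribute."""
    finder = MarkdownAttrFinder()
    finder.feed(html)
    finder.close()
    return finder.hits


# The "Actual" output from this report, with the diff markers removed.
ACTUAL_OUTPUT = """<div class="KiraCustomClass">
<p><strong>This is a Markdown</strong></p>
</div>
<ol>
<li>
<p>Kira first list item</p>
<p><div class="KiraCustomClass" markdown="1"><strong>This is also Markdown</strong></div></p>
</li>
<li>
<p>Kira second list item</p>
</li>
</ol>"""

if __name__ == "__main__":
    for tag, line in find_leftover_markdown_attrs(ACTUAL_OUTPUT):
        print(f"line {line}: <{tag}> still has a markdown attribute")
```

Run against the output above, this reports the nested <div> on line 7 and nothing for the root-level <div>, whose attribute was already stripped.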

3. Validity

<li> can contain flow content, i.e. any element that is valid in <body>. I haven’t found any reason why I can’t use a <div> with a custom class inside <li> as in my example.

4. Environment

  1. Microsoft Windows 11 [Version 10.0.22621.2861]
  2. Python 3.12.0
  3. Python Markdown 3.5.1

Thanks.

facelessuser commented 7 months ago

If I recall correctly, @waylan has specifically stated in the past that md_in_html is mainly meant for root-level HTML. That means HTML nested in indented constructs, such as lists, is not expected to work.

With that said, I do wish they did work. As an alternative, you can use the pymdownx.blocks.html extension:

import markdown

MD = """
/// html | div.KiraCustomClass
**This is a Markdown**
///

1. Kira first list item

    /// html | div.KiraCustomClass
    **This is also Markdown**
    ///

1. Kira second list item
"""

html = markdown.markdown(
    MD,
    extensions=['pymdownx.blocks.html']
)

print(html)
<div class="KiraCustomClass">
<p><strong>This is a Markdown</strong></p>
</div>
<ol>
<li>
<p>Kira first list item</p>
<div class="KiraCustomClass">
<p><strong>This is also Markdown</strong></p>
</div>
</li>
<li>
<p>Kira second list item</p>
</li>
</ol>

Kristinita commented 7 months ago

Status: RESOLVED ✔️

I get the expected behavior for both the MCVE and my real examples when I use pymdownx.blocks.html instead of md_in_html.

Thanks.

Kristinita commented 7 months ago

Type: Documentation changes 📜

@waylan has specifically stated in the past that md_in_html is mainly meant for root level HTML. That means HTML nested in indented constructs, such as lists are not expected to work.

If so, I think it would be nice to add this to the md_in_html documentation so that users don’t try to use md_in_html for HTML that isn’t at the root level. It would also help to explain in the documentation why md_in_html is meant for root-level HTML; users like me may not understand the reasoning behind this design.

Thanks.

waylan commented 7 months ago

The reason for this behavior is based on the original Markdown rules, which state in part (emphasis added):

The only restrictions are that block-level HTML elements — e.g. <div>, <table>, <pre>, <p>, etc. — must be separated from surrounding content by blank lines, and the start and end tags of the block should not be indented with tabs or spaces.

The md-in-html extension relies on the standard rules for identifying what is considered a "block-level" element. If you want to diverge from the standard rules, then you need to look at third-party extensions.
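To make the quoted rule concrete, the sketch below checks whether a single line would open a raw HTML block under it: a block-level start tag that begins at column 0, with no leading spaces or tabs. It is a deliberate simplification for illustration, not Python Markdown’s actual parser; the `BLOCK_LEVEL_TAGS` subset and the `opens_raw_html_block` name are assumptions.

```python
import re

# A small, illustrative subset of block-level tags (assumption: Python Markdown
# maintains its own, longer list).
BLOCK_LEVEL_TAGS = {"div", "table", "pre", "p", "blockquote", "ol", "ul", "section"}

# "<" + tag name, followed by whitespace, "/" or ">".
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9-]*)[\s/>]")


def opens_raw_html_block(line: str) -> bool:
    """Return True if `line` would open a raw HTML block under the quoted rule.

    re.match only matches at position 0, so any leading spaces or tabs make
    the line fail, mirroring "should not be indented with tabs or spaces".
    """
    match = START_TAG.match(line)
    return bool(match) and match.group(1).lower() in BLOCK_LEVEL_TAGS


# The two <div> lines from the MCVE: root level, then indented inside a list item.
mcve_lines = [
    '<div class="KiraCustomClass" markdown="1">**This is a Markdown**</div>',
    '    <div class="KiraCustomClass" markdown="1">**This is also Markdown**</div>',
]
for line in mcve_lines:
    print(opens_raw_html_block(line))  # True for the first line, False for the second
```

Under this simplified check, the root-level <div> qualifies while the four-space-indented one inside the list item does not, which lines up with where md_in_html did and did not strip markdown="1" in the report.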