Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License

Tables in blockquotes with nl2br extension #1431

Closed aricept closed 7 months ago

aricept commented 7 months ago

When using the nl2br extension, extraneous <br /> tags are added at the end of each line when nesting a table in a blockquote.

>>> import markdown
>>> quote_text = """
... ><table>
... >    <tbody>
... >        <tr>
... >            <td>
... >                <p>Some text goes here, and more text.</p>
... >                <p>And a new paragraph</p>
... >            </td>
... >        </tr>
... >    </tbody>
... ></table>
... """
>>> print(quote_text)

><table>
>    <tbody>
>        <tr>
>            <td>
>                <p>Some text goes here, and more text.</p>
>                <p>And a new paragraph</p>
>            </td>
>        </tr>
>    </tbody>
></table>

>>> html = markdown.markdown(quote_text, extensions=['markdown.extensions.nl2br'])
>>> print(html)
<blockquote>
<p><table><br />
   <tbody><br />
       <tr><br />
           <td><br />
               <p>Some text goes here, and more text.</p><br />
               <p>And a new paragraph</p><br />
           </td><br />
       </tr><br />
   </tbody><br />
</table></p>
</blockquote>

I would expect the same output as without the nl2br extension, since the extension shouldn't be processing newlines inside block-level HTML:

<blockquote>
<p><table>
   <tbody>
       <tr>
           <td>
               <p>Some text goes here, and more text.</p>
               <p>And a new paragraph</p>
           </td>
       </tr>
   </tbody>
</table></p>
</blockquote>

waylan commented 7 months ago

This is due to a subtlety of the Markdown syntax rules, which state in part:

The only restrictions are that block-level HTML elements — e.g. <div>, <table>, <pre>, <p>, etc. — must be separated from surrounding content by blank lines, and the start and end tags of the block should not be indented with tabs or spaces.

The above suggests that block-level HTML cannot be nested inside any other construct. In fact, this is exactly how the reference implementation works. Therefore, your table is (correctly) not recognized as a raw block-level HTML block, and for that reason Markdown processing, including nl2br's newline handling, is applied to its contents.

The correct way to approach this is to use raw HTML for the entire block. In other words, use this input:

<blockquote>
<table>
   <tbody>
       <tr>
           <td>
               <p>Some text goes here, and more text.</p>
               <p>And a new paragraph</p>
           </td>
       </tr>
   </tbody>
</table>
</blockquote>

Interestingly, that is actually easier to type as you don't need to add > to the beginning of each line.
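
As a quick check of the suggestion above, here is a minimal sketch that feeds the all-raw-HTML version through the same call used in the original report. The variable name raw_quote is only illustrative, and the exact whitespace of the output may vary between Python-Markdown versions.

```python
import markdown

# The whole quote is written as raw HTML: the outer tags start at column 0
# and the block is separated from any other content by blank lines.
raw_quote = """
<blockquote>
<table>
   <tbody>
       <tr>
           <td>
               <p>Some text goes here, and more text.</p>
               <p>And a new paragraph</p>
           </td>
       </tr>
   </tbody>
</table>
</blockquote>
"""

# Same extension as in the report; the raw HTML block should pass through
# without nl2br inserting any <br /> tags.
html = markdown.markdown(raw_quote, extensions=['markdown.extensions.nl2br'])
print(html)
```

Because the block is recognized as raw HTML, it should be set aside before inline processing runs, so nl2br never sees the newlines inside it.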

aricept commented 7 months ago

Ah, I see, I see. Because it is within a Markdown block, the entire content is treated as Markdown. The use-case here is a message board, where replies are quoted with > before each line (email-style, as the spec likes to say) and can nest, and the table formatting was not working as expected. I guess a preprocessor to catch this will have to do. Thanks!
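
One possible shape for such a preprocessor, sketched under stated assumptions rather than as a tested solution: it is a plain function run on the text before it is handed to markdown.markdown, it is not part of Python-Markdown, and the name unquote_html_blocks and the list of tags are hypothetical. It only rewrites a quoted run whose first non-blank line is an unindented block-level tag, and leaves every other quote as ordinary Markdown.

```python
import re

# Hypothetical helpers for illustration only; not part of Python-Markdown.
_QUOTED = re.compile(r'^>[ ]?')  # one quote marker plus an optional space
_BLOCK_TAG = re.compile(
    r'^<(table|div|pre|p|ul|ol|dl|blockquote)\b', re.IGNORECASE
)


def unquote_html_blocks(text):
    """Turn '>'-quoted runs that start with block-level HTML into raw
    <blockquote> blocks, so they are treated as raw HTML instead of Markdown."""
    out = []
    run = []  # consecutive lines that start with '>'

    def flush():
        if not run:
            return
        # Strip one level of quoting, then recurse to handle nested quotes.
        inner = [_QUOTED.sub('', line, count=1) for line in run]
        inner_text = unquote_html_blocks('\n'.join(inner)).strip('\n')
        first = next((l for l in inner_text.split('\n') if l.strip()), '')
        if _BLOCK_TAG.match(first):
            # Blank lines around the block keep it separated from other content.
            out.extend(['', '<blockquote>', inner_text, '</blockquote>', ''])
        else:
            # Ordinary quoted text: leave it for Markdown to handle.
            out.extend(run)
        run.clear()

    for line in text.split('\n'):
        if line.startswith('>'):
            run.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return '\n'.join(out)


quoted = "\n><table>\n>    <tbody>\n>    </tbody>\n></table>\n"
print(unquote_html_blocks(quoted))
```

The result can then be passed to markdown.markdown with the nl2br extension as before. Quotes that mix prose with a nested table are left untouched by this sketch, so a real message board would likely need something more careful.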