Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License

Indented `<script>` tags are parsed as Markdown instead of being skipped. #1471

Open nbanyan opened 1 month ago

nbanyan commented 1 month ago

My scenario:

I'm using PyMdown Extensions' snippets to insert a fenced code block containing a bash command. The same snippet has a <script> block to pull data from another file (using the same snippet extension) to ensure the command is always accurate. This works, but breaks if used inside any Markdown block element, such as a list.

Sample test function:

    def testBlockInput(self):
        """ Test whether script block is ignored. """
        script = '''* Testing indented script block
    <script>
        if (1 < 2 && 3 > 1) {
            console.log("Success `conditional`!");
        }
    </script>'''
        parsed_script = '''<ul>
<li>Testing indented script block
    <script>
        if (1 < 2 && 3 > 1) {
            console.log("Success `conditional`!");
        }
    </script></li>
</ul>'''
        self.assertEqual(self.md.convert(script), parsed_script)

facelessuser commented 1 month ago

So what specifically isn't working? This isn't very clear from your post.

One suggestion with Python Markdown: when you start a separate block, put a blank line between it and the previous block.

* Testing indented script block

    <script>
        if (1 < 2 && 3 > 1) {
            console.log("Success `conditional`!");
        }
    </script>

Even if omitting the blank line between two blocks happens to work in some corner cases, it is generally recommended to separate blocks with a blank line.

nbanyan commented 1 month ago

The newline doesn't fix it either; it just adds more <p> tags inside the list item.

Test functions:

    def testBlockInput(self):
        """ Test whether script block is ignored. """
        script = '''* Testing indented script block
    <script>
        if (1 < 2 && 3 > 1) {
            console.log("Success `conditional`!");
        }
    </script>'''
        parsed_script = '''<ul>
<li>Testing indented script block
    <script>
        if (1 < 2 && 3 > 1) {
            console.log("Success `conditional`!");
        }
    </script></li>
</ul>'''
        self.assertEqual(self.md.convert(script), parsed_script)

    def testBlockInput2(self):
        """ Test whether script block is ignored. """
        script = '''* Testing indented script block

    <script>
        if (1 < 2 && 3 > 1) {
            console.log("Success `conditional`!");
        }
    </script>'''
        parsed_script = '''<ul>
<li>
<p>Testing indented script block</p>
<p><script>
    if (1 < 2 && 3 > 1) {
        console.log("Success `conditional`!");
    }
</script></p>
</li>
</ul>'''
        self.assertEqual(self.md.convert(script), parsed_script)

Output:

======================================================================
FAIL: testBlockInput (test_apis.TestMarkdownBasics)
Test whether script block is ignored.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nathanielclark/Downloads/markdown-3.6/tests/test_apis.py", line 77, in testBlockInput
    self.assertEqual(self.md.convert(script), parsed_script)
AssertionError: '<ul>[60 chars]f (1 &lt; 2 &amp;&amp; 3 &gt; 1) {\n          [85 chars]/ul>' != '<ul>[60 chars]f (1 < 2 && 3 > 1) {\n            console.log([60 chars]/ul>'
  <ul>
  <li>Testing indented script block
      <script>
-         if (1 &lt; 2 &amp;&amp; 3 &gt; 1) {
+         if (1 < 2 && 3 > 1) {
-             console.log("Success <code>conditional</code>!");
?                                  ^^^^^^           ^^^^^^^
+             console.log("Success `conditional`!");
?                                  ^           ^
          }
      </script></li>
  </ul>

======================================================================
FAIL: testBlockInput2 (test_apis.TestMarkdownBasics)
Test whether script block is ignored.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/nathanielclark/Downloads/markdown-3.6/tests/test_apis.py", line 98, in testBlockInput2
    self.assertEqual(self.md.convert(script), parsed_script)
AssertionError: '<ul>[64 chars]f (1 &lt; 2 &amp;&amp; 3 &gt; 1) {\n        co[79 chars]/ul>' != '<ul>[64 chars]f (1 < 2 && 3 > 1) {\n        console.log("Suc[54 chars]/ul>'
  <ul>
  <li>
  <p>Testing indented script block</p>
  <p><script>
-     if (1 &lt; 2 &amp;&amp; 3 &gt; 1) {
+     if (1 < 2 && 3 > 1) {
-         console.log("Success <code>conditional</code>!");
?                              ^^^^^^           ^^^^^^^
+         console.log("Success `conditional`!");
?                              ^           ^
      }
  </script></p>
  </li>
  </ul>

----------------------------------------------------------------------

My snippet file: (the base_mkdocs_init.sls:packages inserts a list from the salt file without the closing brackets)

``` {.bash .copy id="pip_install_list" title=""}
pip install Markdown markdown-include mkdocs mkdocs-exclude mkdocs-material mkdocs-material-extensions mkdocs-mermaid2-plugin mkdocstrings mkdocstrings-python Pygments pymdown-extensions PyYAML
```

My markdown to insert this snippet:

``` markdown
4. Run the following command from your terminal to install all required modules.

    --8<-- "mkdocs_pip_install.md"
```

nbanyan commented 1 month ago

Also, I do use md_in_html and tried using <script markdown='off'>, but that doesn't work either.

waylan commented 1 month ago

You are using an indented code block (not fenced as you claim) nested in a list item. That means you need 2 levels of indent: 1 for the nesting and a second for the code block. However, you only have one level of indent (4 spaces). 2 levels would require 8 spaces of indent.

* Testing indented script block

        <script>
            if (1 < 2 && 3 > 1) {
                console.log("Success `conditional`!");
            }
        </script>

nbanyan commented 1 month ago

The code block is only for displaying the pip install command. The script element is supposed to be executed to replace the contents of the code block with an updated command, but the JavaScript is breaking because the > comparator is changed to &gt;.

facelessuser commented 1 month ago

Yes, HTML blocks aren't handled properly when not at root level. They are recognized, but not always treated the same as they are at document root level. This is unfortunately just the way Python Markdown works currently. Inline HTML is handled fine while nested under other constructs, but block elements will often have their content parsed as Markdown in these circumstances.
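
The difference can be reproduced with a short script. This is a minimal sketch, assuming the `markdown` package is installed; the constant names `ROOT_LEVEL` and `NESTED` are illustrative only. It converts the same `<script>` element once at document root and once indented under a list item, then checks whether the JavaScript operators were escaped.

```python
import markdown

# The <script> element starts at column 0 and is separated by blank lines.
ROOT_LEVEL = """Some text

<script>
if (1 < 2 && 3 > 1) {
    console.log("Success `conditional`!");
}
</script>
"""

# The same element, indented four spaces under a list item (as in the report).
NESTED = """* Testing indented script block

    <script>
        if (1 < 2 && 3 > 1) {
            console.log("Success `conditional`!");
        }
    </script>
"""

root_html = markdown.markdown(ROOT_LEVEL)
nested_html = markdown.markdown(NESTED)

# True when the comparison operators were turned into HTML entities.
print("root level escaped:", "&lt;" in root_html)
print("nested escaped:    ", "&lt;" in nested_html)
```

In the nested case the script body goes through the normal inline processing, which matches the `&lt;`, `&amp;` and `<code>` substitutions shown in the test output earlier in this thread.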

waylan commented 1 month ago

Ah, so you want your <script> tag to be treated as block-level raw HTML. Note that the Markdown rules state:

The only restrictions are that block-level HTML elements — e.g. <div>, <table>, <pre>, <p>, etc. — must be separated from surrounding content by blank lines, and the start and end tags of the block should not be indented with tabs or spaces. Markdown is smart enough not to add extra (unwanted) <p> tags around HTML block-level tags.

Pay particular attention to the phrase "the start and end tags of the block should not be indented with tabs or spaces." This effectively means that block-level HTML cannot be nested, because it must begin at the first character of a line. In other words, to follow this rule, the parser intentionally does not allow your desired behavior. So @facelessuser you are incorrect when you state that "HTML blocks aren't handled properly when not at root level." This is the correct behavior, which admittedly is counterintuitive. But we didn't write the rules, we just follow them.
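
The quoted rule can be turned into a rough pre-flight check on a source file. The sketch below is not part of Python-Markdown; `BLOCK_TAGS` and `indented_block_tags` are hypothetical names, the tag list is deliberately short, and the check is stricter than a real parser may be because it flags any leading whitespace at all. It also does not know about code blocks, so indented or fenced code that contains HTML will be reported as well.

```python
import re

# Hypothetical, abbreviated list; the real set of block-level tags is larger.
BLOCK_TAGS = ("div", "table", "pre", "p", "script")

# A line that begins with spaces or tabs followed by an opening or closing block tag.
_TAG_LINE = re.compile(
    r"^[ \t]+</?(?P<tag>%s)\b" % "|".join(BLOCK_TAGS),
    re.IGNORECASE,
)


def indented_block_tags(source):
    """Return (line_number, tag) pairs for block tags that start with indentation."""
    hits = []
    for number, line in enumerate(source.splitlines(), start=1):
        match = _TAG_LINE.match(line)
        if match:
            hits.append((number, match.group("tag").lower()))
    return hits


sample = """* Testing indented script block

    <script>
        if (1 < 2 && 3 > 1) {
            console.log("Success `conditional`!");
        }
    </script>
"""

for line_number, tag in indented_block_tags(sample):
    print(f"line {line_number}: <{tag}> is indented, so it is not a raw HTML block")
```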

facelessuser commented 1 month ago

> So @facelessuser you are incorrect when you state that "HTML blocks aren't handled properly when not at root level."

Fair enough. I tried to express that there is a restriction, but I didn't really stress that it is a rule-based restriction.

nbanyan commented 1 month ago

OK. Unfortunately, the PyMdown Extensions snippets extension preserves the indentation and has no option to strip it (selectively or otherwise), so I'll need to change the JavaScript to survive being parsed by Markdown.
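
One way to approach that rewrite is to first list which lines of the script contain characters that were altered in the failing test output above. The following is only a sketch: `SENSITIVE` and `risky_characters` are hypothetical names, and the set covers just the four characters seen in this thread (`<`, `>`, `&` and the backtick), not every character Markdown might treat specially.

```python
# Characters the test output in this thread shows being rewritten inside the
# nested <script>: <, > and & become entities, backticks become <code> spans.
SENSITIVE = set("<>&`")


def risky_characters(js_source):
    """Map each 1-based line number to the sensitive characters found on that line."""
    report = {}
    for number, line in enumerate(js_source.splitlines(), start=1):
        found = sorted({ch for ch in line if ch in SENSITIVE})
        if found:
            report[number] = found
    return report


original = 'if (1 < 2 && 3 > 1) {\n    console.log("Success `conditional`!");\n}'
# An illustrative variant that avoids the four characters; it is not claimed
# to be equivalent to the original condition.
rewritten = 'if (Math.min(1, 2) === 1) {\n    console.log("Success conditional!");\n}'

print(risky_characters(original))
print(risky_characters(rewritten))
```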