gettalong / kramdown

kramdown is a fast, pure Ruby Markdown superset converter, using a strict syntax definition and supporting several common extensions.
http://kramdown.gettalong.org

Not parsing "<" "&" ">" correctly in code block nested inside of HTML elements #760

Closed acq688 closed 2 years ago

acq688 commented 2 years ago

Hi -- I'm working with GitHub Pages and Jekyll, and I've run into an issue where the characters "<", ">", and "&" show up as HTML entities inside a code block when that code block is nested as follows:

````
<details>
  <div>

  ```
  code with > < & here
  ```

  </div>
</details>
````

My particular case uses `<details>`, but the shape seems to be what matters, as I have also reproduced this issue with a `<div>` as the outermost container.

In my `_config.yml`, I have the following settings enabled:
```yaml
markdown: kramdown
kramdown:
  parse_block_html: true
  parse_span_html: true
  syntax_highlighter_opts:
    block:
      line_numbers: true
```

In my Markdown file I have the following block:

````html
<details>
<summary markdown="0">
<h4>Test Example Code Here</h4>
</summary>

Check out my code block below:

```bash
# We are in a code block
export VARIABLE=xxx
echo "VARIABLE" > test.txt

Less than: >
Greater than: <
Ampersand: &
```

</details>
````

As expected, this produces the following:
![Screen Shot 2022-07-25 at 1 55 59 PM](https://user-images.githubusercontent.com/967550/180853116-57764354-7929-4fa2-ba42-1d5fb878515f.png)

However, if I wrap the content inside `<details>` in a `<div>` (for additional styling), I end up with the following:
![Screen Shot 2022-07-25 at 1 59 26 PM](https://user-images.githubusercontent.com/967550/180853703-21d32f9d-6cba-4660-9b9f-ca6db7fc86a1.png)

Setting `markdown="0"` for this div gets me closer, resulting in:
![Screen Shot 2022-07-25 at 2 01 11 PM](https://user-images.githubusercontent.com/967550/180854031-6636a778-fd12-486e-914d-47887b368c82.png)

Code for this example:
````html
<details>
<summary markdown="0">
<h4>Test Example Code Here</h4>
</summary>

<div markdown="0">
Check out my code block below:

```bash
# We are in a code block
export VARIABLE=xxx
echo "VARIABLE" > test.txt

Less than: >
Greater than: <
Ampersand: &
```
</div>
</details>
````

However, as you can see, "<", ">", and "&" are still showing up as HTML entities. I've tried escaping them in various ways without any luck. Please let me know if you see something I'm doing wrong. Thank you!

gettalong commented 2 years ago

If I use the following input file:

```
<details>
<summary markdown="0">
<h4>Test Example Code Here</h4>
</summary>

<div>
Check out my code block below:

~~~bash
# We are in a code block
export VARIABLE=xxx
echo "VARIABLE" > test.txt

Less than: >
Greater than: <
Ampersand: &
~~~
</div>
</details>
```

and run it through `kramdown --parse-block-html --parse-span-html`, I get the following result:

~~~html
<details>
  <summary>
<h4>Test Example Code Here</h4>
</summary>

  <div>
    <p>Check out my code block below:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># We are in a code block</span>
<span class="nb">export </span><span class="nv">VARIABLE</span><span class="o">=</span>xxx
<span class="nb">echo</span> <span class="s2">"VARIABLE"</span> <span class="o">&gt;</span> test.txt

Less than: <span class="o">&gt;</span>
Greater than: &lt;
Ampersand: &amp;
</code></pre></div>    </div>
  </div>
</details>
~~~

which looks right.

Is this what you want? Note that I'm using the standard kramdown parser in this example.

acq688 commented 2 years ago

Hi @gettalong -- thanks for the response!

Hmm, using the CLI, I get the same output that you do, which is the desired output.

However, if I copy this same block of code into my Markdown file for GitHub Pages, I get:

```html
<details>
  <summary>
    <h4>Test Example Code Here</h4>
  </summary>

  <div>
    <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="rouge-code"><pre>&lt;p&gt;Check out my code block below:&lt;/p&gt;

&lt;div class="language-bash highlighter-rouge"&gt;&lt;div class="highlight"&gt;&lt;pre class="highlight"&gt;&lt;code&gt;&lt;table class="rouge-table"&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td class="rouge-gutter gl"&gt;&lt;pre class="lineno"&gt;1 2 3 4 5 6 7
</pre></td></tr></tbody></table></code></pre></div>    </div>
    <p>&lt;/pre&gt;&lt;/td&gt;&lt;td class="rouge-code"&gt;&lt;pre&gt;<span class="c"># We are in a code block</span>
<span class="nb">export </span><span class="nv">VARIABLE</span><span class="o">=</span>xxx
<span class="nb">echo</span> <span class="s2">“VARIABLE”</span> <span class="o">&gt;</span> test.txt</p>

    <p>Less than: <span class="o">&gt;</span>
Greater than: &lt;
Ampersand: &amp;
&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;    &lt;/div&gt;</p>
  </div>
</details>
```

Which renders as:

[Screenshot: Screen Shot 2022-07-25 at 4 00 27 PM]

If I add `markdown="0"` to that div, I get:

```html
<details>
  <summary>
    <h4>Test Example Code Here</h4>
  </summary>

  <div>
    <p>Check out my code block below:</p>

    <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><table class="rouge-table"><tbody><tr><td class="rouge-gutter gl"><pre class="lineno">1
2
3
4
5
6
7
</pre></td><td class="rouge-code"><pre><span class="c"># We are in a code block</span>
<span class="nb">export </span><span class="nv">VARIABLE</span><span class="o">=</span>xxx
<span class="nb">echo</span> <span class="s2">"VARIABLE"</span> &amp;gt<span class="p">;</span> test.txt

Less than: &amp;gt<span class="p">;</span>
Greater than: &amp;lt<span class="p">;</span>
Ampersand: &amp;amp<span class="p">;</span>
</pre></td></tr></tbody></table></code></pre></div>    </div>
  </div>
</details>
```

Which renders as:

[Screenshot: Screen Shot 2022-07-25 at 4 03 57 PM]

This seems to be correct, apart from the odd treatment of those characters (and the semicolons wrapped in `<span>`s). 🤔 I'm wondering whether this is conflicting with something else in the project...
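The `&amp;gt;` sequences in the output above are what text looks like after being HTML-escaped twice: the first pass turns `>` into `&gt;`, and the second pass turns the `&` of that entity into `&amp;`. A minimal Python sketch of the rule, using the standard library's `html.escape` (this only illustrates the escaping behaviour; it is not the code path kramdown, Rouge or Jekyll actually run):

```python
import html

# One line from the test snippet, as the author typed it.
raw = 'echo "VARIABLE" > test.txt'

# First escape: ">" becomes "&gt;" (as in the kramdown CLI output).
once = html.escape(raw, quote=False)

# Second escape of the already-escaped text: "&gt;" becomes "&amp;gt;",
# the pattern visible in the GitHub Pages output.
twice = html.escape(once, quote=False)

# A browser decodes one level of entities before displaying text,
# so `once` shows the original line, while `twice` shows a literal "&gt;".
print(html.unescape(once))   # echo "VARIABLE" > test.txt
print(html.unescape(twice))  # echo "VARIABLE" &gt; test.txt
```

`quote=False` leaves the double quotes alone, mirroring the outputs above, where `"VARIABLE"` is not turned into `&quot;`.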

gettalong commented 2 years ago

When you use the kramdown CLI, you are using the standard kramdown parser by default, not the GFM one. So you might want to narrow down whether the problem lies in the GFM parser or in Jekyll, and then report it there.

Since kramdown itself is working fine, I will close the issue.

acq688 commented 2 years ago

Thanks for putting me on the right path for this!

In case anyone else stumbles upon this issue: I was able to narrow it down to the base HTML page, where `{{ section.content | markdownify }}` is being used. Since the `<div>` I'm using is set to `markdown="0"`, kramdown produces the following output:

```html
<details>
<summary>
<h4>Test Example Code Here</h4>
</summary>

<div>
Check out my code block below:

~~~bash
# We are in a code block
export VARIABLE=xxx
echo "VARIABLE" &gt; test.txt

Less than: &gt;
Greater than: &lt;
Ampersand: &amp;
~~~
</div>
</details>
```

This output is then piped through `markdownify`, which doesn't seem to know what to do with the existing HTML entities, so they end up escaped a second time (`&gt;` becomes `&amp;gt;`). I'm not sure whether this is a bug in any particular project or just an unfortunate side effect of the way they interact. For now, while it's not the prettiest fix, I can bypass this issue by calling:

```liquid
{{ section.content | replace: "&amp;", "&" | replace: "&lt;", "<" | replace: "&gt;", ">" | markdownify }}
```
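The Liquid chain above undoes one level of escaping before `markdownify` runs. Below is a rough Python model of it. The function names are made up for illustration, and the model assumes Liquid's `replace` filter replaces every occurrence, as Python's `str.replace` does. It shows one caveat: because `&amp;` is replaced first, text that was meant to display a literal entity (stored as `&amp;lt;`) gets decoded twice. Replacing `&amp;` last avoids that.

```python
import html


def unescape_amp_first(text: str) -> str:
    """Model of: replace: "&amp;", "&" | replace: "&lt;", "<" | replace: "&gt;", ">"."""
    return text.replace("&amp;", "&").replace("&lt;", "<").replace("&gt;", ">")


def unescape_amp_last(text: str) -> str:
    """Same three replacements, but with "&amp;" handled last."""
    return text.replace("&lt;", "<").replace("&gt;", ">").replace("&amp;", "&")


# The escaped lines from the kramdown output: both orders agree here.
sample = "Less than: &gt;\nGreater than: &lt;\nAmpersand: &amp;"
expected = "Less than: >\nGreater than: <\nAmpersand: &"

# Escaped form of the text 'Type &lt; to get <' (a literal entity in prose).
tricky = "Type &amp;lt; to get &lt;"

print(unescape_amp_first(tricky))  # Type < to get <     (decoded twice)
print(unescape_amp_last(tricky))   # Type &lt; to get <  (decoded once)
print(html.unescape(tricky))       # Type &lt; to get <
```

With either order, the filters operate on the whole of `section.content`, so entities outside code blocks are un-escaped as well.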