Closed magicOz closed 2 years ago
We need to consider what other implementations do in this scenario. Although, after a quick look, the couple I checked (GitHub and MultiMarkdown) seem to have weird output which doesn't really make any sense (the language is {
). My best guess is that the quote is a disallowed character and so it causes the language to not be parsed correctly. The exact output doesn't really matter here, so long as the quote character isn't part of an HTML attribute. That would certainly be the easiest solution, if we were to do anything at all.
I will also note that while our serializer properly escapes HTML attributes, we bypass that with fenced code blocks as we build the HTML outside of the ElementTree and store it as raw HTML which gets swapped with a placeholder after serialization. If we wanted to make use of ElementTrree here, we would need to modify/subclass the upstream library to include support for a "raw" type to hold the content of the code block. That would be a much larger refactor though, and out-of-scope for an immediate fix here. That said, I have toyed with the idea from time to time.
I suppose we could escape each of the attribute values here:
That would match the behavior of anything which goes through ElementTree without changing the parsing of the fenced code blocks. Presumably, Pygments is already doing similar escaping (should probably confirm that).
I suppose we could escape each of the attribute values here:
Yes, that seems to be the most practical solution imo.
I took another look at how other implementations handle this with a simpler input (the curly brackets are not supported by all implementations and were resulting in confusing results).
``` "foo
code
<del>Checking [Babelmark][1] it appears that no implementations address this at all (we can ignore the implementations which don't support fenced code blocks). Of particular interest is CommonMark which provides a detailed [spec][2]. I note that the spec itself makes no mention of any disallowed characters (except for the backtick) within the language name.</del> I also find it interesting that many implementations seem to choke and error on this (although that could be specific to Babelmark, not the implementation itself). I checked PHP Markdown Extra directly ([here][3]) and it simply fails to recognize it as a code block. In other words, the quote character is presumably disallowed.
On the one hand, I am inclined to follow PHP Markdown Extra's lead as the two implementations were originally developed simultaneously as the very first ever functioning fence code code parsers. However, that only addresses this one very specific issue. By escaping the attributes, we also potentially address as-yet undiscovered issues.
[1]: https://johnmacfarlane.net/babelmark2/?normalize=1&text=%60%60%60+.foo%0Acode%0A%60%60%60
[2]: https://spec.commonmark.org/0.30/#info-string
[3]: https://michelf.ca/projects/php-markdown/dingus/
Sorry, in my last comment I just realized I had a typo in the Babelmark input. Lets try again. It seems that CommonMark and Pandoc are escaping the attribute and MultiMarkdown does nothing (passes it through unaltered resulting in a XSS issue). Also of interest is that Python-Markdown actually does match PHP Markdown Extra and fails to recognize it as a code block. Python-Markdown needs the curly braces to ensure it is recognized as a code block in this case. I'm okay with that. So we just need to focus on what to do in the curly bracket case. PHP Markdown Extra also fails to recognize that case as a fenced code block. However, the PHP implementation doesn't support the advanced attribute list features we do (which includes wrapping attribute values in quotes), so I think we can ignore that here.
Given all of the above, I think that the escaping all attributes solution is the right way forward.
The extension fenced code blocks (https://python-markdown.github.io/extensions/fenced_code_blocks/#attributes) breaks the HTML formatting when a language, id or class contains a quotation-mark (").
https://github.com/Python-Markdown/markdown/blob/master/markdown/extensions/fenced_code.py#L122-L127
The following snippet