Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License

Regression in unescaping backslash in attribute #1358

Status: Closed (oprypin closed 1 year ago)

oprypin commented 1 year ago

Regression in 3.4 - c0f6e5a31ea8e7fe98910a0523144c2a96fa9bf1 is the first bad commit

Reproduction code:

import markdown

s = r'''
<img src="..\..\foo.png">
'''
print(repr(markdown.markdown(s)))

Output before vs after:

'<p><img src="..\\..\\foo.png"></p>'
'<p><img src="..\\\x0246\x03.\\foo.png"></p>'

waylan commented 1 year ago

This is definitely a bug. The final output should not still contain placeholders. Thank you for the report.
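To make the leftover placeholder concrete: the stray bytes in the "after" output look like a number wrapped between the control characters \x02 (STX) and \x03 (ETX), where 46 is the code point of the period. The sketch below is not the library's own code. It assumes that delimiter convention purely from the bytes visible in the report, and the helper name decode_placeholders is hypothetical.

```python
import re

# Assumed delimiters, inferred from the bytes visible in the reported output.
STX, ETX = "\x02", "\x03"

# A placeholder is taken to be: STX, a decimal code point, ETX.
PLACEHOLDER_RE = re.compile(f"{STX}(\\d+){ETX}")


def decode_placeholders(text: str) -> str:
    """Replace each STX<digits>ETX run with the character it encodes."""
    return PLACEHOLDER_RE.sub(lambda m: chr(int(m.group(1))), text)


# The "after" output from the report, copied verbatim.
broken = '<p><img src="..\\\x0246\x03.\\foo.png"></p>'

print(repr(decode_placeholders(broken)))
```

Under that assumption, decoding the placeholder by hand gives back the "before" output, which suggests the placeholder was created but never swapped back in.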

However, even if that is resolved, the input still has a problem. Consider the simple input \. (a backslash followed by a period). That correctly outputs <p>.</p>. Notice that the backslash is removed, which is the correct behavior, as the escape character should not appear in the final output. In a URL, however, ..\ has a specific meaning and the backslash should remain in the output. Therefore, the document author should escape the backslash and write ..\\ instead, although it appears that this is not currently working correctly either.

I will note that, as a workaround, the document author can use forward slashes instead. The only system that uses backslashes is the DOS/Windows local file system, and most browsers will translate forward slashes to backslashes properly. Of course, that's not an excuse for the bug, but it does avoid the escaping issue altogether.
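For anyone generating such paths programmatically, the forward-slash workaround can be applied before the path ever reaches the Markdown source. This is only a sketch using the standard library. The helper name to_forward_slashes is hypothetical, and it assumes the input is a plain relative Windows-style path rather than a full URL.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(path: str) -> str:
    """Rewrite a Windows-style path so it uses forward slashes only."""
    # PureWindowsPath parses backslash separators on any platform,
    # and as_posix() renders the same path with forward slashes.
    return PureWindowsPath(path).as_posix()


src = to_forward_slashes(r"..\..\foo.png")
print(f'<img src="{src}">')
```

Because the resulting attribute contains no backslashes, there is nothing left for the escape handling to touch.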

...and I just realized this issue is specific to raw HTML. We shouldn't be altering the raw HTML at all.

oprypin commented 1 year ago

Thanks for the quick fix