Python-Markdown / markdown

A Python implementation of John Gruber’s Markdown with Extension support.
https://python-markdown.github.io/
BSD 3-Clause "New" or "Revised" License
3.71k stars 856 forks

convert \u2028 to <br>? #1459

Closed (vdwees closed 4 months ago)

vdwees commented 4 months ago

I'm not an expert in Markdown or Unicode, but I may have hit an unhandled edge case.

From https://www.unicode.org/versions/Unicode5.2.0/ch05.pdf#G10213:

The Unicode Standard defines two unambiguous separator characters: U+2029 paragraph separator (PS) and U+2028 line separator (LS). In Unicode text, the PS and LS characters should be used wherever the desired function is unambiguous. Otherwise, the following recommendations specify how to cope with an NLF when converting from other character sets to Unicode, when interpreting characters in text, and when converting from Unicode to other character sets.

Control+Enter on a macOS keyboard inserts the LS character, which Python renders as \u2028. I would expect the \u2028 Unicode character to be converted to a <br> tag:

Current:

> markdown('hello\u2028world')
'<p>hello\u2028world</p>'

Expected:

> markdown('hello\u2028world')
'<p>hello<br>world</p>'

facelessuser commented 4 months ago

I am unaware of any Markdown implementation that automatically converts such Unicode characters into <br> tags. If you'd like <br> tags, you need to write them explicitly, end the line with two spaces, or use something like https://python-markdown.github.io/extensions/nl2br/ to turn normal newlines into <br> tags.
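
As a rough illustration of the "two spaces at the end of the line" route, the input can be rewritten before it reaches the converter. This is only a sketch: the helper name `line_separators_to_hard_breaks` is made up for this example and is not part of Python-Markdown, and it assumes every U+2028 in the input is meant as a line break.

```python
LINE_SEPARATOR = "\u2028"  # U+2028 LINE SEPARATOR


def line_separators_to_hard_breaks(text: str) -> str:
    """Replace each U+2028 with two trailing spaces plus a newline.

    Hypothetical helper: two spaces before a newline is the standard
    Markdown spelling of a hard line break.
    """
    return text.replace(LINE_SEPARATOR, "  \n")


source = line_separators_to_hard_breaks("hello\u2028world")
print(repr(source))  # 'hello  \nworld'
```

The rewritten string can then be passed to `markdown.markdown(...)` as usual. Note that this blanket replacement also touches U+2028 characters inside code blocks, where they simply become ordinary newlines.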

waylan commented 4 months ago

Does HTML treat the \u2028 Unicode character as a break tag? Markdown is a subset of HTML, so I would not expect Markdown to do this by default. However, you are welcome to create your own third-party extension which does whatever you want.
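
For anyone who wants to try the third-party extension route, a minimal sketch might look like the following. It is modelled loosely on how the bundled nl2br extension registers an inline pattern. The class name `LineSeparatorBreakExtension`, the registry name `line_separator_br`, and the priority `5` are assumptions chosen for this example, not anything the project ships or recommends.

```python
import markdown
from markdown.extensions import Extension
from markdown.inlinepatterns import SubstituteTagInlineProcessor

# The pattern is just the literal U+2028 LINE SEPARATOR character.
LINE_SEPARATOR_RE = "\u2028"


class LineSeparatorBreakExtension(Extension):
    """Hypothetical extension: emit a br element for each U+2028."""

    def extendMarkdown(self, md):
        # SubstituteTagInlineProcessor replaces every match of the
        # pattern with an empty element of the given tag name.
        processor = SubstituteTagInlineProcessor(LINE_SEPARATOR_RE, "br")
        # The name and priority are illustrative choices for this sketch.
        md.inlinePatterns.register(processor, "line_separator_br", 5)


html = markdown.markdown(
    "hello\u2028world",
    extensions=[LineSeparatorBreakExtension()],
)
print(html)  # the paragraph now contains a br element instead of U+2028
```

Because this is an inline pattern, it should only affect text that goes through inline processing. Raw HTML blocks and code blocks would keep their U+2028 characters.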