commonmark / commonmark-spec

CommonMark spec, with reference implementations in C and JavaScript
http://commonmark.org

Add symbols to unicode punctuation #739

Closed: wooorm closed this 1 year ago

wooorm commented 1 year ago

The current CommonMark definition of “ASCII punctuation character” includes not only the Unicode punctuation characters within the ASCII range, but also the Unicode symbol characters within the ASCII range: $, +, <, =, >, ^, `, |, ~.
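A quick way to check this claim is to classify each ASCII punctuation character by its Unicode general category. The Python sketch that follows assumes that `string.punctuation` is the same set as the spec's ASCII punctuation characters (both are the 32 printable ASCII characters that are not letters, digits, or space); the variable names are illustrative only.

```python
import string
import unicodedata

# Assumption: Python's string.punctuation equals the spec's
# "ASCII punctuation character" set (U+0021-002F, U+003A-0040,
# U+005B-0060, U+007B-007E).
ASCII_PUNCTUATION = string.punctuation

# Partition by the first letter of the Unicode general category:
# "P" = punctuation (Pc, Pd, Pe, Pf, Pi, Po, Ps), "S" = symbol (Sc, Sk, Sm, So).
punctuation = [c for c in ASCII_PUNCTUATION if unicodedata.category(c).startswith("P")]
symbols = [c for c in ASCII_PUNCTUATION if unicodedata.category(c).startswith("S")]

# Print each symbol together with its exact subcategory (Sc, Sk, or Sm).
for c in symbols:
    print(c, unicodedata.category(c))

print("".join(symbols))  # → $+<=>^`|~
```

The nine characters printed on the last line are exactly the symbols listed above; the other 23 ASCII punctuation characters fall into the P categories.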

I think this makes sense. I think users would not know the difference, and would indeed expect to be able to escape * the same way as $.

I think it also makes sense to broaden the definition of Unicode punctuation character in the same way, so that there’s no difference between $, £, and €.
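The difference between the two definitions can be sketched as a pair of predicates. This assumes the spec's current wording (a Unicode punctuation character is an ASCII punctuation character or anything in the P general categories) and the proposed one (anything in the P or S general categories); both function names are hypothetical.

```python
import string
import unicodedata

def is_unicode_punctuation_current(ch: str) -> bool:
    # Current rule (assumed): an ASCII punctuation character,
    # or any character in a Unicode P* general category.
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")

def is_unicode_punctuation_proposed(ch: str) -> bool:
    # Proposed rule: any character in a Unicode P* or S* general category.
    # Every ASCII punctuation character is P* or S*, so none are lost.
    return unicodedata.category(ch)[0] in ("P", "S")

# Compare the two rules on the three currency signs (all are category Sc).
for ch in "$£€":
    print(ch, unicodedata.category(ch),
          is_unicode_punctuation_current(ch),
          is_unicode_punctuation_proposed(ch))
```

Under the current rule only $ counts as punctuation (because it is in the ASCII range); under the proposed rule all three do.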

Currently:

*$*a.

*£*a.

*€*a.

$a.

£a.

€a.

(EDIT: if GitHub supports this PR, emphasis does not work in any of the three cases above; if it doesn’t, emphasis works for £ and € but not for $.)

Proposed: none of them turn into <em>s. (This is not a particularly useful example; I’m sure a better one could be found.)
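To see why the three inputs behave differently, here is a simplified Python check of the spec's left- and right-flanking rules for `*`, applied only to strings shaped like `*X*a.`. It is a sketch, not a parser: it assumes the spec's flanking and Unicode-whitespace definitions, ignores the rest of the emphasis algorithm (such as the "multiple of 3" rule, which does not change the outcome for these inputs), and all function names are hypothetical.

```python
import string
import unicodedata
from typing import Callable, Optional

Pred = Callable[[str], bool]

def current(ch: str) -> bool:
    # Hypothetical: current Unicode punctuation rule (ASCII punctuation or P*).
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")

def proposed(ch: str) -> bool:
    # Hypothetical: proposed Unicode punctuation rule (P* or S*).
    return unicodedata.category(ch)[0] in ("P", "S")

def is_ws(ch: Optional[str]) -> bool:
    # None stands for the start/end of the line, which counts as whitespace.
    return ch is None or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"

def left_flanking(before: Optional[str], after: Optional[str], punct: Pred) -> bool:
    # Not followed by whitespace, and either not followed by punctuation,
    # or followed by punctuation and preceded by whitespace or punctuation.
    if is_ws(after):
        return False
    return not punct(after) or is_ws(before) or punct(before)

def right_flanking(before: Optional[str], after: Optional[str], punct: Pred) -> bool:
    # Mirror image of left_flanking.
    if is_ws(before):
        return False
    return not punct(before) or is_ws(after) or punct(after)

def emphasizes(text: str, punct: Pred) -> bool:
    # Only handles "*X*rest" with a single-character X: the opener is at
    # index 0 (preceded by line start), the closer at index 2.
    # For "*", can-open means left-flanking and can-close means right-flanking.
    after_close = text[3] if len(text) > 3 else None
    can_open = left_flanking(None, text[1], punct)
    can_close = right_flanking(text[1], after_close, punct)
    return can_open and can_close

for src in ("*$*a.", "*£*a.", "*€*a."):
    print(src, emphasizes(src, current), emphasizes(src, proposed))
```

With the current definition this reports emphasis for £ and € but not for $, because the closing `*` after $ is preceded by punctuation and followed by a letter. With the proposed definition the same reasoning applies to all three, so none produce emphasis.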

Note that the Unicode general category S (Symbol) includes these subcategories:

gc ; Sc                               ; Currency_Symbol
gc ; Sk                               ; Modifier_Symbol
gc ; Sm                               ; Math_Symbol
gc ; So                               ; Other_Symbol

For reference: all Unicode characters, all Unicode categories.

jgm commented 1 year ago

I'm in favor of this change -- does anyone see a problem with it?

rlidwka commented 1 year ago

This needs an example in the spec; otherwise, parsers that use the spec as a compliance test suite might miss the change.

wooorm commented 1 year ago

added a test case!