Open ikedas opened 7 years ago
:wave: Thanks for the report. Please note that the `github/markup` repository's issues are really just for issues regarding the `github-markup` gem itself, which doesn't have anything to do with Markdown processing. You'd be better off contacting our support team with these kinds of issues in future, because we have lots of support staff but only a couple busy engineers who monitor this repo.
For this issue specifically, the root cause is in the CommonMark specification, which we adhere to. The section of the specification on emphasis states:
> A left-flanking delimiter run is a delimiter run that is (a) not followed by Unicode whitespace, and (b) either not followed by a punctuation character, or preceded by Unicode whitespace or a punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.
>
> A right-flanking delimiter run is a delimiter run that is (a) not preceded by Unicode whitespace, and (b) either not preceded by a punctuation character, or followed by Unicode whitespace or a punctuation character. For purposes of this definition, the beginning and the end of the line count as Unicode whitespace.

"Punctuation character" is defined as "an ASCII punctuation character or anything in the Unicode classes Pc, Pd, Pe, Pf, Pi, Po, or Ps", and `「` and `」` are in the Ps and Pe categories respectively.
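The category claim is easy to check. As a minimal sketch, assuming Python's standard `unicodedata` module as the source of Unicode general categories (the variable name `categories` is just for illustration):

```python
import unicodedata

# Look up the Unicode general category of each corner bracket.
categories = {ch: unicodedata.category(ch) for ch in "「」"}

for ch, cat in categories.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {cat}")
# U+300C LEFT CORNER BRACKET: Ps
# U+300D RIGHT CORNER BRACKET: Pe
```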
The problem here is that this definition of punctuation character makes sense in the context of the specification if we assume "Unicode whitespace" is a part of the text used (as with most Latin alphabet-derived languages); we expect to see `The cat is called "Nodoka".` but not `猫は「のどか」という。`, where the latter has no space or punctuation character separating the `「」` from the surrounding text.
Hence, when we add emphasis (e.g. around `"Nodoka"`), it is recognized in `The cat is called **"Nodoka"**.` but not in `猫は**「のどか」**という。`
With the English text, the opening `**` satisfies the definition of a "left-flanking delimiter run": it is (a) not followed by Unicode whitespace (`"`), and (b) preceded by Unicode whitespace. The closing `**` satisfies the definition of a "right-flanking delimiter run": it is (a) not preceded by Unicode whitespace (`"`), and (b) followed by a punctuation character (`.`).
With the Japanese text, however, the opening `**` does not satisfy the definition of a "left-flanking delimiter run": it is (a) not followed by Unicode whitespace (`「`), but (b) it is followed by a punctuation character, and it is not preceded by Unicode whitespace or a punctuation character (`は`). Likewise, the closing `**` does not satisfy the definition of a "right-flanking delimiter run": it is (a) not preceded by Unicode whitespace (`」`), but (b) it is preceded by a punctuation character, and it is not followed by Unicode whitespace or punctuation (`と`).
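The walk-through above can be reproduced mechanically. The following is a rough sketch of the two quoted definitions only, not of a full CommonMark emphasis parser; the helper names (`is_whitespace`, `is_punctuation`, `flanking`, `check`) are made up for this example, and Python's `string.punctuation` is assumed to stand in for the spec's ASCII punctuation set.

```python
import string
import unicodedata

def is_whitespace(ch):
    # None marks the beginning or end of the line, which counts as whitespace.
    return ch is None or ch in "\t\n\f\r" or unicodedata.category(ch) == "Zs"

def is_punctuation(ch):
    # ASCII punctuation, or any Unicode category starting with P (Pc, Pd, Pe, Pf, Pi, Po, Ps).
    if ch is None:
        return False
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")

def flanking(text, start, end):
    """Return (left_flanking, right_flanking) for the delimiter run text[start:end]."""
    before = text[start - 1] if start > 0 else None
    after = text[end] if end < len(text) else None
    left = not is_whitespace(after) and (
        not is_punctuation(after) or is_whitespace(before) or is_punctuation(before)
    )
    right = not is_whitespace(before) and (
        not is_punctuation(before) or is_whitespace(after) or is_punctuation(after)
    )
    return left, right

def check(text):
    """Classify the first and the last `**` run in text."""
    first, last = text.index("**"), text.rindex("**")
    return flanking(text, first, first + 2), flanking(text, last, last + 2)

print(check('The cat is called **"Nodoka"**.'))  # ((True, False), (True, True))
print(check("猫は**「のどか」**という。"))  # ((False, True), (True, False))
```

In the English sentence the opening run is left-flanking and the closing run is right-flanking (it also happens to be left-flanking, which the quoted rules allow). In the Japanese sentence the opening run is only right-flanking and the closing run is only left-flanking, so neither can play the role the author intended.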
In short, this is a deficiency with the CommonMark specification's handling of East Asian text in general, because of the way the specification assumes interaction between punctuation characters and whitespace characters. I'll raise this issue (along with all the above information) in the CommonMark Discussion forum and work toward a solution.
Thanks for your patience and for the report!
Thread opened here: https://talk.commonmark.org/t/emphasis-and-east-asian-text/2491
@kivikakk thanks. I'll comment on the new thread.
It's been over a year and we still haven't had movement here; pinging the upstream repo now.
Problem Description
In East Asian texts in general, word separators (spaces) are never written explicitly. So
should be rendered as
(Image)
and in practice this works as expected.
However, if the text fragment to be decorated ends and/or starts with punctuation:
they won't render as expected:
(Image)
A possible workaround is to insert a space before or after the punctuation ("␣" means space):
but this produces ugly text with an extra space before or after the punctuation:
(Image)
Suggested modification
East Asian punctuation should be treated in the same way as normal East Asian characters (Chinese ideographs and so on).
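One way to picture this suggestion is as a change to the punctuation predicate alone. The sketch below is purely hypothetical and is not what CommonMark or any implementation does: it assumes that "East Asian punctuation" can be approximated by the Unicode East Asian Width classes Wide (`W`) and Fullwidth (`F`), and both function names are invented here.

```python
import string
import unicodedata

def is_punctuation_spec(ch):
    # Punctuation as quoted from the specification in this thread.
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")

def is_punctuation_relaxed(ch):
    # Hypothetical variant: Wide/Fullwidth characters are never treated as
    # punctuation, so they behave like ordinary East Asian letters.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return False
    return is_punctuation_spec(ch)

for ch in ['"', ".", "「", "」", "。"]:
    print(ch, is_punctuation_spec(ch), is_punctuation_relaxed(ch))
```

Under this assumed predicate `「`, `」` and `。` stop counting as punctuation while `"` and `.` still do, so the delimiter runs in `猫は**「のどか」**という。` would be classified the same way as runs next to ordinary ideographs.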
FYI: Almost all East Asian punctuation characters are listed here: