commonmark / commonmark-spec

CommonMark spec, with reference implementations in C and JavaScript
http://commonmark.org

Emphasis surrounded by `*` characters #733

Closed xcoulon closed 1 year ago

xcoulon commented 1 year ago

In example 379 of spec v0.30, `a**"foo"**` has no strong emphasis because the opening `**` is preceded by an alphanumeric and followed by punctuation, and hence is not part of a left-flanking delimiter run. But which rules prevent this content from having emphasis and rendering as `<p>a*<em>&quot;foo&quot;</em>*</p>`?

My understanding is that the second `*` character before `"foo"` looks like a valid left-flanking delimiter run, since it is preceded by a punctuation character (`*`) and followed by a punctuation character (`"`). Similarly, the first `*` character after `"foo"` looks like a valid right-flanking delimiter run, since it is preceded by a punctuation character (`"`) and followed by a punctuation character (`*`).

Also, I'm a bit confused by the definition of a delimiter run:

> is either a sequence of one or more `*` characters that is not preceded or followed by a non-backslash-escaped `*` character, or a sequence of one or more `_` characters that is not preceded or followed by a non-backslash-escaped `_` character.

because it seems to go against example 441, where `**foo*` is expected to render as `<p>*<em>foo</em></p>`, and example 442, where `*foo**` is expected to render as `<p><em>foo</em>*</p>`.

wooorm commented 1 year ago

For the first: entire runs are parsed as a whole. So it’s about what’s surrounding the entire run. Afterwards things are “taken” from the runs to form starts or ends of strong or emphasis. See rule 11 (“A literal * character cannot occur at the beginning...”) and 12.
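
A minimal sketch of that point, assuming a simplified scanner that ignores backslash escapes, code spans, and links, and approximates Unicode whitespace with `str.isspace()`. The helper names `is_punct` and `delimiter_runs` are made up for illustration and are not taken from any reference implementation. Each maximal run is classified once, using only the characters just outside the run, so the individual `*` characters inside a `**` are never tested on their own:

```python
import re
import string
import unicodedata

def is_punct(ch):
    # ASCII punctuation, or any Unicode general category starting with "P".
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")

def delimiter_runs(text):
    """Return (run, left_flanking, right_flanking) for each maximal * or _ run."""
    out = []
    for m in re.finditer(r"\*+|_+", text):
        # Start and end of the text count as whitespace.
        before = text[m.start() - 1] if m.start() > 0 else " "
        after = text[m.end()] if m.end() < len(text) else " "
        left = not after.isspace() and (
            not is_punct(after) or before.isspace() or is_punct(before))
        right = not before.isspace() and (
            not is_punct(before) or after.isspace() or is_punct(after))
        out.append((m.group(), left, right))
    return out

print(delimiter_runs('a**"foo"**'))
# [('**', False, True), ('**', False, True)]
print(delimiter_runs("**foo*"))
# [('**', True, False), ('*', False, True)]
```

In `a**"foo"**` neither run is left-flanking, so there is nothing that can open emphasis, and the text stays literal.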

For the second: the algorithm is super complex. Hard to capture in text. The algorithm in prose in the appendix might help: https://spec.commonmark.org/0.30/#phase-2-inline-structure
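
As a rough illustration of how delimiters are "taken" from runs, here is a heavily simplified sketch of the matching step. It is not the spec's algorithm: it assumes `*` delimiters only, no backslash escapes, no other inline constructs, no HTML escaping of the text, and it skips the `openers_bottom` optimization. The names `tokenize`, `literal`, and `render_emphasis` are hypothetical:

```python
import re
import string
import unicodedata

def is_punct(ch):
    return ch in string.punctuation or unicodedata.category(ch).startswith("P")

def tokenize(text):
    """Split text into literal strings and records for each maximal '*' run."""
    nodes, pos = [], 0
    for m in re.finditer(r"\*+", text):
        before = text[m.start() - 1] if m.start() > 0 else " "
        after = text[m.end()] if m.end() < len(text) else " "
        left = not after.isspace() and (
            not is_punct(after) or before.isspace() or is_punct(before))
        right = not before.isspace() and (
            not is_punct(before) or after.isspace() or is_punct(after))
        n = len(m.group())
        nodes.append(text[pos:m.start()])
        # For '*', a run can open iff left-flanking and close iff right-flanking.
        nodes.append({"count": n, "orig": n, "open": left, "close": right})
        pos = m.end()
    nodes.append(text[pos:])
    return nodes

def literal(node):
    # Unused delimiters fall back to literal '*' characters.
    return node if isinstance(node, str) else "*" * node["count"]

def render_emphasis(text):
    nodes = tokenize(text)
    i = 0
    while i < len(nodes):
        closer = nodes[i]
        if isinstance(closer, str) or not closer["close"] or closer["count"] == 0:
            i += 1
            continue
        # Walk back to the nearest run that can still open emphasis.
        j = i - 1
        while j >= 0:
            opener = nodes[j]
            if not isinstance(opener, str) and opener["open"] and opener["count"] > 0:
                both = opener["close"] or closer["open"]
                total = opener["orig"] + closer["orig"]
                odd = opener["orig"] % 3 != 0 or closer["orig"] % 3 != 0
                if not (both and total % 3 == 0 and odd):  # "multiple of 3" rule
                    break
            j -= 1
        if j < 0:
            i += 1
            continue
        # Take two delimiters from each run for strong, otherwise one for em.
        use = 2 if opener["count"] >= 2 and closer["count"] >= 2 else 1
        tag = "strong" if use == 2 else "em"
        inner = "".join(literal(n) for n in nodes[j + 1:i])
        opener["count"] -= use
        closer["count"] -= use
        nodes[j + 1:i] = [f"<{tag}>{inner}</{tag}>"]
        i = j + 2  # the closer's index after the splice; it may match again
    return "".join(literal(n) for n in nodes)

print(render_emphasis("**foo*"))      # *<em>foo</em>
print(render_emphasis("*foo**"))      # <em>foo</em>*
print(render_emphasis('a**"foo"**'))  # a**"foo"**
```

In `**foo*` the opening run is still a single run of length 2. Only one delimiter is taken from it to pair with the closer, and the leftover `*` is emitted as literal text, which is consistent with examples 441 and 442.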

xcoulon commented 1 year ago

@wooorm thanks for your quick response!

> For the first: entire runs are parsed as a whole. So it’s about what’s surrounding the entire run. Afterwards things are “taken” from the runs to form starts or ends of strong or emphasis. See rule 11 (“A literal * character cannot occur at the beginning...”) and 12.

Yes, I saw rules 11 and 12, but I thought they applied to the contents (interior) of the emphasis and strong emphasis, since in example 441 `**foo*` is expected to render as `<p>*<em>foo</em></p>` and in example 442 `*foo**` is expected to render as `<p><em>foo</em>*</p>`.

> For the second: the algorithm is super complex. Hard to capture in text. The algorithm in prose in the appendix might help: https://spec.commonmark.org/0.30/#phase-2-inline-structure

OK, I'll take a closer look at the algorithm, then :)