commonmark / commonmark-spec

CommonMark spec, with reference implementations in C and JavaScript
http://commonmark.org

example 31 is misplaced and unexplained #687

Open rsc opened 3 years ago

rsc commented 3 years ago

In 0.30, examples 31-34 are introduced by:

Entity and numeric character references are recognized in any context besides code spans or code blocks, including URLs, link titles, and fenced code block info strings:

and then examples 35-36 are introduced by:

Entity and numeric character references are treated as literal text in code spans and code blocks:

But example 31 is an example of a context where entity and numeric character references are not recognized, namely raw HTML:

<a href="&ouml;&ouml;.html">

The two intros should probably be rewritten to list raw HTML as one of the exceptions:

Entity and numeric character references are recognized in any context besides code spans, code blocks or raw HTML, including URLs, link titles, and fenced code block info strings:

Entity and numeric character references are treated as literal text in code spans, code blocks, and raw HTML:

and then example 31 should be moved after current example 36.

(One could argue that they are "recognized" by the HTML parser that eventually reads the output, but they are not recognized by CommonMark; otherwise the output of example 31 would be <a href="öö.html">. Unless CommonMark is saying that ö should be re-escaped to &ouml; in the output, but that isn't done in examples 32-34.)

jgm commented 3 years ago

It's tricky to know what to say here so as not to be confusing. If we say that they aren't recognized in raw HTML, people might think that means that &ouml; in raw HTML will be expanded as &amp;ouml; in HTML rendering -- as happens with &ouml; in code spans. If we say they are recognized, that is also a bit misleading, since really they're just passed through.

rsc commented 3 years ago

Indeed. One option would be to reverse the order of the two statements and insert a third between them:

Entity and numeric character references are treated as literal text in code spans and code blocks:

(NEW) Entity and numeric character references are passed through unaltered in raw HTML:

Entity and numeric character references are recognized in any other context, including URLs, link titles, and fenced code block info strings:
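To make the three proposed statements concrete, here is a rough Python sketch of how a renderer might treat the same input in each context. It is not the reference implementation. The function names (render_text, render_code_span, render_raw_html) are hypothetical, Python's html.entities.html5 table stands in for the spec's entity list, html.escape(quote=False) is a simplified stand-in for output escaping, and the handling of invalid code points is my reading of the spec.

```python
import html
import re
from html.entities import html5

# Only semicolon-terminated references count: a named reference,
# 1-7 decimal digits, or 1-6 hex digits (an assumption based on 0.30).
_REF = re.compile(r"&(#[0-9]{1,7}|#[xX][0-9a-fA-F]{1,6}|[A-Za-z][A-Za-z0-9]*);")


def _decode(match):
    """Replace one matched reference with the character(s) it names."""
    body = match.group(1)
    if body.startswith("#"):
        code = int(body[2:], 16) if body[1] in "xX" else int(body[1:])
        # Treat NUL, surrogates, and out-of-range values as invalid.
        if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
            return "\ufffd"
        return chr(code)
    # Unknown names are left as written (and escaped later).
    return html5.get(body + ";", match.group(0))


def render_text(src):
    """Ordinary context: recognize references, then escape for output."""
    return html.escape(_REF.sub(_decode, src), quote=False)


def render_code_span(src):
    """Code spans and code blocks: references are literal text."""
    return html.escape(src, quote=False)


def render_raw_html(src):
    """Raw HTML: passed through unaltered."""
    return src


sample = "&ouml;"
print(render_text(sample))       # ö
print(render_code_span(sample))  # &amp;ouml;
print(render_raw_html(sample))   # &ouml;
```

The point of the middle case is visible in the last two lines: code spans and raw HTML both leave the reference undecoded, but only the code span escapes the ampersand.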

rsc commented 3 years ago

And, assuming example 31 were in the new middle section, another useful example would be something using an HTML entity that CommonMark does not allow, such as &copy, which is passed through rather than turned into &amp;copy.
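A small Python sketch of why &copy makes a good test case, under the assumption that CommonMark only recognizes semicolon-terminated references. STRICT_REF is a hypothetical, simplified pattern for named references only. Python's html.unescape follows the HTML5 rules, which accept the legacy semicolon-less form, so it serves here as a stand-in for what an HTML parser would do with the passed-through text.

```python
import html
import re

# Simplified, hypothetical pattern: a named reference must end in ";".
STRICT_REF = re.compile(r"&[A-Za-z][A-Za-z0-9]*;")

sample = "&copy"

# What an HTML5 parser makes of it: the legacy form is decoded.
html5_view = html.unescape(sample)

# Whether the strict pattern treats it as a reference at all.
recognized = STRICT_REF.fullmatch(sample) is not None

# Ordinary text: not recognized, so the ampersand is escaped.
text_output = html.escape(sample, quote=False)

# Raw HTML: passed through unaltered.
raw_html_output = sample

print(html5_view)       # ©
print(recognized)       # False
print(text_output)      # &amp;copy
print(raw_html_output)  # &copy
```

So the same four characters come out differently depending on context, which is exactly the distinction between "not recognized" and "passed through".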

rsc commented 3 years ago

I created PR #690 in case it is helpful. No worries if you'd rather do something different.