Open rsc opened 3 years ago
It's tricky to know what to say here so as not to be confusing.
If we say that they aren't recognized in raw HTML, people might think that means that `&ouml;` in raw HTML will be expanded as `&amp;ouml;` in HTML rendering -- as happens with `&ouml;` in code spans.
If we say they are recognized, that is also a bit misleading, since really they're just passed through.
Indeed. One option would be to reverse the order of the two statements and insert a third between them:
Entity and numeric character references are treated as literal text in code spans and code blocks:
(NEW) Entity and numeric character references are passed through unaltered in raw HTML:
Entity and numeric character references are recognized in any other context, including URLs, link titles, and fenced code block info strings:
And, assuming example 31 were in the new middle section, another useful example would be something using an HTML entity that CommonMark does not allow, such as `&copy` (no trailing semicolon), which is passed through rather than turned into `&amp;copy`.
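To make the three proposed statements concrete, here is a minimal Python sketch of how a renderer might treat the same input in each context. It is not CommonMark's reference implementation: `render`, `_decode`, and the context names `"code"`, `"raw_html"`, and `"text"` are hypothetical, and the entity table is borrowed from Python's standard `html.entities.html5` on the assumption that it is close enough to the spec's entity list for illustration.

```python
import re
from html import escape
from html.entities import html5

# Hypothetical pattern: a reference only counts if it ends in ';'.
_REF = re.compile(
    r"&(?:#[xX]([0-9A-Fa-f]{1,6})|#([0-9]{1,7})|([A-Za-z][A-Za-z0-9]*));"
)

def _decode(match):
    hex_digits, dec_digits, name = match.groups()
    if name is not None:
        # Unknown names are left as literal text.
        return html5.get(name + ";", match.group(0))
    code = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    if code == 0 or code > 0x10FFFF or 0xD800 <= code <= 0xDFFF:
        return "\ufffd"  # invalid code points become U+FFFD
    return chr(code)

def render(text, context):
    """Render `text` for one of three contexts (names are assumptions)."""
    if context == "raw_html":
        # Passed through unaltered: neither decoded nor re-escaped.
        return text
    if context == "code":
        # Treated as literal text, so '&' itself gets escaped.
        return escape(text, quote=False)
    # Any other context: decode recognized references, then escape for output.
    return escape(_REF.sub(_decode, text), quote=False)

for ctx in ("code", "raw_html", "text"):
    print(ctx, render("&ouml;", ctx), render("&copy", ctx))
```

The `&copy` case is the one that separates "passed through" from "literal text": without the semicolon it is not decoded anywhere in this sketch, but only raw HTML leaves the `&` unescaped.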
I created PR #690 in case it is helpful. No worries if you'd rather do something different.
In 0.30, examples 31-34 are introduced by:
and then examples 35-36 are introduced by:
But example 31 is an example of a context where entity and numeric character references are not recognized, namely raw HTML:
The two intros should probably be rewritten to list raw HTML as one of the exceptions:
and then example 31 should be moved after current example 36.
(The argument can be made that they are "recognized" by the eventual HTML parser reading the output, but they are not recognized by CommonMark, or else the output of example 31 would say `<a href="öö.html">`. Unless CommonMark is saying that `ö` should be re-escaped to `&ouml;` in output, but that isn't done in examples 32-34.)