badboy / mdbook-toc

A preprocessor for mdbook to add inline Table of Contents support.
Mozilla Public License 2.0

`mdbook-toc` partly breaks CommonMark backslash escapes #21

Closed twelho closed 2 years ago

twelho commented 3 years ago

This issue was originally detailed in https://github.com/rust-lang/mdBook/issues/1620, but I've now figured out that this preprocessor is causing it. CommonMark and mdBook both support backslash escapes. Here's example 14 from the CommonMark specification (slightly reordered, with two spaces appended to each line to force line breaks):

\*not emphasized*  
\<br/> not a tag  
\[not a link](/foo)  
\`not code`  
\* not a list  
\# not a heading  
\[foo]: /url "not a reference"  
\&ouml; not a character entity  
1\. not a list  

Rendering this using mdBook without mdbook-toc enabled gives this output, which looks as intended:

[image: rendered output without mdbook-toc]

But as soon as mdbook-toc version 0.7.0 is enabled using

[preprocessor.toc]
command = "mdbook-toc"
#marker = "[TOC]" # doesn't matter if set or not
#renderer = ["html"] # doesn't matter if set or not

and regardless of whether the mdBook page contains a table of contents, the last two escapes (`\&ouml;` and `1\.`) stop working:

[image: rendered output with mdbook-toc enabled]

I presume this is due to some inadvertent "unescaping" (backslash removal) happening in the preprocessor, though it's a bit strange that only numbered lists and character entities are affected. For numbered lists specifically, the current behavior is identical to removing the backslashes by hand; see https://github.com/rust-lang/mdBook/issues/1620 for another example.
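
To make the hypothesis concrete, here is a toy model in Rust of how a parse-then-reserialize round trip could drop only some escapes. This is a sketch under stated assumptions, not mdbook-toc's actual code: `unescape`, `reescape`, `round_trip`, and the `ESCAPED` character set are all hypothetical names and values chosen for illustration. The idea is that a parser resolves `\.` and `\&` into plain characters, and a serializer that only knows to re-escape a subset of punctuation would then write them back without the backslash.

```rust
/// Toy "parser": resolves backslash escapes before ASCII punctuation
/// into the literal character, as a Markdown parser would for text.
fn unescape(src: &str) -> String {
    let mut out = String::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\\' {
            if let Some(&next) = chars.peek() {
                if next.is_ascii_punctuation() {
                    out.push(next);
                    chars.next(); // consume the escaped character
                    continue;
                }
            }
        }
        out.push(c);
    }
    out
}

/// Toy "serializer": puts a backslash back only in front of the
/// characters it has been told about.
fn reescape(text: &str, escaped: &[char]) -> String {
    let mut out = String::new();
    for c in text.chars() {
        if escaped.contains(&c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Parse, then write back out. ESCAPED is a hypothetical, deliberately
/// incomplete set: it is missing '.' and '&'.
fn round_trip(src: &str) -> String {
    const ESCAPED: [char; 5] = ['*', '<', '[', '`', '#'];
    reescape(&unescape(src), &ESCAPED)
}

fn main() {
    let lines = [
        "\\# not a heading",
        "\\[not a link](/foo)",
        "\\&ouml; not a character entity",
        "1\\. not a list",
    ];
    for line in lines {
        let out = round_trip(line);
        // In this toy model, an output without any backslash lost its escape.
        let status = if out.contains('\\') { "still escaped" } else { "escape lost" };
        println!("{line} -> {out} ({status})");
    }
}
```

In this model the heading and link lines keep their backslash, while the entity and numbered-list lines come out bare, which matches the symptom described above. Whether the real cause is an incomplete escape set like this one is an assumption.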

badboy commented 3 years ago

ah gosh, round-trip parsing markdown is ... hard and this is a whackamole. I need to see what I can do.