gomarkdown / markdown

markdown parser and HTML renderer for Go

How to handle escaping #202

Open vbisbest opened 3 years ago

vbisbest commented 3 years ago

I have some HTML embedded in JSON, e.g.

    text: "This is a how you make a paragraph <p>new paragraph</p>"

I need to escape the HTML because the HTML page actually renders the tags. When I tried escaping my JSON content, e.g.

    text: "This is a how you make a paragraph &lt;p&gt;new paragraph&lt;/p&gt;"

ToHTML encodes the already-encoded text and I get double-encoded output: &amp;lt;p&amp;gt;new paragraph

Thoughts on how to handle this? Thank you.

chrisesimpson commented 2 years ago

I had a similar issue. I noticed HTML wasn't being escaped, so I called html.EscapeString on the input before converting it to HTML, and the tags ended up double-escaped. The cause is that the package does its own escaping but, given that some implementations of markdown support a handful of HTML tags, it leaves the tags themselves unescaped. So when you escape the input yourself you insert ampersands, which the package then escapes again. For me this was a problem. I managed to find the relevant code, but I didn't want to fork the repo to disable it. Instead I've found a way, using the render hook, to escape the HTML:

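    import (
        "io"

        "github.com/gomarkdown/markdown/ast"
        "github.com/gomarkdown/markdown/html"
    )

    // escapeHTMLHook writes raw HTML nodes as escaped text instead of
    // passing them through unchanged.
    func escapeHTMLHook(w io.Writer, node ast.Node, entering bool) (ast.WalkStatus, bool) {
        switch node.(type) {
        case *ast.HTMLSpan: // inline HTML: write the escaped literal
            html.EscapeHTML(w, node.AsLeaf().Literal)
            return ast.GoToNext, true
        case *ast.HTMLBlock: // block-level HTML: escape it, on its own lines
            io.WriteString(w, "\n")
            html.EscapeHTML(w, node.AsLeaf().Literal)
            io.WriteString(w, "\n")
            return ast.GoToNext, true
        }
        // Every other node type is left to the default renderer.
        return ast.GoToNext, false
    }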

To test this out:

    ....
        extensions := parser.CommonExtensions

        p := parser.NewWithExtensions(extensions)

        opts := html.RendererOptions{
            Flags:          html.CommonFlags,
            RenderNodeHook: escapeHTMLHook,
        }

        r := html.NewRenderer(opts)

        markup := markdown.ToHTML(md, p, r)
    ....

This seems to handle single-line HTML OK, but for tags that span multiple lines you lose your line breaks, and no markdown between the open and close tags is rendered.

I have been looking into adding a ParserHook to override the leftAngle behaviour so it's completely blind to HTML (apart from the escaping) but I cannot make this work nicely yet.

chrisesimpson commented 2 years ago

I gave up on trying to use the parser and renderer extensions to make this work as everything between the open and close tags was being rendered as text. So if you had something like:

    <span>This is a **span** element</span>

It would not make the word "span" bold. I wanted it instead to render just the escaped opening and closing tags separately, so that the content between them, whether block or inline, could still be processed by the package.

So instead I added a new parser option to disallow any tags and a new renderer option to escape them.

    ....
    p.Opts.Flags = parser.DisallowHtmlTags

    opts := html.RendererOptions{
        Flags: html.EscapeHTMLTags,
    ....

I'm not sure what the proper protocol is for submitting a pull request to this project, but I will look into that. Meanwhile, if you want me to share the changes with you, I'm happy to.

OpenWaygate commented 1 year ago

I also ran into a similar problem, but this package was not to blame.

This package gives me <p>This is a post</p>, which is fine, but that string then got escaped by html/template and turned into &lt;p&gt;This is a post&lt;/p&gt;. So I replaced html/template with text/template and nothing is escaped any more.
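Switching to text/template turns off escaping for every value in the template, not just the rendered markdown. A narrower alternative, sketched below under the assumption that the rendered HTML is trusted or has already been sanitized, is to stay on html/template and mark only that one value as `template.HTML`. The `renderPage` helper and the template text are hypothetical, made up for this illustration.

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// renderPage executes a small html/template. Body is typed as
// template.HTML, so it is inserted verbatim; Title is a plain string
// and is still escaped by the template engine.
func renderPage(title, body string) (string, error) {
	tmpl, err := template.New("page").Parse("<h1>{{.Title}}</h1>{{.Body}}")
	if err != nil {
		return "", err
	}
	data := struct {
		Title string
		Body  template.HTML
	}{
		Title: title,
		// Assumption: body is trusted or sanitized HTML, e.g. renderer output.
		Body: template.HTML(body),
	}
	var buf bytes.Buffer
	if err := tmpl.Execute(&buf, data); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := renderPage("a < b", "<p>This is a post</p>")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // <h1>a &lt; b</h1><p>This is a post</p>
}
```

This keeps contextual escaping for everything else on the page. `template.HTML` should only wrap content you trust, since it bypasses the escaping entirely.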