gomarkdown / markdown

markdown parser and HTML renderer for Go

Is there an option to always escape HTML? #308

Closed TACIXAT closed 1 month ago

TACIXAT commented 1 month ago

Let's say I had a markdown document:

## XSS

This is a basic XSS payload: <script>alert(0);</script>.

This is rendered to HTML unescaped as:

<h2>XSS</h2>

<p>This is a basic XSS payload: <script>alert(0);</script>.</p>

I guess this is working as intended because you are supposed to be able to just add HTML to markdown according to this answer.

If you wrap it in backticks (`), it is escaped:

<p>This is a basic XSS payload: <code>&lt;script&gt;alert(0);&lt;/script&gt;</code>.</p>

Is there an option to always escape, even if the HTML is not in backticks?

kjk commented 1 month ago

There's no built-in option like that.

There's a SkipHTML flag on the HTML renderer that suppresses output of HTML blocks.

One way is to customize the renderer and escape ast.HTMLBlock nodes (see https://blog.kowalczyk.info/article/cxn3/advanced-markdown-processing-in-go.html#customizing-html-renderer)

The original code is:

func (r *Renderer) HTMLBlock(w io.Writer, node *ast.HTMLBlock) {
    if r.Opts.Flags&SkipHTML != 0 {
        return
    }
    r.CR(w)
    r.Out(w, node.Literal)
    r.CR(w)
}

You would just do:

func (r *Renderer) HTMLBlock(w io.Writer, node *ast.HTMLBlock) {
    r.CR(w)
    // EscapeHTML writes the escaped bytes straight to w; it does not return a value.
    EscapeHTML(w, node.Literal)
    r.CR(w)
}

Instead of simple escaping you can sanitize with e.g. https://github.com/microcosm-cc/bluemonday

Sanitizing removes only the dangerous HTML and leaves the rest intact.

Instead of customizing the renderer, you can also traverse the parsed AST before HTML rendering and replace the Literal of each ast.HTMLBlock with an escaped or sanitized version (https://blog.kowalczyk.info/article/cxn3/advanced-markdown-processing-in-go.html#modify-ast-tree)

TACIXAT commented 1 month ago

Thank you!