golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
122.87k stars 17.52k forks source link

proposal: html/template: Better handling of HTML comments #54380

Open kbolino opened 2 years ago

kbolino commented 2 years ago

Background

The html/template package completely strips HTML comments from the rendered output. This is apparently intentional (#14256) but is not documented (#28628). The best summation of the rationale I could find comes from the mailing list:

That would not be safe. html/template needs to know the proper context at each point of the template evaluation, but these types of comments cause their contents to be in a browser-dependent context, so the autoescaping can not be done reliably.

To wit, "these types of comments" seems to refer to old-school conditional comments like <!--[if lt IE 9]>...<![endif]-->. I don't know if these comments even do anything anymore on modern browsers, and a quick Google search wasn't very revealing.

The canonical solution to this problem seems to be to use the template.HTML type to wrap "trusted" strings. For example, there's this StackOverflow answer recommending this approach. However, this means that only way to use comments literally in the template is via function-call indirection, e.g.:

{{ safe "<!-- I just wanted to put a comment in my template -->" }}

This also imposes the limitation that comments cannot have injected values within them.

Rationale for Change

I contend that based upon my own and a number of others' experiences, this behavior is unexpected. First of all, there are uses for comments besides affecting the browser's behavior (e.g. including debugging info useful to the developer). Second, many will have never seen or used IE "conditional comments" in no small part because IE is (almost) dead. Third, if the rationale for this behavior is to address the specific concern of conditional context confounding the auto-escaping mechanism, then I would think that only those comments which raise that issue would be stripped, not all comments. Fourth, of course, is the total lack of documentation of this behavior in the package itself.

There is another proposed "solution" already which is to fall back to using text/template but that is obviously a dangerous regression as then one gives up on all the benefits of html/template.

Though server-side manipulation of HTML is falling by the wayside in favor of script-heavy web apps doing extensive DOM manipulation client-side, the html/template package is not deprecated, and all browsers still (mostly) support "Web 1.0" sites that have minimal or no scripts.

Finally, the package documentation does say the following:

The security model used by this package assumes that template authors are trusted, while Execute's data parameter is not.

Whereas the policy of stripping comments seems to contradict this maxim.

Thus I believe this unintuitive behavior is ripe for change, albeit ideally in a backwards-compatible way.

Possible Paths

  1. Add a new built-in function to the template language: Basically take the SO answer and bless it as canonical, eliminating the need for the user to define a function on their own. This is probably the most conservative approach, especially if the user is allowed to override the built-in function with one of their own (since otherwise there could be a naming conflict from old code), however I also consider this the least beneficial solution, since you still can't safely inject values into comments.
    • An improved approach would be to add a comment function that takes a string argument, escapes it properly for the body of a comment, surrounds it with comment markup, and finally wraps the result in template.HTML, this way you could at least inject values safely but indirectly with printf
  2. Just stop stripping comments altogether. If there's a literal comment in the source, pass it through, injecting properly escaped values if there are any. Put a tombstone on IE while you're at it. This is probably the most aggressive approach and could have thorny repercussions if turned on by default.
  3. Fail to compile any template containing HTML comments. The author probably put the comments there on purpose, and may not even be aware that they're being stripped. The template language has its own comment syntax which can be used for local-only comments. This replaces one unexpected outcome (comments just disappear) with another (some templates that used to compile now don't) but IMO reduces the amount of surprise involved.
  4. Narrow the scope of the comment stripping to only those comments known to be problematic. I don't know if there are other kinds of problematic comments besides the aforementioned conditional comments from IE.
  5. Introduce a new block to the template language inside of which comments are allowed. This improves on idea 1 by allowing safely escaped value injection.

Any of the above suggestions could also be gated, e.g. with template.Option, to improve backwards compatibility.

kbolino commented 2 years ago

Well, I just discovered an interesting wrinkle that could undermine some of the suggestions: there doesn't seem to be a good rule to escape the things you put inside of comments

ianlancetaylor commented 2 years ago

@kbolino I'm not clear: are you retracting the proposal?

kbolino commented 2 years ago

No, at least not without some kind of resolution (even if it is merely to finally document this behavior). I thought a bit on this last night, and putting some of the other ideas along with that unfortunate discovery together, I'd propose a middle-ground solution:

Define a template option htmlcomments which takes one of the following values:

I have no idea how feasible this solution is, but could do some code research into whether it would be possible.

luflow commented 6 months ago

We have interest in resolution of this issue - can we help with resolving this? Currently we stumbled upon this issue because we are using html/template for mail templates where <!--[if IE 8]> html comments are used - unfortunately they are breaking without that fix currently.

So we are interested in the implementation of the proposal and could even help with it if necessary :)

miloWilxite commented 3 months ago

I'm also struggling with this behaviour, similarly to https://github.com/golang/go/issues/54380#issuecomment-1999816747

I'm using html/template with an email program and use of <!--[if mso | IE]><![endif]--> comments is being stripped out of my HTML that I generate, causing broken emails for outlook users :roll_eyes:

Is there much likelihood that this proposal is going to be followed up?

-- EDIT -- Or if anyone has suggestions for a workaround that would be amazing!

kbolino commented 3 months ago

Here's an adaptation of the suggestion from the StackOverflow answer, which seems to work fine in the playground:

const src = `<html><body>
<div>Some <b>HTML</b> content</div>
{{ ifmso }}
<div>Microsoft Outlook Only</div>
{{ endif }}
</body></html>`

func main() {
    t := template.Must(template.New("").Funcs(template.FuncMap{
        "ifmso": func() template.HTML { return template.HTML(`<!--[if mso|IE]>`) },
        "endif": func() template.HTML { return template.HTML(`<![endif]-->`) },
    }).Parse(src))
    t.Execute(os.Stdout, nil)
}

Caveat emptor: The docs for template.HTML specifically say not to use it for unterminated comments, so YMMV. I created an updated version which attempts to inject HTML and instead gets proper escaping, so this hack doesn't seem to break the HTML parser state. You may want to pick better names because the more I read this thing I wrote the more endif looks like a template language keyword (it's not) rather than a custom function. Some upper case might help.

The right solution here of course is to bully Microsoft as much as possible into dropping its antiquated email rendering engine, and/or getting clients off of Microsoft Outlook. But that is beyond the scope of what Go can address.

miloWilxite commented 3 months ago

If Go could get users off of Outlook it would truly be a marvel 😅

The trouble I had is that I'm using MJML to generate my email html templates and it isn't simple (nor very nice for the next dev) to have to go through and replace the comments it applies with the something like you've suggested. But I appreciate that for a lot of situations that is a decent workaround.

Thanks!