golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123k stars 17.54k forks source link

proposal: html/template: escape unquoted attributes by first quoting them #43224

Open empijei opened 3 years ago

empijei commented 3 years ago

Current behavior

We allow actions to appear in "Unquoted Attribute" contexts.

This means this template is valid:

<img src={{.}}>

To support this when we escape we have to take some extra care to escape all whitespace (which would break the token) and we rely on some niche behaviors of the spec.

For example if we render the template above with "da ta/" we get

<img src=da&#32;ta/>

Which looks an awful lot like an ambiguous syntax. (Playground link)

It turns out that if you follow the spec this should parse as

<img src="data/">

But I find this behavior very brittle, especially considering that the escaping character set that we use currently was retrieved by using a bit of the spec and a custom script we never re-run: https://github.com/golang/go/blob/75e16f5127eb6affb4b473c93565a8d29a802e51/src/html/template/html.go#L81-L93

Proposal

Change this escaping to run two passes:

This would remove the need for htmlNospace(Norm)ReplacementTable and would make the logic more robust and simple.

Notes

I found this while looking into https://github.com/golang/go/issues/42506

/cc @katiehockman @FiloSottile @rolandshoemaker @rsc

ianlancetaylor commented 3 years ago

There doesn't seem to be an API change here, so I don't think this has to go through the proposal process. Unless I'm missing something.

antichris commented 3 years ago

How about detecting first, whether quoting and/or escaping is needed at all, and foregoing it otherwise?

One of the first pieces of Go code I ever wrote was for this exact purpose:

  1. Replace all ampersands with &amp; entities.

  2. Determine, whether the attribute value needs to be quoted, i.e. whether it contains quotable (any of " ' ` = < or >) and/or whitespace characters.

    If not, we're done here, return the string as it is (after having escaped &amp;s).

  3. Pick the quote (single or double) that is less prevalent in data, escape that.

    This avoids turning foo with value "b""a""r" into foo="&#34;b&#34;&#34;a&#34;&#34;r&#34;" when it can be foo='"b""a""r"'.

  4. Wrap the escaped value in the opposite quote and return.

It does take at most two passes over the string (one to detect quotables and to pick a quote, other to emit escaped quote), but IMO the benefit balances the cost.

empijei commented 3 years ago

@ianlancetaylor

There doesn't seem to be an API change here, so I don't think this has to go through the proposal process. Unless I'm missing something.

This will definitely break some tests so I am proposing this change to weigh with you all if this looks like it is worth the cost. If you think it would make it clearer that this is not an API change I can rename this to be a normal feature request rather than a proposal.

@antichris Thanks for the suggestion. One of the reasons I am proposing this is to simplify the logic in this package and make it more consistent. While your proposal would indeed make escaped strings shorter it would definitely make the logic more complex and the rendered HTML less consistent. Unless you can find a common use case that foresee a big amount of quotes in an HTML attribute value I'd rather keep it simpler and always use the same quotes. Does this make sense to you?

ianlancetaylor commented 3 years ago

@empijei OK, thanks.

FiloSottile commented 3 years ago

I support this. html/template is meant to be secure and easy to use, not to allow ultimate control over how the generated HTML looks like (as opposed to how it behaves). If we can generate output that is equivalent but is safer and simpler, sounds great.

antichris commented 3 years ago

@empijei

Unless you can find a common use case that foresee a big amount of quotes in an HTML attribute value, I'd rather keep it simpler and always use the same quotes.

I got two right off the top of my head: data:image/svg+xml URIs and JSON data. For an example of the latter, you can view the source of this very page and look for the optimizely-datafile meta tag: nearly 40% of its content are &quot; entities. I'd say that's significant. The SVG URIs also suffer under current implementation escaping < and > (which don't mean anything special when quoted, therefore require no escaping).

And then there's also inline style and scripts (on<event> and such), of course, but people are supposed to be using single quotes in CSS and JS. Also, people are supposed to avoid using inline stuff in general, but it's not like many care to.

Dynom commented 3 years ago

[..] And then there's also inline style and scripts (on<event> and such), of course, but people are supposed to be using single quotes in CSS and JS. Also, people are supposed to avoid using inline stuff in general, but it's not like many care to.

You're, of course, right. However inlining resources is unavoidable for certain common cases, e.g.: e-mails.

Does it make sense to introduce a new function to the API to be more opinionated in these specific cases? Providing more context by exposing a slightly richer API might: keep the logic simpler, allow to stay backwards compatible yet provide to the proposal.

The reason I'm pitching in is because of https://github.com/golang/go/issues/42506 which might be solved by hitchhiking on this issue

FiloSottile commented 3 years ago

To make sure I understand, is the concern purely a matter of readability and size of the generated HTML, or are there cases where this would cause semantic changes or breakage?

timdadd commented 3 months ago

I often have problems with go escaping stuff I don't need escaping and would like to know if a solution exists.

If I pass an svg file to javascript I have to use the JS unescape which I get a warning is now deprecated. If I copy the SVG straight into the JS then it works perfectly. https://go.dev/play/p/s17gBsHNMFi as an example.

If I'm doing something stupid then please let me know