golang / go

html/template: reserved character "()" in href attribute is autoescaped. #63586

Open gyuber opened 11 months ago

gyuber commented 11 months ago

I think html/template does not percent-encode reserved characters according to RFC 3986.

What version of Go are you using (go version)?

1.21

What did you do?

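package main

import (
    "bytes"
    "fmt"
    "html/template"
)

func main() {
    dict := map[string]interface{}{
        // The trailing double quote is part of the value and ends up as %22.
        "link": `https://example.com/()"`,
    }
    tag := `<a href="{{ $.link }}"></a>`
    t := template.Must(template.New("tag").Parse(tag))

    var tpl bytes.Buffer
    if err := t.Execute(&tpl, dict); err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println(tpl.String())
}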

What did you expect to see?

<a href="https://example.com/()"></a>

What did you see instead?

<a href="https://example.com/%28%29%22"></a>

cagedmantis commented 11 months ago

cc @golang/security

rolandshoemaker commented 11 months ago

From html/template/url.go:

// Single quote and parens are sub-delims in RFC 3986, but we
// escape them so the output can be embedded in single
// quoted attributes and unquoted CSS url(...) constructs.
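
As a rough illustration of the CSS case the comment mentions, the sketch below embeds a URL in an unquoted CSS url(...) value: an unescaped ')' closes the construct early. The helper escapeParensAndQuotes is hypothetical and deliberately simplified; it is not the html/template escaper, which percent-encodes many more characters.

```go
package main

import (
	"fmt"
	"strings"
)

// escapeParensAndQuotes is a hypothetical, simplified helper: it percent-encodes
// only the characters that could break out of a quoted attribute or an
// unquoted CSS url(...) value. The real html/template escaper does much more.
func escapeParensAndQuotes(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		switch c := s[i]; c {
		case '(', ')', '\'', '"':
			fmt.Fprintf(&b, "%%%02X", c)
		default:
			b.WriteByte(c)
		}
	}
	return b.String()
}

// cssURL embeds u in an unquoted CSS url(...) construct without any escaping.
func cssURL(u string) string {
	return "url(" + u + ")"
}

func main() {
	u := "https://example.com/a)b"
	fmt.Println(cssURL(u))                        // the ')' inside u ends url( early
	fmt.Println(cssURL(escapeParensAndQuotes(u))) // the ')' inside u is now %29
}
```

With the escape applied, the only unescaped ')' left is the one that closes url(...).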

gospider007 commented 10 months ago

format(webp) is incorrectly encoded as format%28webp%29, whereas the browser does not encode format(webp).

package main

import (
    "fmt"
    "net/url"
)

func main() {
    href := "https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/¸A°×AC_½AA¼AY´A_AµAcAuc_A¸AIAE²_AOA¾.jpg"
    correctURL := "https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg"

    // Use u rather than url so the net/url package is not shadowed.
    u, err := url.Parse(href)
    if err != nil {
        fmt.Println(err)
        return
    }
    fmt.Println("Correct URL: ", correctURL)
    fmt.Println("net/url URL: ", u.String())
}
Correct URL:  https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg
net/url URL:  https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format%28webp%29/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg
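
A minimal sketch of what appears to be happening, using made-up example.com URLs instead of the image URL above: net/url keeps the original path text only when it is already a valid encoding, so a raw non-ASCII character makes (*URL).String re-encode the whole decoded path, parentheses included. Percent-encoding the non-ASCII character first, as the correct URL above does, appears to avoid this. roundTrip is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"net/url"
)

// roundTrip is a hypothetical helper: it parses s with net/url and
// re-serializes it with (*URL).String.
func roundTrip(s string) (string, error) {
	u, err := url.Parse(s)
	if err != nil {
		return "", err
	}
	return u.String(), nil
}

func main() {
	for _, s := range []string{
		"https://example.com/format(webp)/a.jpg",      // ASCII only: parentheses kept
		"https://example.com/format(webp)/é.jpg",      // raw non-ASCII: path re-encoded, ( ) become %28 %29
		"https://example.com/format(webp)/%C3%A9.jpg", // non-ASCII pre-encoded: parentheses kept
	} {
		out, err := roundTrip(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(out)
	}
}
```

Whether pre-encoding is practical depends on where the URLs come from; this only shows that net/url preserves an already valid encoding.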

JonasUnderscore commented 9 months ago

I believe the server is rightfully using ')' as a delimiter, and the escaped URI is for another (non-existing) resource.

https://www.rfc-editor.org/%72%66%63/%72%66%63%33%39%38%36#%73%65%63%74%69%6F%6E%2D%32.%32

RFC 3986, section 2.2:

Thus, characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI.

Changing the encoding of a delimiter is not undone by normalization.

RFC 3986, section 2.3:

URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource.

Only unreserved characters are mentioned.
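
RFC 3986 section 6.2.2.2 turns this into a normalization rule: percent-encoded octets are decoded only when they correspond to unreserved characters. Below is a minimal sketch of that single step, with hypothetical helper names; it skips the other normalizations (for example, uppercasing hex digits). Applied to the rfc-editor.org link above, it recovers the plain section URL, while a reserved %28 is left alone.

```go
package main

import (
	"fmt"
	"strings"
)

// isUnreserved reports whether c is in the RFC 3986 section 2.3 unreserved set:
// ALPHA / DIGIT / "-" / "." / "_" / "~".
func isUnreserved(c byte) bool {
	return 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' || '0' <= c && c <= '9' ||
		c == '-' || c == '.' || c == '_' || c == '~'
}

// unhex converts one hex digit; ok is false for anything else.
func unhex(c byte) (v byte, ok bool) {
	switch {
	case '0' <= c && c <= '9':
		return c - '0', true
	case 'a' <= c && c <= 'f':
		return c - 'a' + 10, true
	case 'A' <= c && c <= 'F':
		return c - 'A' + 10, true
	}
	return 0, false
}

// normalizeUnreserved decodes %XX triplets only when they encode an
// unreserved character. Every other triplet, including reserved ones such
// as %28 for '(', is copied unchanged, so delimiters keep their meaning.
func normalizeUnreserved(s string) string {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		if s[i] == '%' && i+2 < len(s) {
			hi, ok1 := unhex(s[i+1])
			lo, ok2 := unhex(s[i+2])
			if ok1 && ok2 && isUnreserved(hi<<4|lo) {
				b.WriteByte(hi<<4 | lo)
				i += 2
				continue
			}
		}
		b.WriteByte(s[i])
	}
	return b.String()
}

func main() {
	fmt.Println(normalizeUnreserved("https://www.rfc-editor.org/%72%66%63/%72%66%63%33%39%38%36#%73%65%63%74%69%6F%6E%2D%32.%32"))
	fmt.Println(normalizeUnreserved("/%7Euser/%28x%29"))
}
```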

RFC 3986, section 3.3:

Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents.

'(' can be either a delimiter (it appears in the path, sub-delims, and reserved rules) or a literal octet/character, whereas '%28' can be a percent-encoded octet with no special meaning.

'(' and '&' are in the same class (sub-delims), and percent-encoding '&' would break most queries.
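
The '&' point can be checked with net/url's query parser: the same text means two parameters when '&' is a delimiter and one parameter when it is percent-encoded. The query strings are made-up examples and queryOf is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
)

// queryOf is a hypothetical helper returning the number of parameters and
// the decoded value of "q" in a raw query string.
func queryOf(raw string) (int, string, error) {
	v, err := url.ParseQuery(raw)
	if err != nil {
		return 0, "", err
	}
	return len(v), v.Get("q"), nil
}

func main() {
	for _, raw := range []string{
		"q=fish&chips=1",   // '&' as a delimiter: two parameters
		"q=fish%26chips=1", // '&' percent-encoded: one parameter whose value contains '&'
	} {
		n, q, err := queryOf(raw)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%d param(s), q=%q\n", n, q)
	}
}
```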

It seems '(' is not the same as '%28', although many implementations do not use '(' as a delimiter.

Go's URL handling (net/url) escapes '(' into '%28' and unescapes '%3B' into ';' when parsing and re-serializing URLs. Both are sub-delimiters.

Edit: the issue seems to be related to (*URL).String, which calls (*URL).EscapedPath. When some characters in the path need escaping (e.g. non-ASCII), EscapedPath re-encodes the unescaped path, which discards the original encoding differences.
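
The mechanism described in the edit can be observed directly with (*URL).EscapedPath. A sketch with made-up URLs and a hypothetical helper: for a pure-ASCII path, EscapedPath returns the original RawPath and %3B survives; once a raw non-ASCII byte makes RawPath an invalid encoding, EscapedPath re-encodes the decoded Path instead, and since net/url does not escape ';' in paths, %3B comes back as ';'.

```go
package main

import (
	"fmt"
	"net/url"
)

// escapedPath is a hypothetical helper returning (*URL).EscapedPath for s.
func escapedPath(s string) (string, error) {
	u, err := url.Parse(s)
	if err != nil {
		return "", err
	}
	return u.EscapedPath(), nil
}

func main() {
	for _, s := range []string{
		"https://example.com/a%3Bb/x.jpg", // ASCII only: RawPath reused, %3B kept
		"https://example.com/a%3Bb/é.jpg", // raw non-ASCII: Path re-encoded, %3B becomes ';'
	} {
		p, err := escapedPath(s)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(p)
	}
}
```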

rsc commented 4 months ago

Browsers and html/template sometimes disagree about the level of escaping required. I do not believe it is wrong to escape (), and it is necessary to escape them in certain contexts.

In the asuracomics URL, using filters:format%28webp) works fine, but filters:format%28webp%29 does not. It's interesting that the server can decode %28 but not %29.

Note that Wikipedia has no problem serving https://en.wikipedia.org/wiki/Comma_%28disambiguation%29.

It seems reasonable to me to declare this a server bug and leave html/template alone.

stgarrity commented 2 months ago

I ran into what I believe is a more general version of this bug today: html/template seems to escape characters inside the anchor href attribute, which makes it hard to generate templates that include links. A simple repro is here, on go1.22.5 running on the playground.
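
One way to see which escaping happens in a double-quoted href is to render a link with parentheses and a multi-parameter query. In this sketch (the link and the render helper are hypothetical), two separate layers show up: URL normalization percent-encodes '(' and ')', while HTML attribute escaping turns '&' into '&amp;', which browsers decode back to '&' when reading the attribute.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// linkTmpl puts a value straight into a double-quoted href attribute.
var linkTmpl = template.Must(template.New("link").Parse(`<a href="{{.}}">link</a>`))

// render is a hypothetical helper that executes linkTmpl for one link.
func render(link string) (string, error) {
	var b strings.Builder
	if err := linkTmpl.Execute(&b, link); err != nil {
		return "", err
	}
	return b.String(), nil
}

func main() {
	out, err := render("https://example.com/a(b)?q=1&r=2")
	if err != nil {
		fmt.Println(err)
		return
	}
	// ( and ) are percent-encoded by URL normalization;
	// & is HTML-escaped by the attribute escaper.
	fmt.Println(out)
}
```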