Open gyuber opened 11 months ago
cc @golang/security
From html/template/url.go:
// Single quote and parens are sub-delims in RFC 3986, but we
// escape them so the output can be embedded in single
// quoted attributes and unquoted CSS url(...) constructs.
format(webp) is incorrectly encoded as format%28webp%29; a browser leaves the parentheses in format(webp) unescaped.
package main

import (
	"fmt"
	"log"
	"net/url"
)

func main() {
	href := "https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/¸A°×AC_½AA¼AY´A_AµAcAuc_A¸AIAE²_AOA¾.jpg"
	correctURL := "https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg"
	// Name the result u so it does not shadow the net/url package.
	u, err := url.Parse(href)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Correct URL:", correctURL)
	fmt.Println("net/url URL:", u.String())
}
Correct URL: https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format(webp)/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg
net/url URL: https://img.asuracomics.com/unsafe/fit-in/330x450/filters:format%28webp%29/https://asuratoon.com/wp-content/uploads/2023/12/%C2%B8A%C2%B0%C3%97AC_%C2%BDAA%C2%BCAY%C2%B4A_A%C2%B5AcAuc_A%C2%B8AIAE%C2%B2_AOA%C2%BE.jpg
I believe the server is rightfully using ')' as a delimiter, and the escaped URI is for another (non-existing) resource.
https://www.rfc-editor.org/%72%66%63/%72%66%63%33%39%38%36#%73%65%63%74%69%6F%6E%2D%32.%32
2.2
Thus, characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI.
Changing the encoding of delimiters may not be undone by normalization.
2.3
URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource.
Only unreserved characters are mentioned; this equivalence is not stated for reserved characters such as '('.
3.3
Aside from dot-segments in hierarchical paths, a path segment is considered opaque by the generic syntax. URI producing applications often use the reserved characters allowed in a segment to delimit scheme-specific or dereference-handler-specific subcomponents.
'(' can be a delimiter (it is allowed in a path, is in sub-delims, and is therefore reserved) or a plain octet/character. '%28' can be a percent-encoded octet without special meaning.
'(' and '&' are in the same class (sub-delims), and percent-encoding '&' would break most queries.
It seems '(' is not the same as '%28', although many implementations do not use '(' as a delimiter.
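The '&' comparison can be checked directly. The following is a minimal sketch, assuming two made-up query strings that differ only in '&' versus '%26'; countParams is a hypothetical helper, not part of any library.

```go
package main

import (
	"fmt"
	"net/url"
)

// countParams is a hypothetical helper for this illustration: it parses a
// raw query string and reports how many distinct keys it contains.
func countParams(rawQuery string) int {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return -1
	}
	return len(values)
}

func main() {
	// '&' acts as a delimiter here: two parameters, a and b.
	fmt.Println(countParams("a=1&b=2"))
	// '%26' is data: a single parameter a whose value is "1&b=2".
	fmt.Println(countParams("a=1%26b=2"))
}
```

The two inputs decode to different parameter sets, which is the sense in which a reserved character and its percent-encoded form are not interchangeable.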
The Go http library escapes '(' into '%28' and unescapes '%3B' into ';' when parsing URLs. Both are sub-delimiters.
edit: The issue seems to be related to (*URL).String, which calls (*URL).EscapedPath. When it needs to escape some characters (e.g. non-ASCII), it re-escapes the unescaped path, in which the difference between '(' and '%28' has already been lost.
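A minimal sketch of that behaviour, assuming a made-up example.com URL in place of the one from the report; roundTrip is a hypothetical helper. It compares an ASCII-only path, the same path with one non-ASCII character, and the same path with that character already percent-encoded.

```go
package main

import (
	"fmt"
	"net/url"
)

// roundTrip is a hypothetical helper for this illustration: it parses raw
// and returns whatever (*URL).String reports for it.
func roundTrip(raw string) string {
	u, err := url.Parse(raw)
	if err != nil {
		return "parse error: " + err.Error()
	}
	return u.String()
}

func main() {
	// ASCII-only path: the original spelling is already a valid encoding,
	// so it is kept and the parentheses stay literal.
	fmt.Println(roundTrip("https://example.com/filters:format(webp)/a.jpg"))

	// One non-ASCII character makes the original spelling invalid as an
	// encoded path, so the decoded path is re-escaped, parentheses included.
	fmt.Println(roundTrip("https://example.com/filters:format(webp)/\u00e9.jpg"))

	// With the non-ASCII bytes percent-encoded up front, the original
	// spelling is valid again and the parentheses survive.
	fmt.Println(roundTrip("https://example.com/filters:format(webp)/%C3%A9.jpg"))
}
```

If this matches what you see, percent-encoding the non-ASCII bytes before calling url.Parse is one possible way to keep the parentheses intact, though it does not change how EscapedPath behaves.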
Browsers and html/template sometimes disagree about the level of escaping required. I do not believe it is wrong to escape (), and it is necessary to escape them in certain contexts.
In the asuracomics URL, using format%28webp) works fine, but format%28webp%29 does not. It's interesting that the server can decode %28 but not %29.
Note that Wikipedia has no problem serving https://en.wikipedia.org/wiki/Comma_%28disambiguation%29.
It seems reasonable to me to declare this a server bug and leave html/template alone.
I ran into what I believe is the more general case of this bug today: html/template seems to be escaping characters inside the anchor href attribute, which makes it hard to generate templates that include links. A simple repro is here, on go1.22.5 running on the playground.
I think percent-encoding of reserved characters is not performed according to the RFC 3986 spec.
What version of Go are you using (go version)? 1.21