golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

net/url: unexpected url encoding for ! in path #40894

Open mh47838704 opened 4 years ago

mh47838704 commented 4 years ago

What version of Go are you using (go version)?

$ go version

Does this issue reproduce with the latest release?

Yes, it always reproduces.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

What did you do?

import (
    "fmt"
    "net/url"
    "testing"
)

// Path contains both "!" and "^".
func TestEscapedPathWithCaret(t *testing.T) {
    urlString := "http://www.xyz.com/!^test.jpg"
    parsedUrl, err := url.Parse(urlString)
    if err != nil {
        t.Fatal(err)
    }
    fmt.Println("encodedUrl:", parsedUrl.EscapedPath())
}

// Path contains only "!".
func TestEscapedPathWithoutCaret(t *testing.T) {
    urlString := "http://www.xyz.com/!test.jpg"
    parsedUrl, err := url.Parse(urlString)
    if err != nil {
        t.Fatal(err)
    }
    fmt.Println("encodedUrl:", parsedUrl.EscapedPath())
}

What did you expect to see?

For example:

originUrl: https://github.com/!test.jpg
encodedUrl: https://github.com/!test.jpg    --- result is expected

originUrl: https://github.com/!^test.jpg
encodedUrl: https://github.com/%21%5Etest.jpg
expected: https://github.com/!%5etest.jpg

What did you see instead?

func (u *URL) EscapedPath() string {
    if u.RawPath != "" && validEncoded(u.RawPath, encodePath) {
        p, err := unescape(u.RawPath, encodePath)
        if err == nil && p == u.Path {
            return u.RawPath
        }
    }
    if u.Path == "*" {
        return "*" // don't escape (Issue 11202)
    }
    return escape(u.Path, encodePath)
}

I found this code in the SDK. The "validEncoded" function treats "!" as a valid character in an already-encoded path, but "escape" treats "!" as a character that needs encoding when the mode is "encodePath" (this is decided in "shouldEscape"). The two pieces of logic conflict, so the result is not what I expected.

I also found that other languages, such as Python, do not encode "!", "(", or ")".
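The conflict described above can be observed from outside the package. The following is a minimal sketch, not part of the original report: `describe` is a hypothetical helper that only calls the exported `url.Parse`, `URL.Path`, `URL.RawPath`, and `URL.EscapedPath`.

```go
package main

import (
	"fmt"
	"net/url"
)

// describe is a hypothetical helper for this illustration. It parses rawURL
// and returns the decoded path, the raw path kept by the parser, and the
// result of EscapedPath.
func describe(rawURL string) (path, rawPath, escaped string, err error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", "", "", err
	}
	return u.Path, u.RawPath, u.EscapedPath(), nil
}

func main() {
	for _, s := range []string{
		"http://www.xyz.com/!test.jpg",  // only "!": RawPath is accepted as-is
		"http://www.xyz.com/!^test.jpg", // "^" makes RawPath invalid, so Path is re-escaped
	} {
		path, rawPath, escaped, err := describe(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Printf("Path=%q RawPath=%q EscapedPath=%q\n", path, rawPath, escaped)
	}
}
```

With the behavior reported in this issue, the first URL should yield an escaped path of `/!test.jpg` and the second `/%21%5Etest.jpg`, even though both have a RawPath that still contains the literal "!".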

davecheney commented 4 years ago

Looking at RFC 3986, ^ must always be percent-encoded, while ! may avoid percent-encoding depending on its position in the URL. Section 3.3 indicates that ! need not be escaped. However, any character in a path element may be percent-escaped; for example, https://google.com/a and https://google.com/%61 are identical.
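As a small illustration of that equivalence (a sketch only; `samePath` is a hypothetical helper, not something in net/url), both spellings decode to the same Path even though each keeps its own escaped form:

```go
package main

import (
	"fmt"
	"net/url"
)

// samePath is a hypothetical helper that reports whether two URLs have the
// same decoded path after parsing.
func samePath(a, b string) (bool, error) {
	ua, err := url.Parse(a)
	if err != nil {
		return false, err
	}
	ub, err := url.Parse(b)
	if err != nil {
		return false, err
	}
	return ua.Path == ub.Path, nil
}

func main() {
	same, err := samePath("https://google.com/a", "https://google.com/%61")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("same decoded path:", same)
}
```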

I agree it is surprising that adding a character that must be escaped to a path element causes other sub-delims to be percent-encoded as well, but by my reading of the RFC it is not wrong.

Can you explain how you found this issue and what problem it is causing for you?

mh47838704 commented 4 years ago

I found this because I built a web proxy in Go. When a URL passes through this proxy, the "!" is encoded as "%21". The destination server is also a proxy, and it has many rules for dispatching requests, one of which matches the character "!". Because the "!" was encoded as "%21" when the request passed through my proxy server, the destination server cannot match it correctly.

I also found that the encoding logic is inconsistent: when the path contains only "!", the "!" is left as-is, but when the path also contains any other character that needs encoding, the "!" is encoded too. The output is not the same in the two cases, which is not what I want.
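One possible way for a proxy to sidestep the inconsistency described above is to set RawPath itself, since EscapedPath returns RawPath whenever it is a valid encoding of Path. The following is only a sketch under that assumption: `escapeKeepingSubDelims` and `preservedPath` are hypothetical helpers, not part of net/url, and the set of bytes left unescaped is one reading of RFC 3986 section 3.3.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// keepInPath lists the non-alphanumeric bytes this sketch leaves unescaped:
// unreserved marks, sub-delims, ':', '@', '/', and '%' so that escapes
// already present in the input pass through untouched.
const keepInPath = "-._~!$&'()*+,;=:@/%"

// escapeKeepingSubDelims is a hypothetical helper that percent-encodes only
// the bytes of raw that are not in keepInPath and are not alphanumeric.
func escapeKeepingSubDelims(raw string) string {
	const hex = "0123456789ABCDEF"
	var b strings.Builder
	for i := 0; i < len(raw); i++ {
		c := raw[i]
		isAlnum := 'a' <= c && c <= 'z' || 'A' <= c && c <= 'Z' || '0' <= c && c <= '9'
		if isAlnum || strings.IndexByte(keepInPath, c) >= 0 {
			b.WriteByte(c)
			continue
		}
		b.WriteByte('%')
		b.WriteByte(hex[c>>4])
		b.WriteByte(hex[c&15])
	}
	return b.String()
}

// preservedPath is a hypothetical helper. It parses rawURL, rewrites RawPath
// so that only strictly invalid bytes are escaped, and returns EscapedPath.
// If the rewritten RawPath were not a valid encoding of Path, EscapedPath
// would fall back to the default escaping.
func preservedPath(rawURL string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	// RawPath is empty when the input already matched the default encoding.
	if u.RawPath != "" {
		u.RawPath = escapeKeepingSubDelims(u.RawPath)
	}
	return u.EscapedPath(), nil
}

func main() {
	for _, s := range []string{
		"http://www.xyz.com/!test.jpg",
		"http://www.xyz.com/!^test.jpg",
	} {
		p, err := preservedPath(s)
		if err != nil {
			fmt.Println("parse error:", err)
			continue
		}
		fmt.Println("encodedUrl:", p)
	}
}
```

Because URL.String builds on EscapedPath, a proxy that adjusts RawPath this way before forwarding would be expected to keep "!" literal in both cases. Whether that is acceptable depends on what the destination server considers equivalent.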

cagedmantis commented 4 years ago

/cc @rsc @bradfitz