golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

net/url: URL allows malformed query round trip #22907

Open artyom opened 6 years ago

artyom commented 6 years ago

What did you do?

package main

import (
    "fmt"
    "log"
    "net/url"
)

func main() {
    u, err := url.Parse("http://example.com/bad path/?bad query#bad fragment")
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(u.String())
}

https://play.golang.org/p/hdX1zpv3BN

What did you expect to see?

I expect either url.Parse to return a non-nil error, or the URL.String method to return a fully escaped URL representation (http://example.com/bad%20path/?bad%20query#bad%20fragment), with the query escaped the same way as the path and fragment.

What did you see instead?

http://example.com/bad%20path/?bad query#bad%20fragment

For reference, such a URL is rejected by net/http.Server: https://play.golang.org/p/2gujmbXZlu
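To make the asymmetry easier to see, here is a minimal sketch that prints each component in its escaped form separately. The helper name `components` is hypothetical, and `EscapedFragment` assumes Go 1.15 or later; the point is only that the path and fragment are re-encoded on output while `RawQuery` is passed through as stored.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// components is a hypothetical helper: it parses raw and returns the path,
// query, and fragment in the form they take when the URL is serialized.
func components(raw string) (path, query, fragment string, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", "", "", err
	}
	// EscapedPath and EscapedFragment re-encode their component when the
	// stored raw form is not a valid encoding. RawQuery has no such step:
	// it is kept exactly as it appeared in the input.
	return u.EscapedPath(), u.RawQuery, u.EscapedFragment(), nil
}

func main() {
	path, query, fragment, err := components("http://example.com/bad path/?bad query#bad fragment")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("path=%q query=%q fragment=%q\n", path, query, fragment)
}
```

Only the query keeps its literal space, which matches the output reported above.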

Does this issue reproduce with the latest release (go1.9.2)?

Yes

System details

go version devel +9a13f8e11c Tue Nov 28 06:47:50 2017 +0000 darwin/amd64
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/artyom/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/tmp/go:/Users/artyom/go"
GORACE=""
GOROOT="/Users/artyom/Repositories/go"
GOTMPDIR=""
GOTOOLDIR="/Users/artyom/Repositories/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/lb/3rk8rqs53czgb4v35w_342xc0000gn/T/go-build624293827=/tmp/go-build -gno-record-gcc-switches -fno-common"
GOROOT/bin/go version: go version devel +9a13f8e11c Tue Nov 28 06:47:50 2017 +0000 darwin/amd64
GOROOT/bin/go tool compile -V: compile version devel +9a13f8e11c Tue Nov 28 06:47:50 2017 +0000
uname -v: Darwin Kernel Version 17.2.0: Fri Sep 29 18:27:05 PDT 2017; root:xnu-4570.20.62~3/RELEASE_X86_64
ProductName:    Mac OS X
ProductVersion: 10.13.1
BuildVersion:   17B48
lldb --version: lldb-900.0.57
  Swift-4.0

https://tools.ietf.org/html/rfc3986#section-3.4 defines the query component as follows (see also Appendix A):

  query       = *( pchar / "/" / "?" )
  pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"
  unreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~"
  pct-encoded   = "%" HEXDIG HEXDIG
  sub-delims    = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

There's no whitespace character in this list. WHATWG agrees:

A URL-query string must be zero or more URL units.

[...]

The URL units are URL code points and percent-encoded bytes.

[...]

The URL code points are ASCII alphanumeric, U+0021 (!), U+0024 ($), U+0026 (&), U+0027 ('), U+0028 LEFT PARENTHESIS, U+0029 RIGHT PARENTHESIS, U+002A (*), U+002B (+), U+002C (,), U+002D (-), U+002E (.), U+002F (/), U+003A (:), U+003B (;), U+003D (=), U+003F (?), U+0040 (@), U+005F (_), U+007E (~), and code points in the range U+00A0 to U+10FFFD, inclusive, excluding surrogates and noncharacters.
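As an illustration of the RFC 3986 grammar quoted above, the following is a rough sketch of a byte-level validator for the query component. The function names (`validQuery`, `isUnreserved`, `isSubDelim`, `isHex`) are made up for this example and are not part of net/url; the sketch follows only the RFC rules, so it does not accept the non-ASCII code points that the WHATWG text allows.

```go
package main

import (
	"fmt"
	"strings"
)

// isUnreserved implements: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
func isUnreserved(c byte) bool {
	switch {
	case 'a' <= c && c <= 'z', 'A' <= c && c <= 'Z', '0' <= c && c <= '9':
		return true
	}
	return c == '-' || c == '.' || c == '_' || c == '~'
}

// isSubDelim implements the sub-delims rule.
func isSubDelim(c byte) bool {
	return strings.IndexByte("!$&'()*+,;=", c) >= 0
}

// isHex reports whether c is a HEXDIG (either case is accepted here).
func isHex(c byte) bool {
	return '0' <= c && c <= '9' || 'a' <= c && c <= 'f' || 'A' <= c && c <= 'F'
}

// validQuery reports whether s matches query = *( pchar / "/" / "?" ).
func validQuery(s string) bool {
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case c == '%':
			// pct-encoded = "%" HEXDIG HEXDIG
			if i+2 >= len(s) || !isHex(s[i+1]) || !isHex(s[i+2]) {
				return false
			}
			i += 2
		case isUnreserved(c), isSubDelim(c), c == ':', c == '@', c == '/', c == '?':
			// Allowed as-is.
		default:
			return false
		}
	}
	return true
}

func main() {
	for _, q := range []string{"bad query", "bad%20query", "a=1&b=/x?y", "%zz"} {
		fmt.Printf("%q valid=%v\n", q, validQuery(q))
	}
}
```

Under these rules "bad query" is rejected because of the literal space, while "bad%20query" is accepted.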

namusyaka commented 6 years ago

note: This seems to be related to this line, and its behavior looks reasonable.

namusyaka commented 6 years ago

I have investigated the issue and confirmed that this comment is incorrect, or at least that the current (*URL) String() implementation is wrong. In fact, query components aren't escaped at all, not only in this example. If the comment is correct, I think we should escape the query component appropriately. /cc @tombergan

fraenkel commented 6 years ago

@namusyaka I think you are mistaken. The problem is with all forms of Parse, which do no validation on the query or fragment pieces. The parse of the URI should have failed. Since it does not, the Fragment or RawQuery fields are incorrect, and hence String(), which is doing the right thing, appears to produce a bad URL.

All that has to happen is to apply validation of the query and fragment pieces in parse, and that should solve the issues. It might start breaking code that is sending around "bad" URLs.

fraenkel commented 6 years ago

It's not clear that this can be fixed given the behavior of shouldEscape(), which states that most of the sub-delims should be escaped.

namusyaka commented 6 years ago

@fraenkel Yes, I know that behavior would be easier to understand. However, if your suggestion were implemented, other cases such as http://example.com/foo bar, http://example.com/foo? bar and http://example.com/foo?bar# baz would also have to fail on parse, and I'm worried that this would break backward compatibility.
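The compatibility concern can be checked with a small sketch like the one below, which feeds the three URLs mentioned above to url.Parse and reports whether each is accepted. The helper `parses` is a name invented for this example. On the Go versions discussed in this thread, all three are expected to parse without error, so stricter validation would turn each of them into a new failure.

```go
package main

import (
	"fmt"
	"net/url"
)

// parses is a hypothetical helper that reports whether url.Parse
// accepts raw without returning an error.
func parses(raw string) bool {
	_, err := url.Parse(raw)
	return err == nil
}

func main() {
	inputs := []string{
		"http://example.com/foo bar",     // space in path
		"http://example.com/foo? bar",    // space in query
		"http://example.com/foo?bar# baz", // space in fragment
	}
	for _, raw := range inputs {
		fmt.Printf("%q accepted=%v\n", raw, parses(raw))
	}
}
```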

gopherbot commented 6 years ago

Change https://golang.org/cl/99135 mentions this issue: net/url: reject invalid query strings when parsing URLs.

bradfitz commented 6 years ago

Copying my comment from the CL:

I changed the commit message to accurately reflect what later versions of this CL did. It now says:

net/url: escape URL.RawQuery on Parse if it contains invalid characters

But while reviewing this, I'm worried it might change the behavior of:

https://golang.org/pkg/net/url/#URL.Query which says:

"It silently discards malformed value pairs. To check errors use ParseQuery."

Before it silently discarded things, and with this CL it would start returning escaped versions instead. Do we care?
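For context, the difference between the two documented behaviors can be sketched as follows. The helpers `lenient` and `strict` and the sample query `a=1&b=%zz` are assumptions chosen for illustration; they only wrap URL.Query and url.ParseQuery to show that the former drops a malformed pair while the latter reports it.

```go
package main

import (
	"fmt"
	"net/url"
)

// lenient returns what URL.Query yields for rawQuery; malformed
// pairs are silently dropped, as the documentation quoted above says.
func lenient(rawQuery string) url.Values {
	u := url.URL{RawQuery: rawQuery}
	return u.Query()
}

// strict parses rawQuery with url.ParseQuery, which returns the first
// decoding error alongside the pairs it could parse.
func strict(rawQuery string) (url.Values, error) {
	return url.ParseQuery(rawQuery)
}

func main() {
	const raw = "a=1&b=%zz" // "%zz" is not a valid percent-encoding

	fmt.Println("Query():     ", lenient(raw))

	values, err := strict(raw)
	fmt.Println("ParseQuery():", values, "err:", err)
}
```

Code that relies on Query() dropping such pairs is the kind of caller that would observe a change if RawQuery were rewritten during Parse.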


I think we probably should wait for Go 1.12 at this point. This needs a decision on what to do.

bradfitz commented 6 years ago

Reopening, as the fix is being reverted.

gopherbot commented 6 years ago

Change https://golang.org/cl/137716 mentions this issue: Revert "net/url: escape URL.RawQuery on Parse if it contains invalid characters"

agnivade commented 5 years ago

@bradfitz - Any thoughts on this? Or do we want to push to 1.13?

bradfitz commented 5 years ago

I'm kinda done with net/url.URL changes. They're always problematic compatibility-wise.

Definitely not for Go 1.12.

gopherbot commented 5 years ago

Change https://golang.org/cl/159157 mentions this issue: net/url, net/http: reject control characters in URLs

FiloSottile commented 5 years ago

@gopherbot please open backport issues.

This has security implications, and CL 159157 is safe enough to backport.

gopherbot commented 5 years ago

Backport issue(s) opened: #29922 (for 1.10), #29923 (for 1.11).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

gopherbot commented 5 years ago

Change https://golang.org/cl/159478 mentions this issue: [release-branch.go1.10] net/url, net/http: reject control characters in URLs

gopherbot commented 5 years ago

Change https://golang.org/cl/160178 mentions this issue: net/url, net/http: relax CTL-in-URL validation to only ASCII CTLs

gopherbot commented 5 years ago

Change https://golang.org/cl/160678 mentions this issue: [release-branch.go1.10] net/http, net/url: reject control characters in URLs

gopherbot commented 5 years ago

Change https://golang.org/cl/160798 mentions this issue: [release-branch.go1.11] net/http, net/url: reject control characters in URLs

ianlancetaylor commented 5 years ago

Is there anything else to do for this issue?

gopherbot commented 5 years ago

Change https://golang.org/cl/162960 mentions this issue: doc/go1.12: document net/url.Parse now rejecting ASCII CTLs

gopherbot commented 5 years ago

Change https://golang.org/cl/162826 mentions this issue: [release-branch.go1.12] doc/go1.12: document net/url.Parse now rejecting ASCII CTLs