golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.54k stars 17.6k forks source link

net/url: `unescape()` logic doesn't copy invalid bytes following % as expected by most recent spec #56732

Open dgryski opened 1 year ago

dgryski commented 1 year ago

(This is a reopened version of https://github.com/golang/go/issues/11249 because the whatwg spec has changed.)

What version of Go are you using (go version)?

Go 1.19

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

All.

What did you do?

Tried to parse a URL with an "invalid" percent escape. (In this case, %u0041).

    u, err := url.ParseRequestURI("http://localhost/%u0041")

Playground link: https://go.dev/play/p/SqA0_6oLuo3

What did you expect to see?

Printing the URL path as /%u0041

What did you see instead?

That there was an error parsing the URL.

I'm reopening this bug because the current version of the URL spec for "Percent Encoded Bytes" ( https://url.spec.whatwg.org/#percent-encoded-bytes )

Otherwise, if byte is 0x25 (%) and the next two bytes after byte in input are not in the ranges 0x30 (0) to 0x39 (9), 0x41 (A) to 0x46 (F), and 0x61 (a) to 0x66 (f), all inclusive, append byte to output.

The JavaScript and popular Rust parsers follow this model, which is making interoperability tricky.

JavaScript:

> new URL("http://localhost/%u0041").pathname
< '/%u0041'

Rust:

use url::Url;
use hyper::Uri;

fn main() {
    let uri = "https://localhost/%u0041".parse::<Uri>().unwrap();
    println!("path: {}", uri.path());

    let u = Url::parse("https://localhost/%u0041").unwrap();
    println!("path: {}", u.path());
}

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=f1134169d2851ebe07da4a45e03ff68e

gopherbot commented 1 year ago

Change https://go.dev/cl/450375 mentions this issue: net/url: accept invalid percent encodings

dgryski commented 1 year ago

Thanks Ian.

liggitt commented 1 year ago

Doesn't loosening of URI validation risk propagating invalid %-encodings to other systems that will choke on the invalid URIs (like, for example, something built with go1.19)? This seems opposite to the logic used to justify tightening of ParseIP logic in go1.17 (https://github.com/golang/go/issues/30999).

I would not have expected the stdlib method to relax validation unconditionally like this

liggitt commented 1 year ago

aside from the question about whether allowing these invalid sequences is going to cause problems with other systems, stdlib methods are widely used to validate things like IPs and URIs which are persisted and have to interoperate with multiple go versions. If this is going to update go1.20 to accept URIs as valid which go1.19 rejected as invalid, can this also include guidance for consumers to roll out this change?

  1. For go1.20 users that want to keep validating things strictly to disallow invalid %-encodings, how should they change their code?
  2. For go1.19 users updating to build with go1.20 who want to preserve existing runtime behavior with no code changes, how can they accomplish that? xref https://github.com/golang/go/discussions/55090 (cc @rsc)
aojea commented 1 year ago

if my "git archeology" is correct, the whatwg spec has changed onJan 14, 2014,

https://github.com/whatwg/url/commit/3cfaa1779bfb9a3ba2b907a5802ca0251ca9a7e6

that predates https://github.com/golang/go/issues/11249 , and makes arguable to change the behavior on the parsers at this point, of course, I'm biased due to the implications of this change in kubernetes/kubernetes :)

dgryski commented 1 year ago

I guess there are two issues conflated in the earlier bug. The first is "Please support decoding %u0000 as Unicode" (which I am not in favour of) and the second is "%u shouldn't appear in URLs" (which the current spec allows and provides handling logic for).

Given that 1) there are sites in the wild with %u in their URLs and 2) other languages support the spec and thus those URLs, I believe that net/url should too.

dims commented 1 year ago

Do we want to open another issue as this is closed?

liggitt commented 1 year ago

Do we want to open another issue as this is closed?

Yeah, I'll open one on Monday

liggitt commented 1 year ago

opened https://github.com/golang/go/issues/56884

gopherbot commented 1 year ago

Change https://go.dev/cl/452795 mentions this issue: Revert "net/url, net/http/httputil: accept invalid percent encodings"

ianlancetaylor commented 1 year ago

CL is being reverted.

odeke-em commented 10 months ago

Moving this to Go1.23 as it is quite late in the cycle.