golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.57k stars 17.61k forks source link

net/url: don't parse ';' as a separator in query string [freeze exception] #25192

Closed zhyale closed 3 years ago

zhyale commented 6 years ago

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

go1.10.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

CentOS 7 with kernel 3.10.0-514.10.2.el7.x86_64

What did you do?

I create a program to detect attack such as SQL Injection, when test case is: http://..../testcase?id=1%27;--

and I use: r.ParseForm() params := r.Form fmt.Println("params:", params, "count:", len(params)) for key, values := range params { fmt.Println("param", key, ":", values) }

Got: params: map[--:[] id:[1']] count: 2 param id : [1'] param -- : []

What did you expect to see?

expect only one expression in this case: key: id value: 1';--

What did you see instead?

I got two key:[value] pairs.

agnivade commented 6 years ago

Complete repro -

package main

import (
    "fmt"
    "net/http"
)

func main() {
    http.HandleFunc("/foo", func(w http.ResponseWriter, req *http.Request) {
        err := req.ParseForm()
        if err != nil {
            fmt.Println(err)
            w.Write([]byte("error"))
            return
        }
        params := req.Form
        fmt.Println("params:", params, "count:", len(params))
        for key, values := range params {
            fmt.Println("param", key, ":", values)
        }
        w.Write([]byte("OK"))
    })

    fmt.Println("starting on port 9999")
    server := &http.Server{
        Addr: ":9999",
    }
    server.ListenAndServe()
}

curl 'http://localhost:9999/foo?id=1%27;--'

@bradfitz - Is this expected ?

fraenkel commented 6 years ago

Both & and ; are used to split key value pairs. The ; is optional since it allows you to provide a URL as a value with a query string. See https://en.wikipedia.org/wiki/Query_string under Web Forms

odeke-em commented 6 years ago

Hello @zhyale, thank you for filing this issue and welcome to the Go project!

@agnivade and @fraenkel thank you for the responses too.

So I believe, the root cause of the question here is rather: Why is a semi colon being used as a separator in url.Parse?

If you run https://play.golang.org/p/QVz18jWspPF or inlined below:

package main

import (
    "log"
    "net/url"
)

func main() {
    log.SetFlags(0)
    u, err := url.Parse("http://localhost:9999/foo?id=1%27;--")
    if err != nil {
        log.Fatalf("Failed to parse URL: %v", err)
    }
    log.Printf("%#v\n", u.Query())
}

You'll see the real symptom

url.Values{"--":[]string{""}, "id":[]string{"1'"}}

A W3C recommendation https://www.w3.org/TR/html401/appendix/notes.html#h-B.2.2 recommended using ; as a separator for url-encoded params for form-data

screen shot 2018-05-01 at 5 08 58 am

and that was what we used to add ; as a separator, making ';' synoymous with '&' in https://golang.org/cl/4973062 https://github.com/golang/go/issues/2210#issue-51278892

However, unfortunately that recommendation just got superseded very recently in mid-December 2017 by https://www.w3.org/TR/2017/REC-html52-20171214/

and now that points to https://url.spec.whatwg.org/#urlsearchparams in which we can see that the only separator here is '&'

screen shot 2018-05-01 at 5 19 53 am

Node.js doesn't seem to recognize ';' as a separator

url.parse('http://localhost:9999/foo?id=1%27;--', true);
Url {
  protocol: 'http:',
  slashes: true,
  auth: null,
  host: 'localhost:9999',
  port: '9999',
  hostname: 'localhost',
  hash: null,
  search: '?id=1%27;--',
  query: { id: '1\';--' },
  pathname: '/foo',
  path: '/foo?id=1%27;--',
  href: 'http://localhost:9999/foo?id=1%27;--' }

and the old survey of what other languages do while technically still correct per https://github.com/golang/go/issues/2210#issuecomment-66058500, any future adoptions of the new recommendation from W3C will mean that those languages will also change their behavior.

Now the big question is: this behavior has been around since 2011, changing it 7 years later might massively break code for many users. Perhaps we need a survey of the make-up and interpretation of query strings from some frontend server?

Also this perhaps will need some security considerations as well or maybe a wide scale adoption of the recommendation first?

In addition to those already paged, I will page some more folks to help pitch in thoughts about the fate of the new W3C recommendation /cc @ianlancetaylor @andybons @tombergan @agl @rsc @mikesamuel

bradfitz commented 6 years ago

The Go 1 docs say:

Query is expected to be a list of key=value settings separated by ampersands or semicolons.

So we can't change it during Go 1.x.

Repurposing this to be a Go 2.x issue.

mikesamuel commented 6 years ago

IIUC, the problem is that

Array.from(new URL(`https://a/?a&b;c`).searchParams.keys())
// [ "a", "b;c" ]

but

values, _ := url.ParseQuery(`a&b;c`)
fmt.Println(reflect.ValueOf(values).MapKeys())  // [c a b]

Assuming that's right:

Now the big question is: this behavior has been around since 2011, changing it 7 years later might massively break code for many users. Perhaps we need a survey of the make-up and interpretation of query strings from some frontend server?

+1

We might be able to get a handle on the size of potential breakage by surveying URL composition libraries to get an idea of how many sources are likely to produce URLs that are substantially different given non-crafted inputs.

For example:

// html/template
t, _ := template.New("T").Parse(`<a href="/?a={{.}}&b=b">`)
t.Execute(&out, `foo;bar&baz`)
out.String() == `<a href="/?a=foo%3bbar%26baz&b=b">`

// net/url
url.QueryEscape(`foo;bar&baz`) == `foo%3Bbar%26baz`
url.QueryUnescape(`foo%3Bbar%26baz`) == `foo;bar&baz`

In vanilla JS

encodeURIComponent(`foo;bar&baz`) == `foo%3Bbar%26baz`
encodeURI(`foo;bar&baz`) == `foo;bar&baz`

I think this is mostly a correctness and interop issue but that the security consequences are not severe.

That said, there are risks to inferring more structure than the spec allows.

If a survey finds that there are widely used url.QueryEscape / encodeURIComponent counterparts that escape & but do not escape ;, and a trusted partner composes URLs thus:

goServiceUrl = "//example.com/?"
    + "securityIrrelevantParam=" + laxEscapeQuery(untrustedString)
    + "&importantParam=" + laxEscapeQuery(trustedString)

then untrustedString might inject a ;importantParam which would be available as the zero-th entry in Go's query map.

The risk is pretty small as long as laxEscapeQuery escapes '=' and one workaround is to consistently enforce has-exactly-one constraints.

bigluck commented 5 years ago

Hi everybody, I'm having the same problem by using AWS API Gateway in front of my application. When the proxed endpoint is configured with the HTTP Proxy integration configuration on, AWS API Gateway transform a valid request like this one:

GET /service?cid=002125b3-26bd-48c4-ac90-9683d804be2d&tm=1532974971748&ua=Mozilla%2F5.0%20(Macintosh%3B%20Intel%20Mac%20OS%20X%2010_13_6)%20AppleWebKit%2F537.36%20(KHTML%2C%20like%20Gecko)%20Chrome%2F67.0.3396.99%20Safari%2F537.36&z=b075dd02-77f8-44bb-a389-866dcc80c9b6 HTTP/1.1

into:

GET /service?cid=002125b3-26bd-48c4-ac90-9683d804be2d&tm=1532974971748&ua=Mozilla/5.0+(Macintosh;+Intel+Mac+OS+X+10_13_6)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/67.0.3396.99+Safari/537.36&z=b075dd02-77f8-44bb-a389-866dcc80c9b6 HTTP/1.1

And Golang parse the ua parameter into Mozilla/5.0 (Macintosh

Here a Go Playground to test it: https://play.golang.org/p/BeKy_UuSyoO

FiloSottile commented 3 years ago

It was recently pointed out that this behavior divergence can lead to cache poisoning attacks in reverse proxies that cache based on a subset of query parameters: if Go treats https://example.com/?x;id=1337&id=123 as a request for ID 1337, while the cache treats it as a request for ID 123, the cache can be made to serve the wrong response.

At the risk of repeating myself, I want to point out that relying on parser alignment for security is doomed, so this will break again, and often in subtler ways that we can't fix universally. However, this is probably broken most of the time now, so we should fix it.

I did a quick survey of other popular languages and frameworks.

A more interesting question would be how caching proxies behave, but a lot of them are ad-hoc and hard to test. Snyk seems to think at least Varnish uses only &. As pointed out in https://github.com/golang/go/issues/25192#issuecomment-432369818, AWS is not only ignoring ;, but it's unescaping it while proxying.

I think the W3C and WHATWG recommendations are sort of red herrings: they are about x-www-form-urlencoded, not URL queries. On the other hand, the entire reason anyone supported ; is that it was more convenient to use for x-www-form-urlencoded forms, as it did not require escaping & in HTML.

Given Rails and Python are waiting for the next scheduled release to fix this, we will too, and ship the fix in Go 1.17. It would be too harsh of a behavior change for too vague of a security benefit to land in a security release anyway.

This is technically a backwards compatibility promise violation, but I am invoking the security exception. More broadly, we should think whether the thing we want to provide compatibility with is "the exact self-contained behavior of the library when it was introduced" or "correctly interoperate with the current ecosystem". Realistically, we're already doing more of the latter with crypto/tls and crypto/x509, because doing the exact same thing as 5 years ago would be useless and insecure. HTTP and web specifications are similarly living and evolving, and we should figure out what our policy about that is.

rsc commented 3 years ago

It seems wrong that every single HTTP server implementation in the world is being charged with coping with a decision made by a few proxies. If the proxy doesn't think ; is special, then it could easily enforce that view by escaping the ; as it passes the request through. And if it did think ; was special, it could enforce that by rewriting to &. Then just a few proxies would need changing instead of all the HTTP server implementations in the world.

The fact that we're having this discussion and not the nginx developers is utterly backward. (And AWS unescaping the ; is even more backward.)

Anyway, that rant aside, if we do make this change, can we find some way to make it not a silent change? I worry about people updating to Go 1.17 and having their servers misbehave but with no idea why. Maybe a logging print for each request that ignores ; as separator, limited to one per second or something like that?

It also seems like a hard requirement to be able to opt back in to the old behavior. Users with client apps sending ; that don't have the misfortune of being behind nginx will not like being forced to rewrite and redeploy their apps just to get Go 1.17.

rsc commented 3 years ago

Adding to the minutes. It's getting a bit late to make a change in Go 1.17 so we should try to converge on a plan soon.

@bigluck, if you still use the AWS API Gateway, does it still unescape semicolons? Or does anyone else know?

rsc commented 3 years ago

@ianlancetaylor found private mail saying that https://angular.io/api/common/http/HttpUrlEncodingCodec did not escape semicolons at least in 2019, leading to confusion as well. Does anyone know if that is still the case?

And does anyone know of any clients that do use semicolons for separators and would break if we stopped recognizing them?

rsc commented 3 years ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

rsc commented 3 years ago

I am not convinced we should make a breaking change here.

1. There are many many implementations of this behavior.

HTML 4 introduced web forms. Appendix B, section B.2.2 reads:

The URI that is constructed when a form is submitted may be used as an anchor-style link (e.g., the href attribute for the A element). Unfortunately, the use of the "&" character to separate form fields interacts with its use in SGML attribute values to delimit character entity references. For example, to use the URI "http://host/?x=1&y=2" as a linking URI, it must be written <A href="http://host/?x=1&#38;y=2"> or <A href="http://host/?x=1&amp;y=2">.

We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.

Go, like Python, Ruby, and other languages, implemented this recommendation. I can't find anything saying that W3C or WHATWG has explicitly retracted this. (The W3C link in the Snyk post is about POST form parsing, not query parameters. And the Snyk post also later refers to “the RFC” but seems to mean that W3C link again, not an actual RFC. Snyk's being sloppy, and sloppy mixed with security is never good.)

2. Making this change will break Go users.

I remain concerned about breaking existing users. Python deciding to make the change does not seem like a compelling argument to me. Go takes a much stronger view of backwards compatibility than Python does (for example, Python 3).

There is no doubt that at least some hand-written HTML containing links that will break when semicolon is removed. We would need a good reason to break those users.

The only acceptable reason here is security. But this is the wrong way to improve security of Nginx configurations.

3. Making this change is not the best fix for the web or nginx ecosystem.

It is clear that there are many URL query parsers that implemented the HTML 4 recommendations. If caching proxies like Nginx want to work correctly, they should be guarding against this problem, not every possible server sitting behind them. The fix can be made in one place (the proxy) instead of a very large number of places.

URL query parameter parsing turns out to be ambiguous, because we don't know whether a particular implementation will split on semicolons. The caching proxy is the first thing that processes the URL. If it interprets the query parameters a particular way, then it should rewrite them to have that meaning unambiguously. So if a proxy receives GET /search?x=1;y=2, it should, since it knows its own interpretation might differ from the back end, rewrite that to either GET /search?x=1%3By=2 or GET /search?x=1&y=2 depending on how it interpreted the semicolon. That's an easy fix that guarantees everything downstream is unaffected. That's clearly the right fix.

The Snyk post says:

Be aware of the cache key: If your server splits query arguments using a semicolon, make sure your cache proxy does the same. Furthermore, make sure your cache key contains the necessary headers to prevent attackers from using unkeyed parameters to achieve web cache poisoning.

This is good advice. It boils down to "Make sure your server and your cache proxy agree about how to split query arguments." The simplest, most guaranteed way to do this is to have the proxy rewrite the query to be unambiguous.

4. Go servers behind nginx are not affected by default.

The nginx docs say the default proxy cache key is $scheme$proxy_host$request_uri. That does not interpret the query parameters at all, so it has no problem with semicolons versus ampersands.

The way to get into trouble is to use a key like $arg_x which extracts the x parameter from the query string and does so without considering semicolons (the code is ngx_http_arg called from ngx_http_variable_argument in the nginx code base; there is no configuration to allow semicolons). Go users, and Python users, and Ruby users, and many others, cannot sync the nginx behavior and their server's behavior while also using query parameter cache keys. But again, these keys are not used by default. Only users who use these keys need to make a change.

5. A potential non-breaking Go change.

To not break users, we could add a url.AllowQuerySemicolon function that controls whether ParseQuery accepts semicolons. It can default to true, but users who want to use nginx's query argument cache keys would call AllowQuerySemicolon(false) to bring the Go implementation in line with nginx's.

It is not clear to me that we have enough justification to take the next step of changing the default and breaking users, all for the very few users who are using nginx and setting a custom cache key using query argument parameters. Again, this does not affect all Go programs sitting behind nginx. It only affects those who set a custom proxy_cache_key and use a $arg_xxx form.

In the long term, it still seems like nginx should rewrite the query strings to escape semicolons in requests that are consulting the proxy cache and using query parameter argument keys, ensuring that nginx's understanding of the request URL matches the server's, 100% of the time, no matter what the server is.

FiloSottile commented 3 years ago

I think focusing on nginx is optimistic. nginx happens to be a popular reverse proxy that exhibits this behavior, but I doubt it's the only one. We should check Cloudflare, Fastly, Akamai, AWS, GCP, Azure, Varnish, F5, and Apache, off the top of my head. If they all need to change, then changing the server frameworks is not "changing the thing in a lot of places instead of in one place" (which I generally agree with as an argument).

At that point, I think it becomes a matter of what's more expected and predictable. The HTML 4 spec is hardly authoritative for URL query strings in HTTP, so we just don't have an easy reference here. As of today, I found the ; surprising. If I were to write or audit a regex-based cache key, I would not have flagged the lack of semicolon support, nor would I have requested that the regex be wired into the rewriting engine (which is usually not possible). Now that Python, PHP, and Rails are switching to the better-known behavior, the situation will get worse over time, not better. In five years, we'll be the odd one out, and it will be hard to convince all new reverse proxies to do query string mangling to support Go's bizarre URL parsing.

If you are not convinced, I think we should check with all those other reverse proxy vendors before declining this proposal.

rsc commented 3 years ago

It sounds like the argument is that "Go should break all the semicolon users because all the others are too". And maybe that's true. What we don't really know is how many semicolon users there are.

Again, this does not happen in reverse proxies out of the box, only when you start trying to distinguish by query parameters.

rsc commented 3 years ago

Twitter threads gathering info: https://twitter.com/bradfitz/status/1374774763059482624 https://twitter.com/FiloSottile/status/1374777816139628545

joneskoo commented 3 years ago

Many on the Twitter threads confusing this with semicolons in path section. That’s not what we’re talking about here, right?

not in scope: https://example.com/a;b=c;foo=42

In scope: https://example.com/a?b=c;foo=42

notably JSESSIONID mentioned a few times.

https://play.golang.org/p/OUVtI15HkU3

AGWA commented 3 years ago

I'm a semicolon user. I like the ergonomic benefits from not having to escape & in HTML.

The backwards compatibility break for this would be really nasty. Unlike removing TLS ciphers or support for common name, this won't result in a predictable error. Instead the program will behave differently, and a log message is cold comfort if the program has already taken some undesirable action based on misinterpreting a query string. The bar for such a compatibility break should be very high, and I don't think this proposal should qualify. Parsing URLs is a minefield and practically every implementation does something differently. Any proxy server which relies on parsing query strings in the same manner as their backends is insecure, and removing semicolon separators from Go doesn't change that.

To provide an example of how futile this effort is, nginx's query string parser (ngx_http_arg) differs from Go's in the following three additional ways:

nginx behavior Example Go value for id nginx value for id
Keys are case-insensitive iD=1&id=2 2 1
Keys are not unescaped %69d=1&id=2 1 2
Keys without equal signs are not supported id&id=1 (empty) 1

Consequentially, this proposal would not help nginx users at all, and I would not be surprised if other reverse proxies also deviated from Go in at least one of the above ways.

I realize that there aren't many semicolon users (@FiloSottile's Twitter poll has been bouncing between 1.5%-2%), but it strikes me as pretty inequitable to force us to deal with a change in application behavior whose only purpose is to try to paper over other people's risky behavior.

rsc commented 3 years ago

@joneskoo, you are correct: semicolons before a ? are unrelated to this issue.

rsc commented 3 years ago

@AGWA, indeed. A major problem with this change is that it will be a reason users can't easily update to a newer version of Go, which will increase our support burden as users lag behind. By itself, it may not seem like much to break a small fraction of the user base, but over time that kind of thing adds up.

Thanks very much for the additional nginx examples. I think these support the idea that any kind of proxy caching that tries to strip "unnecessary" parts of the URL really needs to canonicalize the URL to enforce its interpretation before passing the query along. That is, there may be a real problem here, but the fix is being applied in the wrong place. Even if we "fixed" Go to match nginx in all these cases, it would still not match other caching proxies.

Again, this does not happen in reverse proxies out of the box, only when you start trying to filter "unimportant" parts of the query string. It's starting to sound like you should basically never do that, and reverse proxies should remove that feature entirely.

FiloSottile commented 3 years ago

Until @RReverser's tweet I had somehow completely missed that the URL WHATWG spec actually does explicitly say to use the application/x-www-form-urlencoded parser for URL query parsing. That parser in the current revision of the spec only uses &. The latest HTML WHATWG spec also refers to the URL WHATWG parser.

https://url.spec.whatwg.org/#ref-for-concept-urlencoded-string-parser https://url.spec.whatwg.org/#urlencoded-parsing https://html.spec.whatwg.org/#resolving-urls

Choosing to keep supporting ; would not just be picking the less popular option in an under-specified setting, but would be going off-spec, while all other implementations are converging on the current specified behavior.

@AGWA's analysis is definitely insightful, and I really don't disagree that security-critical parser alignment is a losing battle. (I literally have a [[Parser alignment]] page in my notes to collect examples.) However, the URL WHATWG spec is very precise, and it's clear that nginx is wrong in all three of those cases: keys are case-sensitive and percent-decoded, and values are empty when missing.

I realize that there aren't many semicolon users (@FiloSottile's Twitter poll has been bouncing between 1.5%-2%)

1.5% (even accounting for people confused about semicolons in paths) is not small. However, I am worried about the 75% that didn't know about ; at all. All those developers will be surprised by our off-spec behavior, increasingly so as time passes and the old specs are forgotten, all other frameworks change behavior, and ; becomes a Go exclusive quirk. Surprising those developers, whether or not it has security implications, is also pretty inequitable to support the 1.5%.

It's starting to sound like you should basically never do that, and reverse proxies should remove that feature entirely.

This is not practical, alas. Not for any reason I like, to be clear: it's mostly tracking parameters. But they are very much a thing and "don't get to your business goals because parser alignment is impossible (and anyway we don't like your practices)" is not a winning argument.

Taking a step back, it sounds like there are two ways out of this as an ecosystem:

  1. we convince all reverse proxies to change the semantics of cache keys so that they rewrite the upstream URL to a canonicalized representation, on the argument that it's safer and more robust, and that they should mitigate our exclusive off-spec behavior
  2. we converge on the specified behavior, filing issues with reverse proxies that depart from the spec on the argument that they are off-spec and introducing a vulnerability

I am not a fan of (2) at all, but (1) doesn't sound like it will happen, and this format is unusually simple and well-specified, so it might just be possible to match the spec.

rsc commented 3 years ago

I still can't escape the feeling this is all security theater, especially given what @AGWA pointed out. The suggestion is to break existing Go users (we don't know how many, but greater than zero). And the benefit is ... unclear beyond speculation about possibly being one step in a much longer path involving many players toward converging on shared behavior.

I just don't yet see how the benefits outweigh the costs.

Also the linked WHATWG spec is about JavaScript behavior specifically. (The rest of the spec is not, but the parsing of the query is only relevant to constructing a JavaScript object.)

I'm sorry for being a pain about this, but breaking users is a big deal and we need to be sure it is worth it. And that evidence is absent so far. Starting from the very beginning with Snyk's sloppy blog post, there have been too many fuzzy claims that look different when you get up close and really examine them.

slrz commented 3 years ago

1.5% (even accounting for people confused about semicolons in paths) is not small. However, I am worried about the 75% that didn't know about ; at all. All those developers will be surprised by our off-spec behavior, increasingly so as time passes and the old specs are forgotten, all other frameworks change behavior, and ; becomes a Go exclusive quirk. Surprising those developers, whether or not it has security implications, is also pretty inequitable to support the 1.5%.

If other implementations go ahead with this change it will probably cause existing users to move away from their semicolon usage. Let enough time pass and it might be rare enough for Go to follow without annoying too many users.

Now, leaning back and watching while others take flak for breaking users may not be the nicest thing to do if the change in question is absolutely necessary for the ecosystem's continued health. In this case though, I think it's appropriate given that Python, Rails and friends don't have a strong commitment to compatibility in the first place (unlike Go) and the proposed change doesn't seem all that urgent to me.

rsc commented 3 years ago

Let enough time pass and it might be rare enough for Go to follow without annoying too many users.

It will probably annoy the same number as it would today, just perhaps a smaller percentage.

beoran commented 3 years ago

The simplest backwards compatible option is to add a url.AllowQuerySemicolon(bool) function, as rsc suggested. Not to +1, but what is there /against/ that suggestion? Whether it should default to true or false is debatable, but either way, letting the Go users have a knob to keep backwards compatibility seems desirable.

Mouvedia commented 3 years ago
  1. option default to true
  2. option default to false
  3. deprecation notice
  4. removal
rsc commented 3 years ago

Filippo and I spent an hour discussing this. The problem today is a “superset” attack, where Go sees a superset of the values that the proxy does for:

url?x=y;id=evil&id=good

In this case, the proxy thinks the id is good but Go thinks the id is evil, resulting in the evil entry being cached as good.

Go's current parser also leads to a “subset” situation, where Go sees a subset of the values that the proxy does. In particular, Go rejects (drops) keys and values % followed by non-two-hex-digits, but the most common code paths don't check the error, so in

url?id=evil%&id=good

Go sees good while other implementations may see both evil% and good. We are unaware of any attack on this “subset” situation.

Because of that, we propose to do the following.

  1. Make url.ParseQuery split on & but then reject key=value pairs containing semicolons.
  2. Make package http log about ParseQuery errors mentioning the semicolon problem.

The effect will be to make the semicolon-preceded or -terminated pairs disappear, turning the superset attack into a subset situation. Because we are unaware of any subset-based attack, this is an improvement.

The log message should make clear what is going on for users who are confused about no longer seeing certain key-value pairs in their apps.

rsc commented 3 years ago

Since there seem to be no objections to the proposal in the previous comment, I am going to mark this likely accept. The final acceptance won't happen until after the freeze, but we still would like to get this in for Go 1.17 for security reasons.

rsc commented 3 years ago

Based on the discussion above, this proposal seems like a likely accept. — rsc for the proposal review group

antichris commented 3 years ago

Because of that, we propose to do the following.

  1. Make url.ParseQuery split on & but then reject key=value pairs containing semicolons.
  2. Make package http log about ParseQuery errors mentioning the semicolon problem.

The effect will be to make the semicolon-preceded or -terminated pairs disappear, turning the superset attack into a subset situation. Because we are unaware of any subset-based attack, this is an improvement.

; is the more reasonable choice and (unrealistic as it sounds, especially when everyone's been using &amp;s out of pure force of habit for decades now and some are even actually moving to drop ; support entirely) should actually be the preferred separator. So, if the idea is to drop the support for splitting on semicolons unconditionally, I'm against it.

It's not like Go can flip the webdev world now, dominated by javascript kiddies and wordpress "engineers", but we can at least keep the saner choice an opt-in for those who can appreciate (and properly set their infrastructure up around) it. In other words, what https://github.com/golang/go/issues/25192#issuecomment-820166005 said, just really do default to false.

Making this a knob right now would not preclude the option to reduce the maintenance burden by dropping the support entirely sometime in the future when it is obvious that no one knows or cares about using semicolons as their URI parameter separators.

FiloSottile commented 3 years ago

It's not like Go can flip the webdev world now, dominated by javascript kiddies and wordpress "engineers"

@antichris, this remark is not consistent with the Gopher values. The Javascript world built many things, and enabled even more people to build things. We want Go to be a tool that works with that ecosystem, and to be welcoming of people that come from that community.

The only technical argument in favor of using semicolons is that it's easier to insert them in manually composed HTML. If you have more please do contribute them. Ecosystem consistency is also a technical argument, and that's what we are weighting it against here.

Knobs are extra complexity, but if you have a proposal for where to insert that knob we'll listen and weight it against the benefit. url.AllowQuerySemicolon(bool) is not an option, because package-global state would lead to conflicts between libraries that expect and set the value to be different.

AGWA commented 3 years ago

I don't think there needs to be a knob for this, as it's easy to write a middleware handler that replaces ; with & in the query string, e.g.:

func ReplaceSemicolons(h http.Handler) http.Handler {
        return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
                if r.URL.RawQuery != "" {
                        r2 := new(http.Request)
                        *r2 = *r
                        r2.URL = new(url.URL)
                        *r2.URL = *r.URL
                        r2.URL.RawQuery = strings.ReplaceAll(r.URL.RawQuery, ";", "&")
                        h.ServeHTTP(w, r2)
                } else {
                        h.ServeHTTP(w, r)
                }
        })
}

Something like this ought to be put in the release notes or even the standard library. It's not easy to migrate an application away from semicolon separators, as it's hard to grep for the problem and there might be links elsewhere on the Web that can't be broken. If it's not easy for semicolon users to continue using semicolons, I suspect many will be prevented from upgrading to Go 1.17.

antichris commented 3 years ago
[Opinionated rant] I agree that Go should be fit for use in as many ecosystems as possible. And you're right about that part of the sentence you quoted dripping in my resentment towards those who work tirelessly on reducing the signal-to-noise ratio. There is nothing more I want for code monkeys than to be finally able to graduate to engineers. I'm really not sure about the validity of that ecosystem consistency argument though: if you were residing in a region where the rest of the population consistently painted their faces with human excrement, would you adopt this practice just for the sake of consistency? It's an extreme example, but reflects how I feel about the ecosystem consistency debate in this context where there is no benefit, only drawbacks, to the approach that everyone sticks to, but they do it just because everyone else is doing it, in a runaway feedback loop.

Semicolons as separators reduce HTML payload generation and parsing complexity and resource use in terms of processing time, memory and bandwidth consumption in humans and machines alike. But I suppose you could say that's a microoptimization that no one should care about.

Most importantly, you're right — it somehow entirely got lost on me that what we're discussing calls for changes in url, which is a static package, and I can't think of a way to fix that without a massive redesign of huge chunks of the standard library.

andrius4669 commented 3 years ago

somehow people are forgetting that it's often not necessary at all to escape & in case of url parameter usage, because the usual &x=y has very little chance to match &<something>; sequence for character references, unless parameter key contains ; AND character reference is actually meaningful AND it's not escaped for some reason (url.QueryEscape does escape ;).

edit: apparently there are some legacy character references which don't need to end with ; (reference), so I was too optimistic about this. it's quite a minefield.

edit2: except they only apply when I use them in usual text, but when I use them in <a href=" on firefox they get parsed as parameters??? this seems actually "fine" but quite complicated indeed.

rsc commented 3 years ago

@AGWA's handler converter is nice for opt-in use. I filed #45973 for that.

rsc commented 3 years ago

No change in consensus, so accepted. 🎉 This issue now tracks the work of implementing the proposal. — rsc for the proposal review group

FiloSottile commented 3 years ago

I'd like to request a freeze exception for this fix. We deemed it not a backportable security fix (also because of the potential for disruption and the limited real-world impact), but it's conceivable that it could have security value for some applications, so we'd like not to wait for Go 1.18.

Russ, should we also land AllowQuerySemicolons (#45973) during the freeze? I'm leaning towards yes.

/cc @golang/release @rsc

gopherbot commented 3 years ago

Change https://golang.org/cl/325697 mentions this issue: net/url: reject query values with semicolons

dmitshur commented 3 years ago

Thank you for making this freeze exception request. It is approved.

yktoo commented 2 years ago

With the change https://golang.org/cl/325697 in place our servers now produce lots of log lines like

2021/12/13 08:44:51 http: URL query contains semicolon, which is no longer a supported separator; parts of the query may be stripped when parsed; see golang.org/issue/25192
2021/12/13 08:44:51 http: URL query contains semicolon, which is no longer a supported separator; parts of the query may be stripped when parsed; see golang.org/issue/25192
2021/12/13 08:57:37 http: URL query contains semicolon, which is no longer a supported separator; parts of the query may be stripped when parsed; see golang.org/issue/25192
2021/12/13 09:11:37 http: URL query contains semicolon, which is no longer a supported separator; parts of the query may be stripped when parsed; see golang.org/issue/25192
2021/12/13 09:19:30 http: URL query contains semicolon, which is no longer a supported separator; parts of the query may be stripped when parsed; see golang.org/issue/25192

This is really not helpful, and since it's a web server, anyone can flood you with requests with a ;. Can this be switched off somehow?

antichris commented 2 years ago

@yktoo Have you looked into using AllowQuerySemicolons or writing your own middleware to replace all semicolons with URL escapes? Alternatively, you could join the convo at #49399 and/or #50034 or give either a thumbs-up if it is relevant to your case.

"AllowQuerySemicolons - http package - net/http - pkg.go.dev"

seankhliao commented 2 years ago

http.Server takes a *log.Logger, you can filter it out there if you want.

yktoo commented 2 years ago

Thanks @antichris, #49399/#50034 describe my concerns exactly. @seankhliao yes log filtering seems the only workaround at the moment, albeit very ugly.

shinny-chengzhi commented 2 years ago

I have been caught this by surprise and want to restore the old behavior. But it's unclear to me how to do this while supporting both 1.17&pre-1.17 since calling AllowQuerySemicolons will result in build error in 1.16.

ianlancetaylor commented 2 years ago

@shinny-chengzhi You can use a build tag to compile code differently for go1.16 and go1.17.

https://pkg.go.dev/cmd/go#hdr-Build_constraints

shinny-chengzhi commented 2 years ago

@ianlancetaylor Thanks, that will do it.