golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.11k stars 17.68k forks source link

net/url: empty query value is dropped on Encode() #20820

Closed carl-mastrangelo closed 7 years ago

carl-mastrangelo commented 7 years ago

Please answer these questions before submitting your issue. Thanks!

What version of Go are you using (go version)?

1.9

https://play.golang.org/p/FWtvu_9XEM

The Values.Encode function adds an unnecessary = to the end of kv pairs, even if the original value was empty. When using flag style url query values, this is kind of annoying because it is supposed to be either present or absent.

It would be nice to interpret a url.Values with nil value slice as a flag, and skip the = for encoding.

gopherbot commented 7 years ago

CL https://golang.org/cl/46834 mentions this issue.

carl-mastrangelo commented 7 years ago

To clarify, there are two related issues going on: ParseQuery and Encode don't round-trip the string. I don't think this is technically possible (we don't know if = should be kept), but it is possible to pick a better lossy translation.

Trying to work around this behavior by explicitly setting the value to nil:

v := make(url.Values)
v["prev"] = nil

doesn't produce any output.

bradfitz commented 7 years ago

We probably can't change the v["prev"] = nil behavior to start generating output. That would probably break enough people. A lot of code & tests have accumulated over 5 years depending on every quirk of our implementations.

It would be nice to interpret a url.Values with nil value slice as a flag, and skip the = for encoding.

Using nil or even a sentinel []string value would be unable to represent ?foo&foo&foo: https://play.golang.org/p/nO_OlSCiCE

You'd almost need a sentinel string value, but strings in Go are compared by value, which means we'd need to make up some magic value like:

package url

// EmptyValue is blah blah no equals sign.
const EmptyValue = "\x00\x00"

But then that precludes any user from actually using that value.

We did something like this for https://golang.org/pkg/net/http/#TrailerPrefix but TrailerPrefix worked because a valid value can't contain a colon, so we have a safe value to use.

With URL parameter values, all values are unacceptable.

carl-mastrangelo commented 7 years ago

Perhaps an empty sentinel array? It would be hacky no doubt. Something like:

var sentinel [0]string

func EmptyValue() []string {
  return sentinel[:]
}

Then, in Values.Encode, it could check if the address of slice[0] matches the address of sentinel. It wouldn't be able to catch the multiple 'foo&foo&foo' case, but since this PR treats it as a flag, just setting it once is the important case. Thoughts?

bradfitz commented 7 years ago

Again, that doesn't help with https://play.golang.org/p/nO_OlSCiCE

carl-mastrangelo commented 7 years ago

Depending on what level of Hacks are passible, I could imagine creating special strings with a prefix, and then slicing them to put in url.Values, When encoding, reflect and unsafe could be used to look for the sentinel prefix to know if its a special empty:

https://play.golang.org/p/hbj9XZEDEl

bradfitz commented 7 years ago

Sorry, we will not stoop to that level.

carl-mastrangelo commented 7 years ago

I don't think that's stooping. If there is a valid technical reason for disallowing it that would be an acceptable answer. The core of the problem is url.URL is publicly defined to be a map. Without defining a new type, or special parameters, the only remaining solution is to modify behavior. That is what the original CL does.

That said, supported foo&foo&foo is not a primary goal of the CL. Only that specific by more common problem needs a solution.

bradfitz commented 7 years ago

I don't think that's stooping.

You may not, but I do, and so would most of the Go team. We avoid unsafe except in very low-level places. The net/url package is not low-level and we don't want to make it be low-level by using unsafe.

That said, supported foo&foo&foo is not a primary goal of the CL. Only that specific by more common problem needs a solution.

It seems weird to support x=y&foo but not foo&foo. I'd prefer to be consistent, even if that meant the current foo=&foo= behavior.

carl-mastrangelo commented 7 years ago

Ok, expanding the scope of solutions. Please rank these in the most acceptable order:

  1. Declare that this issue is not a real problem, ignore it (The "ostrich" algorithm)
  2. Fork net/url, put it in x/net/url, which causes package proliferation
  3. Overload Encode to accept a func(p string) bool, that decides if it should be empty or not.
  4. Overload Encode to accept a func(ps []string) []bool, to allow deciding particular empties (like foo&foo=&foo
  5. Add a BetterValues, which is a [][2]*string that can accurately describe a query.
  6. Modify Encode to interpret nil or empty slices as an indicator that it should not be present
  7. Modify URL to have a ForceEmptyParam which changes how it encodes. There is precedent for adding such booleans to URL.
  8. Other?

3, 4, and 5 are API creep, 1 is not a real solution, because one of the other will take its place.

bradfitz commented 7 years ago

Commentary:

  1. This hasn't been a problem for 5 years. I'm a little surprised it suddenly is. Which applications require ?foo without an = sign?
  2. We've stopped expanding x/*. See #17244.
  3. We can't overload Encode. That's an API change.
  4. We can't overload Encode. That's an API change.
  5. Gross.
  6. Ideal (not changing API), if we figure out which way causes the least breakage in the community.
  7. Maybe. But force which? All? And that only helps (*URL).String, not https://golang.org/pkg/net/url/#Values.Encode, which is probably okay.
  8. Put it on github.com/carl-mastrangelo/carlurl
carl-mastrangelo commented 7 years ago
  1. Eliding the = makes for shorter urls. This is a good thing. I'm actually kind of surprised this isn't possible today or even tested for, given how many other good API decisions were made in net/.

You can see why I picked option 6 CL, because it was the best.

  1. is not feasible. By making my own, I force it to be on my API surface, just like Values is on URL, which is itself in http and elsewhere.

Looking at the Go 1 Compatibility promise, this appears to be undefined behavior that I am proposing to change. If some very small number of tests break, can we proceed with the original change?

bradfitz commented 7 years ago

We'll discuss at a @golang/proposal-review meeting.

Maybe @dsnet could test the change inside Google (a pretty large codebase) and run all those tests and see how much breaks.

dsnet commented 7 years ago

Assigning to myself to go test this.

rsc commented 7 years ago

Can you elaborate why this is an important problem to solve? Like Brad said, it's been five years and has not been a problem. Your original report says only "kind of annoying". We tend not to make major possibly-breaking changes with lots of downside if the only upside is to eliminate "kind of annoying".

carl-mastrangelo commented 7 years ago

Two answers to your question: First, it's a good thing, and second it's okay to make this change.

Why is it a good thing?

  1. Foremost is symmetry; parsing and encoding roundtrip doesn't preserve the original string. https://play.golang.org/p/SDU98LLL9v I realize that due to url.Value being publicly defined as a map, there is no way to express this in general (the a&a&a case), but this brings symmetry much closer, and for a likely use case.
  2. Shorter URLs are good. They are easier to read, easier to share, take up less space on the wire.
  3. There isn't a good way to work around it. url.Values doesn't provide me with a way of expressing what I want. It gets 95% of the way there. If you wanted to remove the superfluous '=' characters, how would you do it? A regex? String indexing a byte slicing? At that point you might as well have not used url.Values at all. The overhead of a minor tweak forces you to throw the baby out with the bathwater.

The reason I even raised this was because of issue 3; I couldn't easily work around this. Reimplementing URL encoding is a bad use of my time, error prone, a code smell, breaks compatibility with the standard library (like with http.PostForm), and even goes against core values of Go:

https://golang.org/doc/faq#x_in_std says:

provide key functionality that many Go programs require, such as formatted I/O and networking. It also contains elements important for web programming, including cryptography and support for standards like HTTP, JSON, and XML.

This is absolutely something important for web programming; it affects the visual representation of how my site links together.

Why is this an okay thing?

Why is this a safe change to make?

  1. I would like to point out there prior to my PR, there were no tests for this case. Perhaps the idea never came up, or the original authors hadn't anticipated the use case. My change would have passed without any tests being added at all.
  2. The behavior is not specified in documentation at all. As in nowhere. Not in Go, not in the URL RFC. This does not violate the Go compatibility promise.
  3. The reason no one has complained? Perhaps lack of usage. I only could a handful of uses in Go itself. Every usage I did see used this as k=v. I consider this an unanticipated use case.

We tend not to make major possibly-breaking changes with lots of downside if the only upside is to eliminate "kind of annoying".

So far that's FUD. We don't know who will break, if anybody. That is why I specifically asked:

If some very small number of tests break, can we proceed with the original change?

Which I believe @dsnet said he would try inside of Google after 1.9 is out the door. I would do this myself, but I am unfamiliar with the Go import process internally. Changing the files and testing globally affected very few tests, implying that the source files aren't actually used. That's neither here nor there.

rsc commented 7 years ago

I asked why the problem is important to solve. The only part of your response that could be viewed as addressing that question is the "shorter URLs are good" line. That sounds like a "nice to have" level importance. Let's work with that.

Round-tripping is not a justification. In general there are multiple possible URLs mapping to any particular representation. For example x=1&y=2 and y=2&x=1 map to the same representation, but no one complains that Encode cannot return the latter. Arguing that a particular form should round-trip is really just arguing that "this is the preferred printing form out of all the equivalent ones."

In this case, "x=" and "x" both map to the same representation, url.Values{"x": {""}}, and Encode prefers to print "x=". If the problem is that shorter URLs should be preferred, it sounds like you are arguing that Encode should instead choose "x"; that is, when value is empty in key=value, the = should be omitted as well. But I don't even see that on your list of choices. Why not?

The solution (6) introduces an unexpected special case that doesn't follow from the general one. It's established that url.Values{"x": {"1", "2"}} means x=1&x=2 and url.Values{"x": {"1"}} means x=1. In general, the URL has len(q["x"]) x parameters. The obvious extension is that url.Values{"x": {}} means no x parameters, not one without an equal sign. Even if no existing tests break, it's a special-case that will likely trip people up for years to come. That cost only makes sense to solve a very important problem. I don't see that level of importance here.

carl-mastrangelo commented 7 years ago

In this case, "x=" and "x" both map to the same representation, url.Values{"x": {""}}, and Encode prefers to print "x=". If the problem is that shorter URLs should be preferred, it sounds like you are arguing that Encode should instead choose "x"; that is, when value is empty in key=value, the = should be omitted as well. But I don't even see that on your list of choices. Why not?

It isn't as clear from this issue, because the CL was sent for review before this issue: https://go-review.googlesource.com/c/46834/1/src%252Fnet%252Furl%252Furl_test.go

As mentioned in the comment, I wanted to make the minimal possible change to enable the behavior.

The solution (6) introduces an unexpected special case that doesn't follow from the general one. It's only unexpected if constructing the url.Value from the map functions rather than the methods. Using Get, Set, Parse, Add, etc. can be made to keep existing behavior.

In the CL I sent out, I use len(vs) to check if the value is empty, but it doesn't have to. It could check vs == nil to mean that the = could be removed. This preserves the most original behavior, because ParseQuery always puts in a zero length slice ( https://play.golang.org/p/QyvlwYsVlK ). url.Values produced by the library would appear unchanged, while allowing my own code to set v[key] = nil, to override the production of = chars.

I do agree it is a special case, but since there isn't any wiggle room left in the API, it seems the most reasonable. That said, I disagree it is unexpected. It would be impossible(?) to encounter using solely the methods in net/url. The only surprising thing would be if someone today had code that looked like:

v, _ := url.ParseValues("q=1")
v["q"] = nil
v.Encode()

and expected the result to be empty.

Even if no existing tests break, it's a special-case that will likely trip people up for years to come.

If it doesn't trip up the entire corpus of Go code written in the past 5 years, then I find this statement to fall flat.

Finally:

Round-tripping is not a justification. In general there are multiple possible URLs mapping to any particular representation.

This isn't a black and white issue. It isn't like all round tripping is possible, or none of it is. What I am asking for in this issue is to add a class of url param strings that previously were not round trippable, which subsequently can be. If I wanted 100% fidelity of query strings, I would have sent Option 5 out for review; I didn't.

carl-mastrangelo commented 7 years ago

It doesn't appear the Go team feels this is an issue. I'm closing this.

nuarhu commented 6 years ago

Why does this simple issue result in a huge discussion with no result at all?

We link to a 3rd party URL that works only without the = behind the param key. Obviously, it is very annoying of that server to do that but that's just our loss.

Works: http://ad.zanox.com/ppc/?44455365C78013321

Doesn't work: http://ad.zanox.com/ppc/?44455365C78013321=

We add more parameters to the URL, so it's not as simple as to just slice off the last character. Back to string search+replace then. It's not important.

ianlancetaylor commented 6 years ago

@nuarhu It's not really useful to continue discussion on an issue closed so long ago. I recommend that you bring this to a forum; see https://golang.org/wiki/Questions. Thanks.