golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.09k stars 17.68k forks source link

net/url: JoinPath generates bad path if initial url path is empty string #58605

Open perj opened 1 year ago

perj commented 1 year ago

What version of Go are you using (go version)?

$ go version
go version go1.20.1 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOOS="linux"

What did you do?

https://go.dev/play/p/wXMvYKC1FSY

What did you expect to see?

200 OK

What did you see instead?

400 Bad Request

The request sent to the server starts with GET api/endpoint. It should be GET /api/endpoint. The former generates a 400 error in url.ParseRequestURI.

ianlancetaylor commented 1 year ago

CC @carlmjohnson

The issue here is that before the call to req.URL.JoinPath the req.Path field the empty string. After the call the req.Path field is api/endpoint. Should the empty string imply the root?

earthboundkid commented 1 year ago

Yes, I think so. It's a sort of unfortunate fact about url.URL.Path that "" and "/" are different but in most cases should be treated as equivalent.

earthboundkid commented 1 year ago

Hmm, a quick and dirty attempt at fixing it causes these test errors:

   url_test.go:2199: JoinPath("a", []) = "/a", <nil>, want "a", nil
    url_test.go:2211: Parse("a").JoinPath([]) = "/a", <nil>, want "a", nil
    url_test.go:2199: JoinPath("a", ["b"]) = "/a/b", <nil>, want "a/b", nil
    url_test.go:2211: Parse("a").JoinPath(["b"]) = "/a/b", <nil>, want "a/b", nil
    url_test.go:2199: JoinPath("a", ["../b"]) = "/b", <nil>, want "b", nil
    url_test.go:2211: Parse("a").JoinPath(["../b"]) = "/b", <nil>, want "b", nil
    url_test.go:2199: JoinPath("a", ["../../b"]) = "/b", <nil>, want "b", nil
    url_test.go:2211: Parse("a").JoinPath(["../../b"]) = "/b", <nil>, want "b", nil
    url_test.go:2199: JoinPath("", ["a"]) = "/a", <nil>, want "a", nil
    url_test.go:2211: Parse("").JoinPath(["a"]) = "/a", <nil>, want "a", nil
    url_test.go:2199: JoinPath("", ["../a"]) = "/a", <nil>, want "a", nil
    url_test.go:2211: Parse("").JoinPath(["../a"]) = "/a", <nil>, want "a", nil

I'm not sure what I think the output should be here.

perj commented 1 year ago

I'm sort of thinking empty path might only be the same as / if the hostname is not empty. If it's empty then it's a relative url, right?

I'd probably just special case this...

if elem[0] == "" && u.Host != "" {
    elem[0] = "/"
}
earthboundkid commented 1 year ago
diff --git a/src/net/url/url.go b/src/net/url/url.go
index d530a50d40..bdc85ef6fa 100644
--- a/src/net/url/url.go
+++ b/src/net/url/url.go
@@ -1209,6 +1209,9 @@ func (u *URL) JoinPath(elem ...string) *URL {
    if strings.HasSuffix(elem[len(elem)-1], "/") && !strings.HasSuffix(p, "/") {
        p += "/"
    }
+   if p != "" && p[0] != '/' && u.Host != "" {
+       p = "/" + p
+   }
    url := *u
    url.setPath(p)
    return &url
diff --git a/src/net/url/url_test.go b/src/net/url/url_test.go
index 577cf631c8..df7f9dea9c 100644
--- a/src/net/url/url_test.go
+++ b/src/net/url/url_test.go
@@ -2202,6 +2202,9 @@ func TestJoinPath(t *testing.T) {
        u, err := Parse(tt.base)
        if err == nil {
            u = u.JoinPath(tt.elem...)
+           if u.Path != "" && u.Path[0] != '/' && u.Host != "" {
+               t.Errorf("Parse(%q).JoinPath(%q).Path = %q", tt.base, tt.elem, u.Path)
+           }
            out = u.String()
        }
        if out != tt.out || (err == nil) != (tt.out != "") {

This changes it to add a slash if the hostname is not empty, and it passes tests, but like I said, I'm not sure what the correct behavior is here.

thanm commented 1 year ago

@neild @rsc per owners

neild commented 1 year ago

The doc for Path states:

Path  string // path (relative paths may omit leading slash)

This implies absolute paths must not omit the leading slash, but URL.RequestURI contains a special case to return "/" when Path is empty. So perhaps the rule is that absolute paths must not omit the leading slash, but an empty path is equivalent to "/".

I think the most consistent behavior here would be for JoinPath to consider an empty Path to be equivalent to "/", same as RequestURI.

earthboundkid commented 1 year ago

How about this then:

diff --git a/src/net/url/url.go b/src/net/url/url.go
index d530a50d40..c7415897e2 100644
--- a/src/net/url/url.go
+++ b/src/net/url/url.go
@@ -1200,7 +1200,10 @@ func (u *URL) JoinPath(elem ...string) *URL {
        // Return a relative path if u is relative,
        // but ensure that it contains no ../ elements.
        elem[0] = "/" + elem[0]
-       p = path.Join(elem...)[1:]
+       p = path.Join(elem...)
+       if elem[0] != "/" || u.Host == "" {
+           p = p[1:]
+       }
    } else {
        p = path.Join(elem...)
    }
gopherbot commented 1 year ago

Change https://go.dev/cl/469935 mentions this issue: net/url: consider an empty base Path as equivalent to / in JoinPath

neild commented 1 year ago

That's still handling the case where Host is unset differently. I think consistency with RequestURI would be to just convert Path to "/" when it is empty.

https://go.dev/cl/469935 takes that approach. What do you think?

perj commented 1 year ago

Not sure if I'm supposed to be pushing this in any kind of manner. If so, consider yourselves pushed. :)

robryk commented 1 year ago

Note that currently the value of Path in result of url.Parse depends on whether the original URL ended in / or not: https://go.dev/play/p/H6H902lJTam.

I'd argue that this might be considered to also be a bug in url.Parse: for two URLs that (I believe) should be equivalent it generates meaningfully different URL structs.

haton14 commented 1 year ago

https://go.dev/play/p/l7jfSgP24id I also receive a 400 response in func (c *Client) Do(req *Request)due to the malformed struct generated by JoinPath.

jpcope commented 1 year ago

Glad I caught this in a unit test before updating my client lib from http.NewRequest. Easy enough to workaround though - set Path to "/" on the base url.URL type if empty before calling JoinPath - if you know the right-hand-side arg is non-empty.

rsc commented 8 months ago

This change broke a variety of tests inside Google, indicating that it would probably break a variety of tests and potentially real-world uses outside Google as well. I rolled back the change. If we roll it forward again we should probably put it behind a GODEBUG, like urljoinpathslash=0 to go back to the old behavior.

neild commented 8 months ago

I think this is a case where if we would want to add a GODEBUG, that's sufficient evidence that we just shouldn't make the change.

earthboundkid commented 8 months ago

What were the broken tests at Google testing?

neild commented 8 months ago

Two cases of non-test code, and one case of test code doing variations on:

u, err := url.JoinPath("https://", "hostname", "some", "path", "components")

I think this is clearly a misuse of JoinPath; internally, it's producing a URL with no Host and a path of "hostname/some/path/components", which happens to stringify as if "hostname" was a host. However, it "works" today and this change would have broken real code, not just tests.


One case of non-test code doing this and relying on the result:

u, err := url.JoinPath("", "foo")

I don't understand why you'd do this, but it's not wrong.


One case of non-test code doing this and a test relying on the result when part1 is not absolute:

u, err := url.Parse("https://hostname")
if err != nil {
  return nil, err
}
u = u.JoinPath("part1", "part2")

I think that this might be a case where the production code will always use an absolute part1, but I'm not sure.