[Closed] flimzy closed this issue 5 years ago
Yeah, I don't think that should be visible. I thought it was an internal error that caused the Transport to pick a new conn.
What do your Transport config and your usage of the Transport look like?
Thanks for the quick response. I'll work on putting together a minimal reproduction case, and follow up shortly.
I am able to reproduce with this minimal Go code, communicating with a CouchDB 1.6.1 server running in a Docker environment (I can provide minimal setup instructions if necessary). I've not yet been able to reproduce the same behavior with an httptest server, which would of course make it much easier to debug. Since it's not possible to completely reproduce this with just the Go code, I am including as much relevant information as I can think of below. Please let me know what additional info I can provide.
Note that the failure is semi-random. It's probably triggered by a race condition (whether the race is happening in the Go library, or on the server, I don't yet know).
With 2 or more simultaneous HTTP requests, the script inevitably panics with one of two errors, after successfully handling an arbitrary number of requests.
panic: Put http://localhost:9000/foo/7.txt: http: server closed idle connection
or
panic: Put http://localhost:9000/foo/4.txt: EOF
I'm not sure how related these are, except that I expect they're both related to the server dropping the connection unexpectedly (perhaps the difference is whether the drop occurs mid-request, or between requests).
As I increase concurrency, the frequency of 'http: server closed idle connection' panics increases relative to the EOF panics.
I have also done some network captures (sudo tcpdump -i lo -s 0 -w capture) to capture the network traffic of both classes of failure.
This run is captured here:
$ go run main.go
PUT http://localhost:9000/foo/5.txt
PUT http://localhost:9000/foo/2.txt
PUT http://localhost:9000/foo/1.txt
PUT http://localhost:9000/foo/3.txt
PUT http://localhost:9000/foo/4.txt
PUT http://localhost:9000/foo/10.txt
PUT http://localhost:9000/foo/15.txt
PUT http://localhost:9000/foo/6.txt
PUT http://localhost:9000/foo/7.txt
panic: Put http://localhost:9000/foo/10.txt: EOF
goroutine 70 [running]:
main.doRequest(0xc420010ec0, 0xc40000000a)
/home/jonhall/go/src/github.com/flimzy/issue19943/main.go:33 +0x32f
created by main.main
/home/jonhall/go/src/github.com/flimzy/issue19943/main.go:21 +0x17b
exit status 2
And this run here:
$ go run main.go
PUT http://localhost:9000/foo/5.txt
PUT http://localhost:9000/foo/2.txt
PUT http://localhost:9000/foo/3.txt
PUT http://localhost:9000/foo/1.txt
PUT http://localhost:9000/foo/4.txt
PUT http://localhost:9000/foo/10.txt
panic: Put http://localhost:9000/foo/10.txt: http: server closed idle connection
goroutine 58 [running]:
main.doRequest(0xc420010ec0, 0xc40000000a)
/home/jonhall/go/src/github.com/flimzy/issue19943/main.go:33 +0x32f
created by main.main
/home/jonhall/go/src/github.com/flimzy/issue19943/main.go:21 +0x17b
exit status 2
A couple of random observations:
As indicated in a comment in the test code, if I don't consume the response body, the error doesn't occur. I don't know if this is related to actually consuming the body, or just timing.
If I don't send a request body, the error also doesn't occur. Again: I'm unsure how this relates.
The counter is only for convenience when matching the test requests with the HTTP packets in the dumps. The specific URL used, and the HTTP response status code received seem to be immaterial to the failure.
@flimzy I am having a similar issue however in a slightly different situation.
I am running a python uwsgi server in Docker and using DELETE requests instead of PUT, with no concurrent connections.
I seem to only be able to replicate it while running in Docker with a golang client. Using curl, Postman, Ruby, Python, etc. to make the same request does not result in any EOF or TCP errors.
Doing a few packet traces and following the TCP stream for the DELETE request shows that as soon as the request is made, the TCP connection is ACK'd, then FIN, ACK'd and closed, without the server having time to respond.
I can only seem to replicate the issue when running the web server inside Docker. So it seems that whatever delay or modification Docker makes to the TCP packets, the golang http client cannot handle it correctly.
Is there any update on this issue? @bradfitz
Sorry, I'm just getting back to this after some leave & backlog.
Thanks for the pcaps.
I can reproduce this with the following Go server:
package main

import (
	"io"
	"log"
	"math/rand"
	"net/http"
)

func main() {
	log.Fatal(http.ListenAndServe(":9000", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Occasionally say that we're keeping the connection open but lie and close it:
		if rand.Intn(50) == 0 {
			conn, _, err := w.(http.Hijacker).Hijack()
			if err != nil {
				return
			}
			io.WriteString(conn, "HTTP/1.1 404 Object Not Found\r\nServer: CouchDB/1.6.1 (Erlang OTP/17)\r\nDate: Wed, 12 Apr 2017 18:39:49 GMT\r\nContent-Type: text/plain; charset=utf-8\r\nContent-Length: 44\r\nCache-Control: must-revalidate\r\n\r\n{\"error\":\"not_found\",\"reason\":\"no_db_file\"}\n")
			conn.Close()
			return
		}
		w.WriteHeader(404)
		io.WriteString(w, "{\"error\":\"not_found\",\"reason\":\"no_db_file\"}\n")
	})))
}
And then I get what you were seeing with your main.go above:
....
PUT http://localhost:9000/foo/134.txt
PUT http://localhost:9000/foo/140.txt
PUT http://localhost:9000/foo/137.txt
PUT http://localhost:9000/foo/138.txt
PUT http://localhost:9000/foo/139.txt
PUT http://localhost:9000/foo/136.txt
panic: Put http://localhost:9000/foo/139.txt: http: server closed idle connection
goroutine 523 [running]:
main.doRequest(0xc4200184a0, 0xc40000008b)
/home/bradfitz/issue19943-client.go:33 +0x2f4
created by main.main
/home/bradfitz/issue19943-client.go:20 +0x13f
exit status 2
I'll investigate.
So here's what's happening:
The HTTP Transport would normally retry those requests, but because they're PUT requests, which Go's Transport does not treat as safe to retry automatically, Go conservatively assumes it might be unsafe to replay them. In fact, Go would've retried them had the server's EOF happened earlier, before any bytes were written; but Go manages to write a few bytes to the server first, so Go has to assume the server acted on them.
We currently have no way to mark an *http.Request as safe to retry when it's not a GET/HEAD/OPTIONS/TRACE. I suppose we can repurpose that bug for this.
The other root cause of this bug is that CouchDB is lying and saying it'll keep the connection open but then closing it immediately, which is making Go pick a connection which is just going to hang up. Again, that's okay for GET requests (Go will retry), but not for POSTs. It's also okay when there's not a lot of traffic happening quickly, because Go will see the hangup and remove that connection from the idle pool.
Perhaps Go could also keep stats per-hostname of which servers are lying about connection keep-alives.
In any case, the error message returned here is valid and intentional. I'll send a change to update the comments on that internal error message, and we'll keep this bug open to do more in future Go releases.
Change https://golang.org/cl/86375 mentions this issue: net/http: document internal error errServerClosedIdle more
There seems to be growing support for idempotency-key headers to indicate that a POST should be idempotent. Stripe, GoCardless, Worldpay and PayPal support idempotency-key, and Open Banking supports x-idempotency-key. As far as I can tell there is no current effort to specify this via the IETF, but there does seem to be convergence among financial APIs.
Go could look for this header and infer that the request is intended to be idempotent.
An rclone user has been reporting this on POST-ing to the dropbox API.
I've put a work-around in for the moment.
Since errServerClosedIdle is no longer private could it be exposed? It would mean I could kill this, most hideous piece of code in the world...
// retriableErrorStrings is a list of phrases which, when we find one
// in an error, tell us it is a networking error which should be
// retried.
//
// This is incredibly ugly - if only errors.Cause worked for all
// errors and all errors were exported from the stdlib.
var retriableErrorStrings = []string{
	"use of closed network connection", // internal/poll/fd.go
	"unexpected EOF reading trailer",   // net/http/transfer.go
	"transport connection broken",      // net/http/transport.go
	"http: ContentLength=",             // net/http/transfer.go
	"server closed idle connection",    // net/http/transport.go
}
@iand, nice! Thank you for all those links. That's the first I've heard of that and it's great to see. I've always wanted to see a header like that so I'm pleased that it both exists and is pretty widely used.
I say we make Go respect both idempotency-key and x-idempotency-key. If Go sees either, it can mark the request as retry-able.
Change https://golang.org/cl/147457 mentions this issue: net/http: make Transport respect {X-,}Idempotency-Key header
Change https://golang.org/cl/191779 mentions this issue: net/http: document additional uses of Request.GetBody()
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (go version)?

go version go1.8 linux/amd64
What operating system and processor architecture are you using (go env)?

GOARCH="amd64"
GOBIN=""
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOOS="linux"
GOPATH="/home/jonhall/go"
GORACE=""
GOROOT="/usr/local/go"
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GCCGO="gccgo"
CC="gcc"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build059328702=/tmp/go-build -gno-record-gcc-switches"
CXX="g++"
CGO_ENABLED="1"
PKG_CONFIG="pkg-config"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
What did you do?

Tests in one of my applications are frequently receiving the error http: server closed idle connection. Code comments make me think this error should never be visible. (I fully suspect a buggy server, which is dropping connections, but this issue isn't about that behavior, per se.)
This issue is simply intended to clarify the intended behavior of the net/http package with respect to this error message. Is the fact that I see this error message a bug which should be reported? (If so, I'll produce an MRE and file a followup issue.) Or is the source comment (line 605 here) just misleading?