golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.42k stars 17.71k forks source link

net/http: Transport hangs when io.Pipe is used as request body #29246

Closed tehmoon closed 1 year ago

tehmoon commented 5 years ago

What version of Go are you using (go version)?

$ go version
go version go1.11.2 darwin/amd64

Does this issue reproduce with the latest release?

I just tried with 1.11.3 latest brew and still happening

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/moon/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/moon/go"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/Cellar/go/1.11.2/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.11.2/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/pm/vc42c0nn7cl522r_gsg0njlc0000gn/T/go-build143101879=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

Compiled and ran the following program: https://play.golang.org/p/zWXnswPCqmm This is an HTTP client, which can be tested against a server:

nc -nlvp 8081

What did you expect to see?

If I run the program then wait 1-2 seconds, and start typing lines in my terminal, I'm expecting those same line to be sent to the server in http transport format.

What did you see instead?

What I saw was that if I do wait couple seconds before typing, the http client hangs and nothing is sent to the server. I did some tracing with tcpdump to confirm that no packet were sent to the server. On the other end, if I do start typing before the web client sends the http header -- meaning that you usually have the compiling time with go run to fill out stdin with some crap -- the web client does not hang and sends everything correctly.

My own workaround is to make sure data is written to the *io.Writer part before invoking http.Client.Do(req) but I have no idea why this is happening. Perhaps it's a bug, or perhaps it's a feature.

Digging through the stack trace, I stumbled upon a comment:

// probeRequestBody reads a byte from t.Body to see whether it's empty
// (returns io.EOF right away).
//
// But because we've had problems with this blocking users in the past
// (issue 17480) when the body is a pipe (perhaps waiting on the response
// headers before the pipe is fed data), we need to be careful and bound how
// long we wait for it. This delay will only affect users if all the following
// are true:
//   * the request body blocks
//   * the content length is not set (or set to -1)
//   * the method doesn't usually have a body (GET, HEAD, DELETE, ...)
//   * there is no transfer-encoding=chunked already set.
// In other words, this delay will not normally affect anybody, and there
// are workarounds if it does.
func (t *transferWriter) probeRequestBody() {

Which brought me to the issue and @bradfitz talking about io.Pipe. However I did test with req.ContentLength = 1 and it did not do the work.

It seems to hit this part of the code:

    timer := time.NewTimer(200 * time.Millisecond)
    select {
    case rres := <-t.ByteReadCh:
        timer.Stop()
        if rres.n == 0 && rres.err == io.EOF {
            // It was empty.
            t.Body = nil
            t.ContentLength = 0
        } else if rres.n == 1 {
            if rres.err != nil {
                t.Body = io.MultiReader(&byteReader{b: rres.b}, errorReader{rres.err})
            } else {
                t.Body = io.MultiReader(&byteReader{b: rres.b}, t.Body)
            }
        } else if rres.err != nil {
            t.Body = errorReader{rres.err}
        }
    case <-timer.C:
        // Too slow. Don't wait. Read it later, and keep
        // assuming that this is ContentLength == -1
        // (unknown), which means we'll send a
        // "Transfer-Encoding: chunked" header.
        t.Body = io.MultiReader(finishAsyncByteRead{t}, t.Body)
        // Request that Request.Write flush the headers to the
        // network before writing the body, since our body may not
        // become readable until it's seen the response headers.
        t.FlushHeaders = true
    }

The case <- timer.C does not make sense to me, I don't know the library really deep and I am trying to understand, but my sense is that it feels like a bug.

Perhaps it is not, but I would be really grateful if somebody explains why this is happening and more importantly, if it is the expected behavior or not.

PC=0x7fff5fe401b2 m=0 sigcode=0

goroutine 0 [idle]:
runtime.pthread_cond_wait(0x145a1a0, 0x145a160, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/sys_darwin.go:302 +0x51
runtime.semasleep(0xffffffffffffffff, 0x10493d1)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/os_darwin.go:63 +0x85
runtime.notesleep(0x1459f60)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/lock_sema.go:167 +0xe3
runtime.stopm()
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/proc.go:2016 +0xe3
runtime.findrunnable(0xc000024500, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/proc.go:2487 +0x4dc
runtime.schedule()
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/proc.go:2613 +0x13a
runtime.park_m(0xc000001500)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/proc.go:2676 +0xae
runtime.mcall(0x10556ab)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/asm_amd64.s:299 +0x5b

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0xc0000164b4)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/sema.go:56 +0x39
sync.(*WaitGroup).Wait(0xc0000164b4)
        /usr/local/Cellar/go/1.11.2/libexec/src/sync/waitgroup.go:130 +0x64
main.main()
        /tmp/blah/client/main.go:46 +0x237

goroutine 4 [select]:
io.(*pipe).Write(0xc0000a6140, 0xc000100001, 0x4, 0xfff, 0x1, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/io/pipe.go:87 +0x1cc
io.(*PipeWriter).Write(0xc00000e038, 0xc000100000, 0x5, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/io/pipe.go:153 +0x4c
main.main.func1(0xc00000e038, 0xc0000164b4)
        /tmp/blah/client/main.go:30 +0xb5
created by main.main
        /tmp/blah/client/main.go:22 +0x1f7

goroutine 5 [select]:
net/http.(*persistConn).roundTrip(0xc000110000, 0xc000094e40, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transport.go:2101 +0x56a
net/http.(*Transport).roundTrip(0x1454620, 0xc0000fe000, 0x203000, 0xc000055d48, 0x11d511a)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transport.go:465 +0x9b1
net/http.(*Transport).RoundTrip(0x1454620, 0xc0000fe000, 0x1454620, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/roundtrip.go:17 +0x35
net/http.send(0xc0000fe000, 0x12bef80, 0x1454620, 0x0, 0x0, 0x0, 0xc00000e058, 0x0, 0xc000055dd8, 0x1)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/client.go:250 +0x14b
net/http.(*Client).send(0xc000094cf0, 0xc0000fe000, 0x0, 0x0, 0x0, 0xc00000e058, 0x0, 0x1, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/client.go:174 +0xfa
net/http.(*Client).do(0xc000094cf0, 0xc0000fe000, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/client.go:641 +0x2a8
net/http.(*Client).Do(0xc000094cf0, 0xc0000fe000, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/client.go:509 +0x35
main.main.func2(0xc0000fe000, 0xc0000164b4)
        /tmp/blah/client/main.go:39 +0x45
created by main.main
        /tmp/blah/client/main.go:37 +0x226

goroutine 37 [IO wait]:
internal/poll.runtime_pollWait(0x2400ea0, 0x72, 0xc000055a88)
        /usr/local/Cellar/go/1.11.2/libexec/src/runtime/netpoll.go:173 +0x66
internal/poll.(*pollDesc).wait(0xc00013a018, 0x72, 0xffffffffffffff00, 0x12bfac0, 0x14243e8)
        /usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:85 +0x9a
internal/poll.(*pollDesc).waitRead(0xc00013a018, 0xc000154000, 0x1000, 0x1000)
        /usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_poll_runtime.go:90 +0x3d
internal/poll.(*FD).Read(0xc00013a000, 0xc000154000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/internal/poll/fd_unix.go:169 +0x1d6
net.(*netFD).Read(0xc00013a000, 0xc000154000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc000114018, 0xc000154000, 0x1000, 0x1000, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/net.go:177 +0x68
net/http.(*persistConn).Read(0xc000110000, 0xc000154000, 0x1000, 0x1000, 0x1006493, 0xc000131470, 0x102a352)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transport.go:1497 +0x75
bufio.(*Reader).fill(0xc000152000)
        /usr/local/Cellar/go/1.11.2/libexec/src/bufio/bufio.go:100 +0x106
bufio.(*Reader).Peek(0xc000152000, 0x1, 0xc0000ae2a0, 0xc000131558, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/bufio/bufio.go:132 +0x3f
net/http.(*persistConn).readLoop(0xc000110000)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transport.go:1645 +0x1a2
created by net/http.(*Transport).dialConn
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transport.go:1338 +0x941

goroutine 38 [chan receive]:
net/http.finishAsyncByteRead.Read(0xc000162000, 0xc000170000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transfer.go:1030 +0x60
io.(*multiReader).Read(0xc000164020, 0xc000170000, 0x8000, 0x8000, 0x3, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/io/multi.go:26 +0xaf
net/http.transferBodyReader.Read(0xc000162000, 0xc000170000, 0x8000, 0x8000, 0x1, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transfer.go:62 +0x56
io.copyBuffer(0x2444040, 0xc000166020, 0x12bf9e0, 0xc000162000, 0xc000170000, 0x8000, 0x8000, 0x123aae0, 0x1456a00, 0x2444040)
        /usr/local/Cellar/go/1.11.2/libexec/src/io/io.go:402 +0x125
io.Copy(0x2444040, 0xc000166020, 0x12bf9e0, 0xc000162000, 0xc000166020, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/io/io.go:364 +0x5a
net/http.(*transferWriter).writeBody(0xc000162000, 0x12bed40, 0xc000134040, 0x2, 0x2)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transfer.go:355 +0x155
net/http.(*Request).write(0xc0000fe000, 0x12bed40, 0xc000134040, 0x0, 0xc000138030, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/request.go:645 +0x6e8
net/http.(*persistConn).writeLoop(0xc000110000)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transport.go:1888 +0x1b8
created by net/http.(*Transport).dialConn
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transport.go:1339 +0x966

rax    0x104
rbx    0x2
rcx    0x7ffeefbff5e8
rdx    0x800
rdi    0x145a1a0
rsi    0x80100000900
rbp    0x7ffeefbff670
rsp    0x7ffeefbff5e8
r8     0x0
r9     0xa0
r10    0x0
r11    0x202
r12    0x145a1a0
r13    0x16
r14    0x80100000900
r15    0xd6435c0
rip    0x7fff5fe401b2
rflags 0x203
cs     0x7
fs     0x0
gs     0x0
exit status 2

Much appreciate any response and sorry for the time wasted if that had already be solved. I did spend some amount of time doing prior research and PoC before posting this.

odeke-em commented 5 years ago

Thank you for reporting this issue @tehmoon and for the analysis and welcome to the Go project!

Following your findings

To follow along with your findings here is a repro that if given a SIGQUIT after say 3 seconds produces a stacktrace very similar to yours. However, even if I wait until the headers have been sent by the client

package main

import (
    "context"
    "io"
    "io/ioutil"
    "log"
    "net/http"
    "net/http/httptest"
    "net/http/httptrace"
    "sync"
    "time"
)

func main() {
    log.SetFlags(0)

    cst := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        blob, _ := ioutil.ReadAll(io.LimitReader(r.Body, 100))
        log.Printf("Server: %s", blob)
        w.Write([]byte("Hello, world!"))
        w.(http.Flusher).Flush()
        r.Body.Close()
    }))
    defer cst.Close()

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    prc, pwc := io.Pipe()

    var triggerSendHeadersOnce sync.Once
    sendHeaders := make(chan bool)

    go func() {
        defer pwc.Close()

        <-sendHeaders

        for i := 0; i < 1000; i++ {
            select {
            case <-time.After(450 * time.Millisecond):
                pwc.Write([]byte("Hello from client"))
            case <-ctx.Done():
                return
            }
        }
    }()

    trace := &httptrace.ClientTrace{
        WroteHeaders: func() {
            triggerSendHeadersOnce.Do(func() {
                log.Println("Closing sendHeaders")
                close(sendHeaders)
            })
        },
        WroteHeaderField: func(key string, value []string) {
            log.Printf("Header %q: %#v\n", key, value)
        },
        WroteRequest: func(ri httptrace.WroteRequestInfo) {
            log.Printf("WroteRequestInfo: %#v", ri)
        },
    }

    req, _ := http.NewRequest("POST", cst.URL, prc)
    req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
    res, err := http.DefaultClient.Do(req)
    if err != nil {
        log.Fatalf("Failed to make request: %v", err)
    }
    blob, _ := ioutil.ReadAll(res.Body)
    _ = res.Body.Close()
    log.Printf("Client: %s", blob)
}

once you see it having sent the headers and some bit of the body

Header "Host": []string{"127.0.0.1:64605"}
Header "User-Agent": []string{"Go-http-client/1.1"}
Header "Transfer-Encoding": []string{"chunked"}
Header "Accept-Encoding": []string{"gzip"}
Closing sendHeaders
Server: Hello from clientHello from clientHello from clientHello from clientHello from clientHello from clie

stacktrace

```shell $ go run main.go Header "Host": []string{"127.0.0.1:64605"} Header "User-Agent": []string{"Go-http-client/1.1"} Header "Transfer-Encoding": []string{"chunked"} Header "Accept-Encoding": []string{"gzip"} Closing sendHeaders Server: Hello from clientHello from clientHello from clientHello from clientHello from clientHello from clie ^\SIGQUIT: quit PC=0x7fff61f6c86a m=0 sigcode=0 goroutine 0 [idle]: runtime.pthread_cond_wait(0x1668328, 0x16682e8, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/sys_darwin.go:378 +0x39 runtime.semasleep(0xffffffffffffffff, 0xcd64fb7e) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/os_darwin.go:63 +0x85 runtime.notesleep(0x16680e8) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/lock_sema.go:173 +0xe0 runtime.stopm() /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:1928 +0xc0 runtime.findrunnable(0xc000032500, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:2391 +0x53f runtime.schedule() /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:2524 +0x2be runtime.park_m(0xc000096600) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/proc.go:2610 +0x9d runtime.mcall(0x10590d6) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/asm_amd64.s:318 +0x5b goroutine 1 [select]: net/http.(*persistConn).roundTrip(0xc000124360, 0xc00010a1b0, 0x0, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/transport.go:2399 +0x770 net/http.(*Transport).roundTrip(0x1661fa0, 0xc000110100, 0x8, 0xc00005e9f0, 0x100e2e8) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/transport.go:535 +0x8f8 net/http.(*Transport).RoundTrip(0x1661fa0, 0xc000110100, 0x1661fa0, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/roundtrip.go:17 +0x35 net/http.send(0xc000110100, 0x141e4c0, 0x1661fa0, 0x0, 0x0, 0x0, 0xc000010038, 0x203000, 0x1, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/client.go:250 +0x43a net/http.(*Client).send(0x1667280, 0xc000110100, 0x0, 0x0, 0x0, 0xc000010038, 0x0, 0x1, 0x139bba0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/client.go:174 +0xfa net/http.(*Client).do(0x1667280, 0xc000110100, 0x0, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/client.go:641 +0x3ce net/http.(*Client).Do(...) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/client.go:509 main.main() /Users/emmanuelodeke/Desktop/openSrc/bugs/golang/29246/main.go:67 +0x4b1 goroutine 3 [IO wait]: internal/poll.runtime_pollWait(0x1e11ea8, 0x72, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/netpoll.go:184 +0x55 internal/poll.(*pollDesc).wait(0xc000108018, 0x72, 0x0, 0x0, 0x13a740f) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(*pollDesc).waitRead(...) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Accept(0xc000108000, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/internal/poll/fd_unix.go:384 +0x1f8 net.(*netFD).accept(0xc000108000, 0xc0000ab748, 0x33d6dacf1300, 0x8) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/fd_unix.go:238 +0x42 net.(*TCPListener).accept(0xc00000e040, 0xc0000c2104, 0xc0000c20c0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/tcpsock_posix.go:139 +0x32 net.(*TCPListener).Accept(0xc00000e040, 0xc00005fe08, 0x18, 0xc000001200, 0x12b83ce) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/tcpsock.go:261 +0x47 net/http.(*Server).Serve(0xc00010c000, 0x1423f20, 0xc00000e040, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/server.go:2896 +0x286 net/http/httptest.(*Server).goServe.func1(0xc0000c20c0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/httptest/server.go:298 +0x7b created by net/http/httptest.(*Server).goServe /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/httptest/server.go:296 +0x5c goroutine 4 [select]: main.main.func2(0xc000010028, 0xc000022660, 0x1424b20, 0xc00002e040) /Users/emmanuelodeke/Desktop/openSrc/bugs/golang/29246/main.go:41 +0x1aa created by main.main /Users/emmanuelodeke/Desktop/openSrc/bugs/golang/29246/main.go:35 +0x2bc goroutine 8 [IO wait]: internal/poll.runtime_pollWait(0x1e11dd8, 0x72, 0xffffffffffffffff) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/netpoll.go:184 +0x55 internal/poll.(*pollDesc).wait(0xc000108398, 0x72, 0x1000, 0x1000, 0xffffffffffffffff) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(*pollDesc).waitRead(...) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Read(0xc000108380, 0xc000152000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/internal/poll/fd_unix.go:169 +0x22b net.(*netFD).Read(0xc000108380, 0xc000152000, 0x1000, 0x1000, 0x1761b88, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/fd_unix.go:202 +0x4f net.(*conn).Read(0xc000010050, 0xc000152000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/net.go:177 +0x68 net/http.(*persistConn).Read(0xc000124360, 0xc000152000, 0x1000, 0x1000, 0x100692d, 0x60, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/transport.go:1716 +0x75 bufio.(*Reader).fill(0xc0000c2540) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/bufio/bufio.go:100 +0x103 bufio.(*Reader).Peek(0xc0000c2540, 0x1, 0xc00015a000, 0x0, 0x203000, 0x203000, 0x203000) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/bufio/bufio.go:138 +0x4f net/http.(*persistConn).readLoop(0xc000124360) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/transport.go:1869 +0x1d6 created by net/http.(*Transport).dialConn /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/transport.go:1538 +0xafe goroutine 35 [IO wait]: internal/poll.runtime_pollWait(0x1e11d08, 0x72, 0xffffffffffffffff) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/runtime/netpoll.go:184 +0x55 internal/poll.(*pollDesc).wait(0xc00013a018, 0x72, 0x1000, 0x1000, 0xffffffffffffffff) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/internal/poll/fd_poll_runtime.go:87 +0x45 internal/poll.(*pollDesc).waitRead(...) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/internal/poll/fd_poll_runtime.go:92 internal/poll.(*FD).Read(0xc00013a000, 0xc000162000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/internal/poll/fd_unix.go:169 +0x22b net.(*netFD).Read(0xc00013a000, 0xc000162000, 0x1000, 0x1000, 0x17, 0xc000195360, 0x106765c) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/fd_unix.go:202 +0x4f net.(*conn).Read(0xc0000be010, 0xc000162000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/net.go:177 +0x68 net/http.(*connReader).Read(0xc000158030, 0xc000162000, 0x1000, 0x1000, 0x17, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/server.go:785 +0xf4 bufio.(*Reader).fill(0xc00011a060) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/bufio/bufio.go:100 +0x103 bufio.(*Reader).ReadSlice(0xc00011a060, 0xa, 0x1000, 0xc000162000, 0x2, 0x2, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/bufio/bufio.go:359 +0x3d net/http/internal.readChunkLine(0xc00011a060, 0x2, 0x0, 0x0, 0x0, 0x2) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/internal/chunked.go:122 +0x34 net/http/internal.(*chunkedReader).beginChunk(0xc00010a3c0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/internal/chunked.go:48 +0x32 net/http/internal.(*chunkedReader).Read(0xc00010a3c0, 0xc00018e000, 0x2000, 0x2000, 0xc0001955f0, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/internal/chunked.go:93 +0x132 net/http.(*body).readLocked(0xc00002e140, 0xc00018e000, 0x2000, 0x2000, 0x11, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/transfer.go:847 +0x5f net/http.(*body).Read(0xc00002e140, 0xc00018e000, 0x2000, 0x2000, 0x0, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/transfer.go:839 +0x102 io.(*LimitedReader).Read(0xc000168040, 0xc00018e000, 0x2000, 0x2000, 0x11, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/io.go:448 +0x63 io/ioutil.devNull.ReadFrom(0x0, 0x141e320, 0xc000168040, 0x134d480, 0x1, 0x1e51048) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/ioutil/ioutil.go:147 +0x92 io.copyBuffer(0x141ee60, 0x1683c68, 0x141e320, 0xc000168040, 0x0, 0x0, 0x0, 0x1379640, 0x13c0a00, 0x1e16000) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/io.go:388 +0x2ed io.Copy(...) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/io.go:364 io.CopyN(0x141ee60, 0x1683c68, 0x1e16000, 0xc00002e140, 0x40001, 0x0, 0x0, 0xc000121d70) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/io.go:340 +0x9a net/http.(*chunkWriter).writeHeader(0xc00010c118, 0xc000176000, 0xd, 0x800) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/server.go:1350 +0xfc6 net/http.(*chunkWriter).Write(0xc00010c118, 0xc000176000, 0xd, 0x800, 0x1e51008, 0xc00010c0e0, 0xd) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/server.go:369 +0x2ba bufio.(*Writer).Flush(0xc00002e1c0, 0x0, 0x1e51008) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/bufio/bufio.go:593 +0x75 net/http.(*response).Flush(0xc00010c0e0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/server.go:1638 +0x35 main.main.func1(0x14241e0, 0xc00010c0e0, 0xc000164000) /Users/emmanuelodeke/Desktop/openSrc/bugs/golang/29246/main.go:22 +0x1c0 net/http.HandlerFunc.ServeHTTP(0x13c0138, 0x14241e0, 0xc00010c0e0, 0xc000164000) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/server.go:2007 +0x44 net/http.serverHandler.ServeHTTP(0xc00010c000, 0x14241e0, 0xc00010c0e0, 0xc000164000) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/server.go:2802 +0xa4 net/http.(*conn).serve(0xc0000caaa0, 0x1424b20, 0xc00015c000) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/server.go:1890 +0x875 created by net/http.(*Server).Serve /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/server.go:2927 +0x38e goroutine 9 [select]: io.(*pipe).Read(0xc0001060a0, 0xc00016a000, 0x8000, 0x8000, 0x11, 0x13247e0, 0xc000184010) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/pipe.go:50 +0xe7 io.(*PipeReader).Read(0xc000010020, 0xc00016a000, 0x8000, 0x8000, 0x11, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/pipe.go:127 +0x4c io.copyBuffer(0x1e11fa0, 0xc000012110, 0x141e340, 0xc000010020, 0xc00016a000, 0x8000, 0x8000, 0xc000113c38, 0x100ba35, 0x134b520) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/io.go:402 +0x122 io.Copy(...) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/io.go:364 net/http.(*transferWriter).doBodyCopy(0xc0000cab40, 0x1e11fa0, 0xc000012110, 0x141e340, 0xc000010020, 0x8, 0xc000113cc0, 0x12ccd42) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/transfer.go:400 +0x6a net/http.(*transferWriter).writeBody(0xc0000cab40, 0x141e180, 0xc00002e0c0, 0x2, 0x2) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/transfer.go:348 +0x427 net/http.(*Request).write(0xc000110100, 0x141e180, 0xc00002e0c0, 0x0, 0xc00010a360, 0x0, 0x0, 0x0) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/request.go:682 +0x6d3 net/http.(*persistConn).writeLoop(0xc000124360) /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/transport.go:2171 +0x1c8 created by net/http.(*Transport).dialConn /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/net/http/transport.go:1539 +0xb23 rax 0x104 rbx 0x2 rcx 0x7ffeefbfd3e8 rdx 0xa00 rdi 0x1668328 rsi 0xa0100000b00 rbp 0x7ffeefbfd470 rsp 0x7ffeefbfd3e8 r8 0x0 r9 0xa0 r10 0x0 r11 0x202 r12 0x1668328 r13 0x16 r14 0xa0100000b00 r15 0xad945c0 rip 0x7fff61f6c86a rflags 0x203 cs 0x7 fs 0x0 gs 0x0 exit status 2 ```

which seems to me like in deed, on slow reads something funky is going on with our transport.

Separate examination

So in May 2019 (2 months ago), I encountered a similar report in https://github.com/googleapis/google-cloud-go/issues/1380 and from that other report, I came up with a hypothesis that perhaps the server is overloaded and doesn't send back any response for a long time and I made a repro to reproduce very similar stacktraces as per a repro https://gist.github.com/odeke-em/b62737b89b91e71ffbf0545581976cbc or https://play.golang.org/p/5b_d6RjikGN or inlined below

package main

import (
    "bytes"
    "context"
    "crypto/tls"
    "fmt"
    "io"
    "io/ioutil"
    "log"
    "mime/multipart"
    "net/http"
    "net/http/httptest"
    "net/textproto"
    "net/url"
    "os"
    "syscall"
    "time"

    "golang.org/x/net/http2"
)

func main() {
    cst := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Stalled server that takes forever to read or could be overloaded.
        // It already established a connection though.
        <-time.After(10 * time.Minute)

        slurp, _ := ioutil.ReadAll(r.Body)
        log.Printf("Request payload: %s\n", slurp)
        w.Write(bytes.Repeat([]byte("a"), 3000))
    }))
    if err := http2.ConfigureServer(cst.Config, nil); err != nil {
        log.Fatalf("http2.ConfigureServer: %v", err)
    }
    cst.StartTLS()
    defer cst.Close()

    tlsConfig := &tls.Config{InsecureSkipVerify: true}
    u, _ := url.Parse(cst.URL)
    tlsConn, err := tls.Dial("tcp", u.Host, tlsConfig)
    if err != nil {
        log.Fatalf("Failed to create a tls connection: %v", err)
    }

    prc, pwc := io.Pipe()

    go func() {
        h := make(textproto.MIMEHeader)
        mpw := multipart.NewWriter(pwc)
        w, err := mpw.CreatePart(h)
        if err != nil {
            mpw.Close()
            pwc.CloseWithError(fmt.Errorf("CreatePart failed: %v", err))
            return
        }

        n, _ := pwc.Write(bytes.Repeat([]byte("a"), 39<<20))
        println("read ", n)

        r := bytes.NewReader([]byte(`{"id": "1380", "type": "issue"}`))
        if _, err := io.Copy(w, r); err != nil {
            mpw.Close()
            pwc.CloseWithError(fmt.Errorf("Copy failed: %v", err))
            return
        }

        println("done read in goroutine")
        mpw.Close()
        pwc.Close()
    }()

    tr := &http2.Transport{TLSClientConfig: tlsConfig}
    cc, err := tr.NewClientConn(tlsConn)
    if err != nil {
        log.Fatalf("(*http2.Transport).NewClientConn: %v", err)
    }

    // Find our own process and in the background send ourselves SIGQUIT.
    selfProcess, err := os.FindProcess(os.Getpid())
    if err != nil {
        log.Fatalf("Failed to find own process: %v", err)
    }
    go func() {
        <-time.After(6 * time.Second)
        if err := selfProcess.Signal(syscall.SIGQUIT); err != nil {
            log.Fatalf("Failed to send self SIGQUIT: %v", err)
        }
    }()

    // Send that ping frame and ensure we have an established connection
    // and that the server is one stalled and body reads are stalled.
    if err := cc.Ping(context.Background()); err != nil {
        log.Fatalf("(*http2.ClientConn).Ping: %v", err)
    }

    req, _ := http.NewRequest("GET", cst.URL, prc)
    res, err := cc.RoundTrip(req)
    if err != nil {
        log.Fatalf("http.Transport.Roundtrip error: %v", err)
    }
    defer res.Body.Close()
    blob, _ := ioutil.ReadAll(res.Body)
    log.Printf("%s\n", blob)
}

which produced the curious stacktrace

goroutine 35 [select]:
io.(*pipe).Write(0xc00015cd20, 0xc0001d0090, 0x42, 0x82, 0x0, 0x0, 0x0)
    /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/pipe.go:87 +0x1dc
io.(*PipeWriter).Write(0xc00013a028, 0xc0001d0090, 0x42, 0x82, 0x12f8720, 0x13465a0, 0x10cf201)
    /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/pipe.go:153 +0x4c
bytes.(*Buffer).WriteTo(0xc0000ba240, 0x13bd460, 0xc00013a028, 0x26000d0, 0xc0000ba240, 0x1)
    /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/bytes/buffer.go:242 +0xb5
io.copyBuffer(0x13bd460, 0xc00013a028, 0x13bd220, 0xc0000ba240, 0x0, 0x0, 0x0, 0x2, 0x0, 0x0)
    /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/io.go:384 +0x33f
io.Copy(...)
    /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/io/io.go:364
mime/multipart.(*Writer).CreatePart(0xc0000ba210, 0xc00005fe48, 0x0, 0x0, 0x0, 0x0)
    /Users/emmanuelodeke/go/src/go.googlesource.com/go/src/mime/multipart/writer.go:121 +0x3fa
main.main.func2(0xc00013a028)
    /Users/emmanuelodeke/Desktop/openSrc/bugs/google-cloud-go/1380/main.go:48 +0x11f
created by main.main
    /Users/emmanuelodeke/Desktop/openSrc/bugs/google-cloud-go/1380/main.go:45 +0x3cd

I am posting this here to absolve the google-cloud-go package issue that I encountered and to continue the investigation here.

tehmoon commented 5 years ago

Thank you for following up @odeke-em ! This looks interesting, because it looks like you have the same logic. I've tried to add some kind of synchronization just right before the first write, but what I thought was working was not.

Edit

Actually it worked, but not sure it is bug free:

I don't think it is bug free because the waitgroup is done before the write can happen (otherwise write becomes blocking until the body is read by request).

But somehow it works well enough 🤷‍♂ . I tried to maximized the race condition chances by doing go wg.Done() then the write. Hopeful it will be blocking on writing before the go routine is running.

qingyunha commented 5 years ago
goroutine 38 [chan receive]:
net/http.finishAsyncByteRead.Read(0xc000162000, 0xc000170000, 0x8000, 0x8000, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transfer.go:1030 +0x60
io.(*multiReader).Read(0xc000164020, 0xc000170000, 0x8000, 0x8000, 0x3, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/io/multi.go:26 +0xaf
net/http.transferBodyReader.Read(0xc000162000, 0xc000170000, 0x8000, 0x8000, 0x1, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transfer.go:62 +0x56
io.copyBuffer(0x2444040, 0xc000166020, 0x12bf9e0, 0xc000162000, 0xc000170000, 0x8000, 0x8000, 0x123aae0, 0x1456a00, 0x2444040)
        /usr/local/Cellar/go/1.11.2/libexec/src/io/io.go:402 +0x125
io.Copy(0x2444040, 0xc000166020, 0x12bf9e0, 0xc000162000, 0xc000166020, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/io/io.go:364 +0x5a
net/http.(*transferWriter).writeBody(0xc000162000, 0x12bed40, 0xc000134040, 0x2, 0x2)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transfer.go:355 +0x155
net/http.(*Request).write(0xc0000fe000, 0x12bed40, 0xc000134040, 0x0, 0xc000138030, 0x0, 0x0, 0x0)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/request.go:645 +0x6e8
net/http.(*persistConn).writeLoop(0xc000110000)
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transport.go:1888 +0x1b8
created by net/http.(*Transport).dialConn
        /usr/local/Cellar/go/1.11.2/libexec/src/net/http/transport.go:1339 +0x966

The blocking reason may be net/http.finishAsyncByteRead.Read never return EOF, and it's in a io.MultiReader.

t.Body = io.MultiReader(finishAsyncByteRead{t}, t.Body)

func (fr finishAsyncByteRead) Read(p []byte) (n int, err error) {
    if len(p) == 0 {
        return
    }    
    rres := <-fr.tw.ByteReadCh
    n, err = rres.n, rres.err
    if n == 1 {
        p[0] = rres.b
    }    
    return
}

And the following change should solve this problem.

diff --git a/src/net/http/transfer.go b/src/net/http/transfer.go
index 2e01a07..8a306ca 100644
--- a/src/net/http/transfer.go
+++ b/src/net/http/transfer.go
@@ -212,6 +212,7 @@ func (t *transferWriter) probeRequestBody() {
                        rres.b = buf[0]
                }
                t.ByteReadCh <- rres
+               close(t.ByteReadCh)
        }(t.Body)
        timer := time.NewTimer(200 * time.Millisecond)
        select {
@@ -1076,6 +1077,9 @@ func (fr finishAsyncByteRead) Read(p []byte) (n int, err error) {
        if n == 1 {
                p[0] = rres.b
        }
+       if n == 0 {
+               err = io.EOF
+       }
        return
 }
kevinconaway commented 5 years ago

@odeke-em Are you working on this particular issue?

I posted a way to reproduce this on a separate issue but the tldr that I can see is that it occurs when GetBody is called on the result returned by MediaInfo.UploadRequest. In my testing, this most often occurs when the initial POST request fails due to an error (like http2ErrNoCachedConn)

Based on some instrumentation I had added during testing, the body returned by GetBody isn't always closed which is what leaves the request hanging.

changkun commented 1 year ago

I think this is fixed via https://go.dev/cl/340256