golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.22k stars 17.7k forks source link

net/http: http2 requests blocked after timeout #57478

Open stampy88 opened 1 year ago

stampy88 commented 1 year ago

What version of Go are you using (go version)?

$ go version
go version go1.18.6 linux/amd64

Does this issue reproduce with the latest release?

Have not attempted yet as it is hard to reproduce. I am trying to attempt to write a reproducer.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/XXXX/.cache/go-build"
GOENV="/home/XXXX/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/XXXX/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/XXXX/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.6"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3899516689=/tmp/go-build -gno-record-gcc-switches"

What did you do?

The application does HTTP POSTs to configured clients endpoints whenever an event is received. The HTTP client is configured with a 5 second timeout so that long running requests don't block an event consumer too long.

What did you expect to see?

When a timeout occurs, I expected all go routines that the std library creates to handle the HTTP request to be cleaned up.

What did you see instead?

The server the app is connecting to supports HTTP 2. During periods where timeouts are occurring we can see the number of go routines steadily increase until the server it is trying to communicate to starts responding again. See stack traces below:


goroutine profile: total 1451
1258 @ 0x438936 0x449473 0x44944d 0x464865 0x4726e5 0x81d5af 0x81d59a 0x81bfee 0x468741
#   0x464864    sync.runtime_SemacquireMutex+0x24           /usr/local/go/src/runtime/sema.go:71
#   0x4726e4    sync.(*Mutex).lockSlow+0x164                /usr/local/go/src/sync/mutex.go:162
#   0x81d5ae    sync.(*Mutex).Lock+0x30e                /usr/local/go/src/sync/mutex.go:81
#   0x81d599    net/http.(*http2clientStream).cleanupWriteRequest+0x2f9 /usr/local/go/src/net/http/h2_bundle.go:8159
#   0x81bfed    net/http.(*http2clientStream).doRequest+0x2d        /usr/local/go/src/net/http/h2_bundle.go:7905

36 @ 0x438936 0x406a0c 0x406478 0x1005f13 0x468741
#   0x1005f12   main.main.func3.1+0xf2  /XXXXXX/main.go:383

16 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4ccb5a 0x4ccb48 0x6939a9 0x6a5345 0x787c1d 0x4f8b78 0x787e05 0x785416 0x78ae6f 0x78ae70 0x51af74 0x4c5f7a 0x7fe96e 0x7fe928 0x7ff1d5 0x8214f0 0x82094f 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88     /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31     /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4ccb59    internal/poll.(*pollDesc).waitRead+0x259    /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
#   0x4ccb47    internal/poll.(*FD).Read+0x247          /usr/local/go/src/internal/poll/fd_unix.go:167
#   0x6939a8    net.(*netFD).Read+0x28              /usr/local/go/src/net/fd_posix.go:55
#   0x6a5344    net.(*conn).Read+0x44               /usr/local/go/src/net/net.go:183
#   0x787c1c    crypto/tls.(*atLeastReader).Read+0x3c       /usr/local/go/src/crypto/tls/conn.go:785
#   0x4f8b77    bytes.(*Buffer).ReadFrom+0x97           /usr/local/go/src/bytes/buffer.go:204
#   0x787e04    crypto/tls.(*Conn).readFromUntil+0xe4       /usr/local/go/src/crypto/tls/conn.go:807
#   0x785415    crypto/tls.(*Conn).readRecordOrCCS+0x115    /usr/local/go/src/crypto/tls/conn.go:614
#   0x78ae6e    crypto/tls.(*Conn).readRecord+0x16e     /usr/local/go/src/crypto/tls/conn.go:582
#   0x78ae6f    crypto/tls.(*Conn).Read+0x16f           /usr/local/go/src/crypto/tls/conn.go:1285
#   0x51af73    bufio.(*Reader).Read+0x1b3          /usr/local/go/src/bufio/bufio.go:236
#   0x4c5f79    io.ReadAtLeast+0x99             /usr/local/go/src/io/io.go:331
#   0x7fe96d    io.ReadFull+0x6d                /usr/local/go/src/io/io.go:350
#   0x7fe927    net/http.http2readFrameHeader+0x27      /usr/local/go/src/net/http/h2_bundle.go:1566
#   0x7ff1d4    net/http.(*http2Framer).ReadFrame+0x94      /usr/local/go/src/net/http/h2_bundle.go:1830
#   0x8214ef    net/http.(*http2clientConnReadLoop).run+0x12f   /usr/local/go/src/net/http/h2_bundle.go:8820
#   0x82094e    net/http.(*http2ClientConn).readLoop+0x6e   /usr/local/go/src/net/http/h2_bundle.go:8716

13 @ 0x438936 0x449473 0x44944d 0x464745 0x474172 0x100ab15 0x10099af 0x1006073 0x468741
#   0x464744    sync.runtime_Semacquire+0x24        /usr/local/go/src/runtime/sema.go:56
#   0x474171    sync.(*WaitGroup).Wait+0x51     /usr/local/go/src/sync/waitgroup.go:136
#   0x100ab14   main.(*server).doProcessPacket+0x1034   XXXXXX/processpacket.go:168
#   0x10099ae   main.(*server).processPacket+0x18e  XXXXXX/processpacket.go:45
#   0x1006072   main.main.func3.1+0x252         XXXXXX/main.go:390

12 @ 0x438936 0x448392 0x857275 0x468741
#   0x857274    net/http.(*persistConn).writeLoop+0xf4  /usr/local/go/src/net/http/transport.go:2392

12 @ 0x438936 0x449473 0x44944d 0x464865 0x4726e5 0x825fb2 0x825f7e 0x81d545 0x81bfee 0x468741
#   0x464864    sync.runtime_SemacquireMutex+0x24           /usr/local/go/src/runtime/sema.go:71
#   0x4726e4    sync.(*Mutex).lockSlow+0x164                /usr/local/go/src/sync/mutex.go:162
#   0x825fb1    sync.(*Mutex).Lock+0x51                 /usr/local/go/src/sync/mutex.go:81
#   0x825f7d    net/http.(*http2ClientConn).writeStreamReset+0x1d   /usr/local/go/src/net/http/h2_bundle.go:9574
#   0x81d544    net/http.(*http2clientStream).cleanupWriteRequest+0x2a4 /usr/local/go/src/net/http/h2_bundle.go:8145
#   0x81bfed    net/http.(*http2clientStream).doRequest+0x2d        /usr/local/go/src/net/http/h2_bundle.go:7905

11 @ 0x438936 0x448392 0x81bb71 0x817cb7 0x8264db 0x8264d5 0x84b790 0x832ab9 0x7f43f8 0x7f3c9b 0x7f60d5 0x100e0e6 0x100e0d9 0x468741
#   0x81bb70    net/http.(*http2ClientConn).RoundTrip+0x450     /usr/local/go/src/net/http/h2_bundle.go:7874
#   0x817cb6    net/http.(*http2Transport).RoundTripOpt+0x1b6       /usr/local/go/src/net/http/h2_bundle.go:7218
#   0x8264da    net/http.(*http2Transport).RoundTrip+0x1a       /usr/local/go/src/net/http/h2_bundle.go:7179
#   0x8264d4    net/http.http2noDialH2RoundTripper.RoundTrip+0x14   /usr/local/go/src/net/http/h2_bundle.go:9682
#   0x84b78f    net/http.(*Transport).roundTrip+0x38f           /usr/local/go/src/net/http/transport.go:539
#   0x832ab8    net/http.(*Transport).RoundTrip+0x18            /usr/local/go/src/net/http/roundtrip.go:17
#   0x7f43f7    net/http.send+0x5d7                 /usr/local/go/src/net/http/client.go:252
#   0x7f3c9a    net/http.(*Client).send+0x9a                /usr/local/go/src/net/http/client.go:176
#   0x7f60d4    net/http.(*Client).do+0x8f4             /usr/local/go/src/net/http/client.go:725
#   0x100e0e5   net/http.(*Client).Do+0x985             /usr/local/go/src/net/http/client.go:593
#   0x100e0d8   main.(*server).sendRestMessage+0x978            /XXXXXX/rest.go:89

10 @ 0x438936 0x406a0c 0x406438 0x8f57d8 0x468741
#   0x8f57d7    google.golang.org/grpc.(*addrConn).resetTransport+0x477 /var/go/go/pkg/mod/google.golang.org/grpc@v1.28.1/clientconn.go:1155

10 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4ccb5a 0x4ccb48 0x6939a9 0x6a5345 0x787c1d 0x4f8b78 0x787e05 0x785416 0x78ae6f 0x78ae70 0x51af74 0x4c5f7a 0x86c2ce 0x86c288 0x86cb35 0x889c73 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88                 /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31                 /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4ccb59    internal/poll.(*pollDesc).waitRead+0x259                /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
#   0x4ccb47    internal/poll.(*FD).Read+0x247                      /usr/local/go/src/internal/poll/fd_unix.go:167
#   0x6939a8    net.(*netFD).Read+0x28                          /usr/local/go/src/net/fd_posix.go:55
#   0x6a5344    net.(*conn).Read+0x44                           /usr/local/go/src/net/net.go:183
#   0x787c1c    crypto/tls.(*atLeastReader).Read+0x3c                   /usr/local/go/src/crypto/tls/conn.go:785
#   0x4f8b77    bytes.(*Buffer).ReadFrom+0x97                       /usr/local/go/src/bytes/buffer.go:204
#   0x787e04    crypto/tls.(*Conn).readFromUntil+0xe4                   /usr/local/go/src/crypto/tls/conn.go:807
#   0x785415    crypto/tls.(*Conn).readRecordOrCCS+0x115                /usr/local/go/src/crypto/tls/conn.go:614
#   0x78ae6e    crypto/tls.(*Conn).readRecord+0x16e                 /usr/local/go/src/crypto/tls/conn.go:582
#   0x78ae6f    crypto/tls.(*Conn).Read+0x16f                       /usr/local/go/src/crypto/tls/conn.go:1285
#   0x51af73    bufio.(*Reader).Read+0x1b3                      /usr/local/go/src/bufio/bufio.go:236
#   0x4c5f79    io.ReadAtLeast+0x99                         /usr/local/go/src/io/io.go:331
#   0x86c2cd    io.ReadFull+0x6d                            /usr/local/go/src/io/io.go:350
#   0x86c287    golang.org/x/net/http2.readFrameHeader+0x27             /var/go/go/pkg/mod/golang.org/x/net@v0.0.0-20220425223048-2871e0cb64e4/http2/frame.go:237
#   0x86cb34    golang.org/x/net/http2.(*Framer).ReadFrame+0x94             /var/go/go/pkg/mod/golang.org/x/net@v0.0.0-20220425223048-2871e0cb64e4/http2/frame.go:498
#   0x889c72    google.golang.org/grpc/internal/transport.(*http2Client).reader+0x232   /var/go/go/pkg/mod/google.golang.org/grpc@v1.28.1/internal/transport/http2_client.go:1270

10 @ 0x438936 0x448392 0x8790b5 0x879745 0x882045 0x468741
#   0x8790b4    google.golang.org/grpc/internal/transport.(*controlBuffer).get+0x114    /var/go/go/pkg/mod/google.golang.org/grpc@v1.28.1/internal/transport/controlbuf.go:395
#   0x879744    google.golang.org/grpc/internal/transport.(*loopyWriter).run+0x84   /var/go/go/pkg/mod/google.golang.org/grpc@v1.28.1/internal/transport/controlbuf.go:513
#   0x882044    google.golang.org/grpc/internal/transport.newHTTP2Client.func3+0x64 /var/go/go/pkg/mod/google.golang.org/grpc@v1.28.1/internal/transport/http2_client.go:346

9 @ 0x438936 0x448392 0x81c16c 0x81bfde 0x468741
#   0x81c16b    net/http.(*http2clientStream).writeRequest+0x14b    /usr/local/go/src/net/http/h2_bundle.go:7929
#   0x81bfdd    net/http.(*http2clientStream).doRequest+0x1d        /usr/local/go/src/net/http/h2_bundle.go:7904

6 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4ccb5a 0x4ccb48 0x6939a9 0x6a5345 0x787c1d 0x4f8b78 0x787e05 0x785416 0x78ae6f 0x78ae70 0x85476e 0x51a9a3 0x51aafd 0x85558c 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88     /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31     /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4ccb59    internal/poll.(*pollDesc).waitRead+0x259    /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
#   0x4ccb47    internal/poll.(*FD).Read+0x247          /usr/local/go/src/internal/poll/fd_unix.go:167
#   0x6939a8    net.(*netFD).Read+0x28              /usr/local/go/src/net/fd_posix.go:55
#   0x6a5344    net.(*conn).Read+0x44               /usr/local/go/src/net/net.go:183
#   0x787c1c    crypto/tls.(*atLeastReader).Read+0x3c       /usr/local/go/src/crypto/tls/conn.go:785
#   0x4f8b77    bytes.(*Buffer).ReadFrom+0x97           /usr/local/go/src/bytes/buffer.go:204
#   0x787e04    crypto/tls.(*Conn).readFromUntil+0xe4       /usr/local/go/src/crypto/tls/conn.go:807
#   0x785415    crypto/tls.(*Conn).readRecordOrCCS+0x115    /usr/local/go/src/crypto/tls/conn.go:614
#   0x78ae6e    crypto/tls.(*Conn).readRecord+0x16e     /usr/local/go/src/crypto/tls/conn.go:582
#   0x78ae6f    crypto/tls.(*Conn).Read+0x16f           /usr/local/go/src/crypto/tls/conn.go:1285
#   0x85476d    net/http.(*persistConn).Read+0x4d       /usr/local/go/src/net/http/transport.go:1929
#   0x51a9a2    bufio.(*Reader).fill+0x102          /usr/local/go/src/bufio/bufio.go:106
#   0x51aafc    bufio.(*Reader).Peek+0x5c           /usr/local/go/src/bufio/bufio.go:144
#   0x85558b    net/http.(*persistConn).readLoop+0x1ab      /usr/local/go/src/net/http/transport.go:2093

3 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4ccb5a 0x4ccb48 0x6939a9 0x6a5345 0x787c1d 0x4f8b78 0x787e05 0x785416 0x78ae6f 0x78ae70 0x85476e 0x51a9a3 0x51b58f 0x7f12a5 0x7f0d28 0x7f0fee 0x8492dc 0x8491a5 0x858ea2 0x517fbf 0x517bbb 0x517818 0xf2b36b 0xf2afba 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88             /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31             /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4ccb59    internal/poll.(*pollDesc).waitRead+0x259            /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
#   0x4ccb47    internal/poll.(*FD).Read+0x247                  /usr/local/go/src/internal/poll/fd_unix.go:167
#   0x6939a8    net.(*netFD).Read+0x28                      /usr/local/go/src/net/fd_posix.go:55
#   0x6a5344    net.(*conn).Read+0x44                       /usr/local/go/src/net/net.go:183
#   0x787c1c    crypto/tls.(*atLeastReader).Read+0x3c               /usr/local/go/src/crypto/tls/conn.go:785
#   0x4f8b77    bytes.(*Buffer).ReadFrom+0x97                   /usr/local/go/src/bytes/buffer.go:204
#   0x787e04    crypto/tls.(*Conn).readFromUntil+0xe4               /usr/local/go/src/crypto/tls/conn.go:807
#   0x785415    crypto/tls.(*Conn).readRecordOrCCS+0x115            /usr/local/go/src/crypto/tls/conn.go:614
#   0x78ae6e    crypto/tls.(*Conn).readRecord+0x16e             /usr/local/go/src/crypto/tls/conn.go:582
#   0x78ae6f    crypto/tls.(*Conn).Read+0x16f                   /usr/local/go/src/crypto/tls/conn.go:1285
#   0x85476d    net/http.(*persistConn).Read+0x4d               /usr/local/go/src/net/http/transport.go:1929
#   0x51a9a2    bufio.(*Reader).fill+0x102                  /usr/local/go/src/bufio/bufio.go:106
#   0x51b58e    bufio.(*Reader).ReadSlice+0x2e                  /usr/local/go/src/bufio/bufio.go:371
#   0x7f12a4    net/http/internal.readChunkLine+0x24                /usr/local/go/src/net/http/internal/chunked.go:129
#   0x7f0d27    net/http/internal.(*chunkedReader).beginChunk+0x27      /usr/local/go/src/net/http/internal/chunked.go:48
#   0x7f0fed    net/http/internal.(*chunkedReader).Read+0x14d           /usr/local/go/src/net/http/internal/chunked.go:98
#   0x8492db    net/http.(*body).readLocked+0x3b                /usr/local/go/src/net/http/transfer.go:844
#   0x8491a4    net/http.(*body).Read+0x124                 /usr/local/go/src/net/http/transfer.go:836
#   0x858ea1    net/http.(*bodyEOFSignal).Read+0x141                /usr/local/go/src/net/http/transport.go:2774
#   0x517fbe    encoding/json.(*Decoder).refill+0x17e               /usr/local/go/src/encoding/json/stream.go:165
#   0x517bba    encoding/json.(*Decoder).readValue+0xba             /usr/local/go/src/encoding/json/stream.go:140
#   0x517817    encoding/json.(*Decoder).Decode+0x77                /usr/local/go/src/encoding/json/stream.go:63
#   0xf2b36a    XXXXXX/kuberesolver.(*streamWatcher).Decode+0x6a    XXXXXX/kuberesolver@v2.1.7+incompatible/stream.go:96
#   0xf2afb9    XXXXXX/kuberesolver.(*streamWatcher).receive+0x119  XXXXXX/kuberesolver@v2.1.7+incompatible/stream.go:72

3 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4ccb5a 0x4ccb48 0x6939a9 0x6a5345 0x8345ed 0x51a9a3 0x51b58f 0x51b7e7 0x7d4ff9 0x82f9b9 0x82f9ba 0x835b8a 0x839f8b 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88     /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31     /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4ccb59    internal/poll.(*pollDesc).waitRead+0x259    /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
#   0x4ccb47    internal/poll.(*FD).Read+0x247          /usr/local/go/src/internal/poll/fd_unix.go:167
#   0x6939a8    net.(*netFD).Read+0x28              /usr/local/go/src/net/fd_posix.go:55
#   0x6a5344    net.(*conn).Read+0x44               /usr/local/go/src/net/net.go:183
#   0x8345ec    net/http.(*connReader).Read+0x16c       /usr/local/go/src/net/http/server.go:780
#   0x51a9a2    bufio.(*Reader).fill+0x102          /usr/local/go/src/bufio/bufio.go:106
#   0x51b58e    bufio.(*Reader).ReadSlice+0x2e          /usr/local/go/src/bufio/bufio.go:371
#   0x51b7e6    bufio.(*Reader).ReadLine+0x26           /usr/local/go/src/bufio/bufio.go:400
#   0x7d4ff8    net/textproto.(*Reader).readLineSlice+0x98  /usr/local/go/src/net/textproto/reader.go:57
#   0x82f9b8    net/textproto.(*Reader).ReadLine+0x78       /usr/local/go/src/net/textproto/reader.go:38
#   0x82f9b9    net/http.readRequest+0x79           /usr/local/go/src/net/http/request.go:1029
#   0x835b89    net/http.(*conn).readRequest+0x249      /usr/local/go/src/net/http/server.go:988
#   0x839f8a    net/http.(*conn).serve+0x32a            /usr/local/go/src/net/http/server.go:1891

3 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4ccb5a 0x4ccb48 0x6939a9 0x6a5345 0x85476e 0x51a9a3 0x51aafd 0x85558c 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88     /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31     /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4ccb59    internal/poll.(*pollDesc).waitRead+0x259    /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
#   0x4ccb47    internal/poll.(*FD).Read+0x247          /usr/local/go/src/internal/poll/fd_unix.go:167
#   0x6939a8    net.(*netFD).Read+0x28              /usr/local/go/src/net/fd_posix.go:55
#   0x6a5344    net.(*conn).Read+0x44               /usr/local/go/src/net/net.go:183
#   0x85476d    net/http.(*persistConn).Read+0x4d       /usr/local/go/src/net/http/transport.go:1929
#   0x51a9a2    bufio.(*Reader).fill+0x102          /usr/local/go/src/bufio/bufio.go:106
#   0x51aafc    bufio.(*Reader).Peek+0x5c           /usr/local/go/src/bufio/bufio.go:144
#   0x85558b    net/http.(*persistConn).readLoop+0x1ab      /usr/local/go/src/net/http/transport.go:2093

3 @ 0x438936 0x448392 0x856179 0x468741
#   0x856178    net/http.(*persistConn).readLoop+0xd98  /usr/local/go/src/net/http/transport.go:2213

3 @ 0x438936 0x448392 0x8ed3cf 0x468741
#   0x8ed3ce    google.golang.org/grpc.(*ccBalancerWrapper).watcher+0x8e    /var/go/go/pkg/mod/google.golang.org/grpc@v1.28.1/balancer_conn_wrappers.go:69

3 @ 0x438936 0x448392 0xf29867 0xf289db 0xf2b5de 0xf2b50e 0x468741
#   0xf29866    XXXXXX/kuberesolver.(*kResolver).watch+0x206        XXXXXX/kuberesolver@v2.1.7+incompatible/builder.go:276
#   0xf289da    XXXXXX/kuberesolver.(*kubeBuilder).Build.func1+0x3a XXXXXX/kuberesolver@v2.1.7+incompatible/builder.go:161
#   0xf2b5dd    XXXXXX/kuberesolver.until.func1+0x3d            XXXXXX/kuberesolver@v2.1.7+incompatible/util.go:19
#   0xf2b50d    XXXXXX/kuberesolver.until+0x4d              XXXXXX/kuberesolver@v2.1.7+incompatible/util.go:20

2 @ 0x438936 0x448392 0x81c9e9 0x81bfde 0x468741
#   0x81c9e8    net/http.(*http2clientStream).writeRequest+0x9c8    /usr/local/go/src/net/http/h2_bundle.go:8042
#   0x81bfdd    net/http.(*http2clientStream).doRequest+0x1d        /usr/local/go/src/net/http/h2_bundle.go:7904

2 @ 0x438936 0x448392 0x858134 0x84bbc9 0x832ab9 0x7f43f8 0x7f3c9b 0x7f60d5 0x100e0e6 0x100e0d9 0x468741
#   0x858133    net/http.(*persistConn).roundTrip+0x973 /usr/local/go/src/net/http/transport.go:2620
#   0x84bbc8    net/http.(*Transport).roundTrip+0x7c8   /usr/local/go/src/net/http/transport.go:594
#   0x832ab8    net/http.(*Transport).RoundTrip+0x18    /usr/local/go/src/net/http/roundtrip.go:17
#   0x7f43f7    net/http.send+0x5d7         /usr/local/go/src/net/http/client.go:252
#   0x7f3c9a    net/http.(*Client).send+0x9a        /usr/local/go/src/net/http/client.go:176
#   0x7f60d4    net/http.(*Client).do+0x8f4     /usr/local/go/src/net/http/client.go:725
#   0x100e0e5   net/http.(*Client).Do+0x985     /usr/local/go/src/net/http/client.go:593
#   0x100e0d8   main.(*server).sendRestMessage+0x978    XXXXXXs/rest.go:89

1 @ 0x40bf14 0x464eaf 0xf2c6d9 0x468741
#   0x464eae    os/signal.signal_recv+0x2e  /usr/local/go/src/runtime/sigqueue.go:151
#   0xf2c6d8    os/signal.loop+0x18     /usr/local/go/src/os/signal/signal_unix.go:23

1 @ 0x438936 0x406a0c 0x406438 0x10055e9 0x438572 0x468741
#   0x10055e8   main.main+0x1028    XXXXXX/main.go:425
#   0x438571    runtime.main+0x211  /usr/local/go/src/runtime/proc.go:250

1 @ 0x438936 0x406a0c 0x406438 0xf38312 0x468741
#   0xf38311    XXXXXX/app.(*Server).handleSignals+0x151    XXXXXX/app/server.go:194

1 @ 0x438936 0x406a0c 0x406478 0xea7614 0x468741
#   0xea7613    github.com/eclipse/paho%2emqtt%2egolang.startComms.func2+0x73   /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/net.go:430

1 @ 0x438936 0x406a0c 0x406478 0xea76b1 0x468741
#   0xea76b0    github.com/eclipse/paho%2emqtt%2egolang.startComms.func1+0x70   /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/net.go:407

1 @ 0x438936 0x406a0c 0x406478 0xeaab1c 0x468741
#   0xeaab1b    github.com/eclipse/paho%2emqtt%2egolang.(*router).matchAndDispatch.func2+0xbb   /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/router.go:174

1 @ 0x438936 0x406a0c 0x406478 0xf62628 0x468741
#   0xf62627    github.com/streadway/amqp.(*consumers).buffer+0x107 /var/go/go/pkg/mod/github.com/streadway/amqp@v0.0.0-20190404075320-75d898a42a94/consumers.go:61

1 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4ccb5a 0x4ccb48 0x6939a9 0x6a5345 0x51af74 0x4c5f7a 0xf63993 0xf6396f 0xf6048a 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88         /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31         /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4ccb59    internal/poll.(*pollDesc).waitRead+0x259        /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
#   0x4ccb47    internal/poll.(*FD).Read+0x247              /usr/local/go/src/internal/poll/fd_unix.go:167
#   0x6939a8    net.(*netFD).Read+0x28                  /usr/local/go/src/net/fd_posix.go:55
#   0x6a5344    net.(*conn).Read+0x44                   /usr/local/go/src/net/net.go:183
#   0x51af73    bufio.(*Reader).Read+0x1b3              /usr/local/go/src/bufio/bufio.go:236
#   0x4c5f79    io.ReadAtLeast+0x99                 /usr/local/go/src/io/io.go:331
#   0xf63992    io.ReadFull+0x52                    /usr/local/go/src/io/io.go:350
#   0xf6396e    github.com/streadway/amqp.(*reader).ReadFrame+0x2e  /var/go/go/pkg/mod/github.com/streadway/amqp@v0.0.0-20190404075320-75d898a42a94/read.go:49
#   0xf60489    github.com/streadway/amqp.(*Connection).reader+0x1c9    /var/go/go/pkg/mod/github.com/streadway/amqp@v0.0.0-20190404075320-75d898a42a94/connection.go:521

1 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4ccb5a 0x4ccb48 0x6939a9 0x6a5345 0x787c1d 0x4f8b78 0x787e05 0x785416 0x78ae6f 0x78ae70 0x4c5f7a 0xe8111f 0xe81102 0xea4ce5 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88                 /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31                 /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4ccb59    internal/poll.(*pollDesc).waitRead+0x259                /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
#   0x4ccb47    internal/poll.(*FD).Read+0x247                      /usr/local/go/src/internal/poll/fd_unix.go:167
#   0x6939a8    net.(*netFD).Read+0x28                          /usr/local/go/src/net/fd_posix.go:55
#   0x6a5344    net.(*conn).Read+0x44                           /usr/local/go/src/net/net.go:183
#   0x787c1c    crypto/tls.(*atLeastReader).Read+0x3c                   /usr/local/go/src/crypto/tls/conn.go:785
#   0x4f8b77    bytes.(*Buffer).ReadFrom+0x97                       /usr/local/go/src/bytes/buffer.go:204
#   0x787e04    crypto/tls.(*Conn).readFromUntil+0xe4                   /usr/local/go/src/crypto/tls/conn.go:807
#   0x785415    crypto/tls.(*Conn).readRecordOrCCS+0x115                /usr/local/go/src/crypto/tls/conn.go:614
#   0x78ae6e    crypto/tls.(*Conn).readRecord+0x16e                 /usr/local/go/src/crypto/tls/conn.go:582
#   0x78ae6f    crypto/tls.(*Conn).Read+0x16f                       /usr/local/go/src/crypto/tls/conn.go:1285
#   0x4c5f79    io.ReadAtLeast+0x99                         /usr/local/go/src/io/io.go:331
#   0xe8111e    io.ReadFull+0x5e                            /usr/local/go/src/io/io.go:350
#   0xe81101    github.com/eclipse/paho.mqtt.golang/packets.ReadPacket+0x41     /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/packets/packets.go:131
#   0xea4ce4    github.com/eclipse/paho%2emqtt%2egolang.startIncoming.func1+0xe4    /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/net.go:124

1 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4ccb5a 0x4ccb48 0x6939a9 0x6a5345 0x83411f 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88     /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31     /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4ccb59    internal/poll.(*pollDesc).waitRead+0x259    /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
#   0x4ccb47    internal/poll.(*FD).Read+0x247          /usr/local/go/src/internal/poll/fd_unix.go:167
#   0x6939a8    net.(*netFD).Read+0x28              /usr/local/go/src/net/fd_posix.go:55
#   0x6a5344    net.(*conn).Read+0x44               /usr/local/go/src/net/net.go:183
#   0x83411e    net/http.(*connReader).backgroundRead+0x3e  /usr/local/go/src/net/http/server.go:672

1 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4ceb05 0x4ceaf2 0x694229 0x6a5525 0x788528 0x7889f1 0x789cf4 0x8176e9 0x51c5a2 0x825fdc 0x81d545 0x81bfee 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88         /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31         /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4ceb04    internal/poll.(*pollDesc).waitWrite+0x2c4       /usr/local/go/src/internal/poll/fd_poll_runtime.go:92
#   0x4ceaf1    internal/poll.(*FD).Write+0x2b1             /usr/local/go/src/internal/poll/fd_unix.go:391
#   0x694228    net.(*netFD).Write+0x28                 /usr/local/go/src/net/fd_posix.go:96
#   0x6a5524    net.(*conn).Write+0x44                  /usr/local/go/src/net/net.go:195
#   0x788527    crypto/tls.(*Conn).write+0x107              /usr/local/go/src/crypto/tls/conn.go:920
#   0x7889f0    crypto/tls.(*Conn).writeRecordLocked+0x350      /usr/local/go/src/crypto/tls/conn.go:988
#   0x789cf3    crypto/tls.(*Conn).Write+0x3f3              /usr/local/go/src/crypto/tls/conn.go:1159
#   0x8176e8    net/http.http2stickyErrWriter.Write+0x148       /usr/local/go/src/net/http/h2_bundle.go:7133
#   0x51c5a1    bufio.(*Writer).Flush+0x61              /usr/local/go/src/bufio/bufio.go:628
#   0x825fdb    net/http.(*http2ClientConn).writeStreamReset+0x7b   /usr/local/go/src/net/http/h2_bundle.go:9576
#   0x81d544    net/http.(*http2clientStream).cleanupWriteRequest+0x2a4 /usr/local/go/src/net/http/h2_bundle.go:8145
#   0x81bfed    net/http.(*http2clientStream).doRequest+0x2d        /usr/local/go/src/net/http/h2_bundle.go:7905

1 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4d0a0c 0x4d09f9 0x695ab5 0x6ae528 0x6ad6fd 0x83f2e5 0x83ef1d 0x10069cf 0x100695b 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88     /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31     /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4d0a0b    internal/poll.(*pollDesc).waitRead+0x22b    /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
#   0x4d09f8    internal/poll.(*FD).Accept+0x218        /usr/local/go/src/internal/poll/fd_unix.go:614
#   0x695ab4    net.(*netFD).accept+0x34            /usr/local/go/src/net/fd_unix.go:172
#   0x6ae527    net.(*TCPListener).accept+0x27          /usr/local/go/src/net/tcpsock_posix.go:139
#   0x6ad6fc    net.(*TCPListener).Accept+0x3c          /usr/local/go/src/net/tcpsock.go:288
#   0x83f2e4    net/http.(*Server).Serve+0x384          /usr/local/go/src/net/http/server.go:3039
#   0x83ef1c    net/http.(*Server).ListenAndServe+0x7c      /usr/local/go/src/net/http/server.go:2968
#   0x10069ce   net/http.ListenAndServe+0xce            /usr/local/go/src/net/http/server.go:3222
#   0x100695a   main.main.func1+0x5a                XXXXXX/main.go:286

1 @ 0x438936 0x4315b7 0x4630e9 0x4cb7f2 0x4d0a0c 0x4d09f9 0x695ab5 0x6ae528 0x6ad6fd 0x83f2e5 0xf37dff 0x468741
#   0x4630e8    internal/poll.runtime_pollWait+0x88             /usr/local/go/src/runtime/netpoll.go:302
#   0x4cb7f1    internal/poll.(*pollDesc).wait+0x31             /usr/local/go/src/internal/poll/fd_poll_runtime.go:83
#   0x4d0a0b    internal/poll.(*pollDesc).waitRead+0x22b            /usr/local/go/src/internal/poll/fd_poll_runtime.go:88
#   0x4d09f8    internal/poll.(*FD).Accept+0x218                /usr/local/go/src/internal/poll/fd_unix.go:614
#   0x695ab4    net.(*netFD).accept+0x34                    /usr/local/go/src/net/fd_unix.go:172
#   0x6ae527    net.(*TCPListener).accept+0x27                  /usr/local/go/src/net/tcpsock_posix.go:139
#   0x6ad6fc    net.(*TCPListener).Accept+0x3c                  /usr/local/go/src/net/tcpsock.go:288
#   0x83f2e4    net/http.(*Server).Serve+0x384                  /usr/local/go/src/net/http/server.go:3039
#   0xf37dfe    XXXXXX/app.(*Server).Start.func1+0xbe   XXXXXX/app/server.go:146

1 @ 0x438936 0x448392 0x1006492 0x468741
#   0x1006491   main.main.func2+0x151   XXXXXX/main.go:356

1 @ 0x438936 0x448392 0xe9c8fb 0x468741
#   0xe9c8fa    github.com/eclipse/paho%2emqtt%2egolang.(*client).startCommsWorkers.func2+0xfa  /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/client.go:663

1 @ 0x438936 0x448392 0xe9ce5c 0x468741
#   0xe9ce5b    github.com/eclipse/paho%2emqtt%2egolang.(*client).startCommsWorkers.func1+0x17b /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/client.go:628

1 @ 0x438936 0x448392 0xea51cd 0x468741
#   0xea51cc    github.com/eclipse/paho%2emqtt%2egolang.startIncomingComms.func1+0x18c  /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/net.go:175

1 @ 0x438936 0x448392 0xea6565 0x468741
#   0xea6564    github.com/eclipse/paho%2emqtt%2egolang.startOutgoingComms.func1+0x1e4  /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/net.go:277

1 @ 0x438936 0x448392 0xea9569 0x468741
#   0xea9568    github.com/eclipse/paho%2emqtt%2egolang.keepalive+0x1e8 /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/ping.go:48

1 @ 0x438936 0x448392 0xead22a 0xf480d3 0xf47f0d 0xf48226 0x10017bf 0x100b2c5 0x10099af 0x1006073 0x468741
#   0xead229    github.com/eclipse/paho%2emqtt%2egolang.(*baseToken).WaitTimeout+0x69           /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/token.go:84
#   0xf480d2    XXXXXX/mqtt.(*MqttBuddy).SendMessage.func1+0x32 XXXXXX/mqtt/buddy.go:212
#   0xf47f0c    XXXXXX/mqtt.(*MqttBuddy).SendMessage+0x14c      XXXXXX/mqtt/buddy.go:222
#   0xf48225    XXXXXX/mqtt.BuddySendAndDone+0x105          XXXXXX/mqtt/buddy.go:244
#   0x10017be   main.(*server).sendAzureMessage+0x73e                           XXXXXX/azure.go:98
#   0x100b2c4   main.(*server).doProcessPacket+0x17e4                           XXXXXX/processpacket.go:157
#   0x10099ae   main.(*server).processPacket+0x18e                          XXXXXX/processpacket.go:45
#   0x1006072   main.main.func3.1+0x252                                 XXXXXX/main.go:390

1 @ 0x438936 0x448392 0xf60738 0x468741
#   0xf60737    github.com/streadway/amqp.(*Connection).heartbeater+0x1b7   /var/go/go/pkg/mod/github.com/streadway/amqp@v0.0.0-20190404075320-75d898a42a94/connection.go:551

1 @ 0x438936 0x448392 0xf73aa5 0x468741
#   0xf73aa4    github.com/assembla/cony.(*Client).Loop.func1+0x104 /var/go/go/pkg/mod/github.com/assembla/cony@v0.3.2/client.go:156

1 @ 0x438936 0x448392 0xf742b9 0x468741
#   0xf742b8    github.com/assembla/cony.(*Consumer).serve+0x1d8    /var/go/go/pkg/mod/github.com/assembla/cony@v0.3.2/consumer.go:83

1 @ 0x438936 0x449473 0x44944d 0x464745 0x474172 0xea750a 0x468741
#   0x464744    sync.runtime_Semacquire+0x24                    /usr/local/go/src/runtime/sema.go:56
#   0x474171    sync.(*WaitGroup).Wait+0x51                 /usr/local/go/src/sync/waitgroup.go:136
#   0xea7509    github.com/eclipse/paho%2emqtt%2egolang.startComms.func3+0x29   /var/go/go/pkg/mod/github.com/eclipse/paho.mqtt.golang@v1.4.2/net.go:438

1 @ 0x438936 0x449473 0x44944d 0x464865 0x4726e5 0x81ce1b 0x81cdf1 0x81c4c5 0x81bfde 0x468741
#   0x464864    sync.runtime_SemacquireMutex+0x24               /usr/local/go/src/runtime/sema.go:71
#   0x4726e4    sync.(*Mutex).lockSlow+0x164                    /usr/local/go/src/sync/mutex.go:162
#   0x81ce1a    sync.(*Mutex).Lock+0x9a                     /usr/local/go/src/sync/mutex.go:81
#   0x81cdf0    net/http.(*http2clientStream).encodeAndWriteHeaders+0x70    /usr/local/go/src/net/http/h2_bundle.go:8064
#   0x81c4c4    net/http.(*http2clientStream).writeRequest+0x4a4        /usr/local/go/src/net/http/h2_bundle.go:7986
#   0x81bfdd    net/http.(*http2clientStream).doRequest+0x1d            /usr/local/go/src/net/http/h2_bundle.go:7904

1 @ 0x462bc5 0xee1f75 0xee1d8d 0xedeccb 0xeed9a5 0xf38167 0x83b78f 0x83d129 0xefa162 0x83ed7b 0x83a237 0x468741
#   0x462bc4    runtime/pprof.runtime_goroutineProfileWithLabels+0x24       /usr/local/go/src/runtime/mprof.go:753
#   0xee1f74    runtime/pprof.writeRuntimeProfile+0xb4              /usr/local/go/src/runtime/pprof/pprof.go:725
#   0xee1d8c    runtime/pprof.writeGoroutine+0x4c               /usr/local/go/src/runtime/pprof/pprof.go:685
#   0xedecca    runtime/pprof.(*Profile).WriteTo+0x14a              /usr/local/go/src/runtime/pprof/pprof.go:332
#   0xeed9a4    net/http/pprof.handler.ServeHTTP+0x4a4              /usr/local/go/src/net/http/pprof/pprof.go:253
#   0xf38166    XXXXXX/app.pprofIndexOverride+0x206 XXXXXX/app/server.go:161
#   0x83b78e    net/http.HandlerFunc.ServeHTTP+0x2e             /usr/local/go/src/net/http/server.go:2084
#   0x83d128    net/http.(*ServeMux).ServeHTTP+0x148                /usr/local/go/src/net/http/server.go:2462
#   0xefa161    github.com/gorilla/mux.(*Router).ServeHTTP+0x201        /var/go/go/pkg/mod/github.com/gorilla/mux@v1.7.2/mux.go:212
#   0x83ed7a    net/http.serverHandler.ServeHTTP+0x43a              /usr/local/go/src/net/http/server.go:2916
#   0x83a236    net/http.(*conn).serve+0x5d6                    /usr/local/go/src/net/http/server.go:1966
dmitshur commented 1 year ago

CC @neild.

rhysh commented 1 year ago

Thanks for splitting this out. Can you clarify what specifically the problem is? It sounds like the number of goroutines is the bug for you ("I expected all go routines that the std library creates to handle the HTTP request to be cleaned up"), but looking at the goroutines stacks I wonder if the larger problem is that the Transport continues to schedule new requests onto the stuck http2 connection even after many requests on that connection have all timed out.


I see goroutines doing the following outbound http/2 work:

16 outbound http/2 connections that exist in the app (rooted at net/http.(*http2ClientConn).readLoop)

11 requests the app is trying to send over http/2 right now (in net/http.(*http2ClientConn).RoundTrip)

1258 + 12 trying to clean up requests that were scheduled onto the stuck connection (in net/http.(*http2clientStream).cleanupWriteRequest)

9+2 in net/http.(*http2clientStream).doRequest calling net/http.(*http2clientStream).writeRequest, with 9 on line 7929 and 2 on line 8042. I think these correspond to the 11 goroutines in net/http.(*http2ClientConn).RoundTrip.

1 more in net/http.(*http2clientStream).doRequest, net/http.(*http2clientStream).writeRequest (line 7986), net/http.(*http2clientStream).encodeAndWriteHeaders, sync.(*Mutex).Lock on cc.wmu. This request holds cc.reqHeaderMu, but does not hold cc.wmu.

1 goroutine trying to write a frame to an http/2 client connection (net/http.(*http2clientStream).doRequest, net/http.(*http2clientStream).cleanupWriteRequest, net/http.(*http2ClientConn).writeStreamReset, bufio.(*Writer).Flush, ..., crypto/tls.(*Conn).Write, ..., internal/poll.(*pollDesc).waitWrite. This request holds cc.wmu.

The following outbound http/1.1 work (I think unrelated, but helpful to classify): 12 outbound http/1.1 connections that exist in the app (rooted at net/http.(*persistConn).writeLoop) 6+3 idle http/1.1 client connections, 6 using https and 3 using http (net/http.(*persistConn).readLoop leading to net/http.(*persistConn).Read) 3 active http/1.1 client connections, reading response body (net/http.(*body).Read leading to net/http.(*persistConn).Read), corresponding to another 3 goroutines in net/http.(*persistConn).readLoop without any Read calls on their stack. 2 outbound http/1.1 requests still waiting for response headers (net/http.(*persistConn).roundTrip).

The following inbound http/2 work (I think unrelated, but helpful to classify): 10 inbound http/2 connections for gRPC, google.golang.org/grpc/internal/transport.(*http2Client).reader


As I understand it:

  1. The request whose goroutine holds cc.wmu and is blocked in crypto/tls.(*Conn).Write remains blocked for a long time because there's a problem with the underlying TCP connection (maybe the network is dropping all packets, or all packets for that peer, or all packets for that five-tuple). This request has been waiting since the kernel's write buffer for the TCP connection filled up / writes to the file descriptor started blocking.
  2. The request whose goroutine holds cc.reqHeaderMu cannot proceed because it's trying to acquire cc.wmu. This request has been waiting since shortly after writes to the file descriptor started blocking.
  3. The 2 goroutines in writeRequest on line 8042 have managed to use cc.reqHeaderMu before their context values timed out. That's interesting .. maybe the TCP connection isn't completely broken, but is instead very, very slow? Or is it possible that these 2 requests didn't have a timeout set, and have been around since the start of the problem?
  4. The 9 goroutines in writeRequest on line 7929 are waiting for cc.reqHeaderMu (or for their context values to expire), but they can't get it because the http/2 connection is busy (items 2 and 1). The requests for these goroutines are very recent / current. The Transport chose this http/2 connection as the right one to use for putting those requests on the wire.
  5. The 1258 + 12 goroutines in cleanupWriteRequest (I haven't investigated the difference between them) are each for a request that was scheduled to a (the?) slow connection and have not yet had a chance to use cc.wmu. That gives an indication of how long the crypto/tls.(*Conn).Write call has been blocked (a long time).
stampy88 commented 1 year ago

Hi @rhysh , well, the bug isn't necessarily the go routines themselves, but rather the potential for these resources, e.g. go routines, memory, etc. to keep growing because of the problem you stated.

adg commented 1 year ago

I think I'm seeing a manifestation of the same problem here. HTTP/2 requests where the context deadline has been reached, but the requests remain outstanding, with each goroutine blocked trying to acquire a mutex in abortStream. In this case, the upstream server is not responding. Here are the active goroutines:

goroutine profile: total 45
7 @ 0x43dd76 0x44f4af 0x44f486 0x46dd06 0x47c925 0x732e46 0x732e14 0x737d0e 0x733ae5 0x73363b 0x7678cb 0x74e7f9 0x70a9d7 0x70a25b 0x70c55b 0x101cedd 0x101cecc 0x101d6fd 0x101d41a 0x101f2d3 0x101e28f 0x10484b4 0x1046ae5 0x1038225 0x1042345 0x96a258 0x103bf23 0x950039 0xb79d43 0x950039 0xae3659 0x94fdcf
#   0x46dd05    sync.runtime_SemacquireMutex+0x25                           /usr/local/go/src/runtime/sema.go:77
#   0x47c924    sync.(*Mutex).lockSlow+0x164                                /usr/local/go/src/sync/mutex.go:171
#   0x732e45    sync.(*Mutex).Lock+0x65                                 /usr/local/go/src/sync/mutex.go:90
#   0x732e13    net/http.(*http2clientStream).abortStream+0x33                      /usr/local/go/src/net/http/h2_bundle.go:7373
#   0x737d0d    net/http.(*http2ClientConn).RoundTrip+0x56d                     /usr/local/go/src/net/http/h2_bundle.go:8248
#   0x733ae4    net/http.(*http2Transport).RoundTripOpt+0x1c4                       /usr/local/go/src/net/http/h2_bundle.go:7523
#   0x73363a    net/http.(*http2Transport).RoundTrip+0x1a                       /usr/local/go/src/net/http/h2_bundle.go:7475
#   0x7678ca    net/http.(*Transport).roundTrip+0x7ea                           /usr/local/go/src/net/http/transport.go:601
#   0x74e7f8    net/http.(*Transport).RoundTrip+0x18                            /usr/local/go/src/net/http/roundtrip.go:17
#   0x70a9d6    net/http.send+0x5f6                                 /usr/local/go/src/net/http/client.go:252
#   0x70a25a    net/http.(*Client).send+0x9a                                /usr/local/go/src/net/http/client.go:176
#   0x70c55a    net/http.(*Client).do+0x8fa                             /usr/local/go/src/net/http/client.go:716
#   0x101cedc   net/http.(*Client).Do+0x11c                             /usr/local/go/src/net/http/client.go:582
#   0x101cecb   redacted ...
#   0x101d6fc   redacted ...
#   0x101d419   redacted ...
#   0x101f2d2   redacted ...
#   0x101e28e   redacted ...
#   0x10484b3   redacted ...
#   0x1046ae4   redacted ...
#   0x1038224   redacted ...
#   0x1042344   redacted ...
#   0x96a257    redacted ...
#   0x103bf22   redacted ...
#   0x950038    google.golang.org/grpc.getChainUnaryHandler.func1+0xb8                  /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164
#   0xb79d42    redacted ...
#   0x950038    google.golang.org/grpc.getChainUnaryHandler.func1+0xb8                  /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164
#   0xae3658    redacted ...
#   0x94fdce    google.golang.org/grpc.chainUnaryInterceptors.func1+0x8e                /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1155

7 @ 0x43dd76 0x44f4af 0x44f486 0x46dd06 0x47c925 0x73944a 0x739413 0x7380ee 0x471f81
#   0x46dd05    sync.runtime_SemacquireMutex+0x25           /usr/local/go/src/runtime/sema.go:77
#   0x47c924    sync.(*Mutex).lockSlow+0x164                /usr/local/go/src/sync/mutex.go:171
#   0x739449    sync.(*Mutex).Lock+0x89                 /usr/local/go/src/sync/mutex.go:90
#   0x739412    net/http.(*http2clientStream).cleanupWriteRequest+0x52  /usr/local/go/src/net/http/h2_bundle.go:8476
#   0x7380ed    net/http.(*http2clientStream).doRequest+0x2d        /usr/local/go/src/net/http/h2_bundle.go:8262

7 @ 0x43dd76 0x44f4af 0x44f486 0x46dd06 0x47c925 0x73ce8a 0x73ce77 0x43aa33 0x45223d 0x45220d 0x72235c 0x73fe34 0x73d874 0x73cacf 0x471f81
#   0x46dd05    sync.runtime_SemacquireMutex+0x25           /usr/local/go/src/runtime/sema.go:77
#   0x47c924    sync.(*Mutex).lockSlow+0x164                /usr/local/go/src/sync/mutex.go:171
#   0x73ce89    sync.(*Mutex).Lock+0x149                /usr/local/go/src/sync/mutex.go:90
#   0x73ce76    net/http.(*http2clientConnReadLoop).cleanup+0x136   /usr/local/go/src/net/http/h2_bundle.go:9125
#   0x43aa32    runtime.gopanic+0x212                   /usr/local/go/src/runtime/panic.go:884
#   0x45223c    runtime.panicmem+0x37c                  /usr/local/go/src/runtime/panic.go:260
#   0x45220c    runtime.sigpanic+0x34c                  /usr/local/go/src/runtime/signal_unix.go:841
#   0x72235b    net/http.(*http2pipe).Write+0x17b           /usr/local/go/src/net/http/h2_bundle.go:3710
#   0x73fe33    net/http.(*http2clientConnReadLoop).processData+0x253   /usr/local/go/src/net/http/h2_bundle.go:9642
#   0x73d873    net/http.(*http2clientConnReadLoop).run+0x3f3       /usr/local/go/src/net/http/h2_bundle.go:9221
#   0x73cace    net/http.(*http2ClientConn).readLoop+0x6e       /usr/local/go/src/net/http/h2_bundle.go:9082

2 @ 0x43dd76 0x436397 0x46c229 0x4ebff2 0x4ed3d9 0x4ed3c7 0x649189 0x65ac65 0x6a26fd 0x4ac3f8 0x6a28e5 0x69fdd6 0x6a5ccf 0x6a5cd0 0x56cfbb 0x4a35ba 0x8f6c4e 0x8f6c08 0x8f7495 0x920574 0x94e409 0x94dc86 0x471f81
#   0x46c228    internal/poll.runtime_pollWait+0x88                     /usr/local/go/src/runtime/netpoll.go:306
#   0x4ebff1    internal/poll.(*pollDesc).wait+0x31                     /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
#   0x4ed3d8    internal/poll.(*pollDesc).waitRead+0x298                    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
#   0x4ed3c6    internal/poll.(*FD).Read+0x286                          /usr/local/go/src/internal/poll/fd_unix.go:167
#   0x649188    net.(*netFD).Read+0x28                              /usr/local/go/src/net/fd_posix.go:55
#   0x65ac64    net.(*conn).Read+0x44                               /usr/local/go/src/net/net.go:183
#   0x6a26fc    crypto/tls.(*atLeastReader).Read+0x3c                       /usr/local/go/src/crypto/tls/conn.go:788
#   0x4ac3f7    bytes.(*Buffer).ReadFrom+0x97                           /usr/local/go/src/bytes/buffer.go:202
#   0x6a28e4    crypto/tls.(*Conn).readFromUntil+0xe4                       /usr/local/go/src/crypto/tls/conn.go:810
#   0x69fdd5    crypto/tls.(*Conn).readRecordOrCCS+0x115                    /usr/local/go/src/crypto/tls/conn.go:617
#   0x6a5cce    crypto/tls.(*Conn).readRecord+0x16e                     /usr/local/go/src/crypto/tls/conn.go:583
#   0x6a5ccf    crypto/tls.(*Conn).Read+0x16f                           /usr/local/go/src/crypto/tls/conn.go:1316
#   0x56cfba    bufio.(*Reader).Read+0x1ba                          /usr/local/go/src/bufio/bufio.go:237
#   0x4a35b9    io.ReadAtLeast+0x99                             /usr/local/go/src/io/io.go:332
#   0x8f6c4d    io.ReadFull+0x6d                                /usr/local/go/src/io/io.go:351
#   0x8f6c07    golang.org/x/net/http2.readFrameHeader+0x27                 /builder/home/go/pkg/mod/golang.org/x/net@v0.10.0/http2/frame.go:237
#   0x8f7494    golang.org/x/net/http2.(*Framer).ReadFrame+0x94                 /builder/home/go/pkg/mod/golang.org/x/net@v0.10.0/http2/frame.go:498
#   0x920573    google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams+0x173    /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_server.go:637
#   0x94e408    google.golang.org/grpc.(*Server).serveStreams+0x188             /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:950
#   0x94dc85    google.golang.org/grpc.(*Server).handleRawConn.func1+0x45           /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:892

2 @ 0x43dd76 0x44e3be 0x9052f5 0x905a51 0x91d16e 0x471f81
#   0x9052f4    google.golang.org/grpc/internal/transport.(*controlBuffer).get+0x114    /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/controlbuf.go:417
#   0x905a50    google.golang.org/grpc/internal/transport.(*loopyWriter).run+0x90   /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/controlbuf.go:549
#   0x91d16d    google.golang.org/grpc/internal/transport.NewServerTransport.func2+0xcd /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_server.go:336

2 @ 0x43dd76 0x44e3be 0x924993 0x471f81
#   0x924992    google.golang.org/grpc/internal/transport.(*http2Server).keepalive+0x232    /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_server.go:1150

1 @ 0x40e794 0x46e40f 0xff9319 0x471f81
#   0x46e40e    os/signal.signal_recv+0x2e  /usr/local/go/src/runtime/sigqueue.go:152
#   0xff9318    os/signal.loop+0x18     /usr/local/go/src/os/signal/signal_unix.go:23

1 @ 0x432a76 0x46bd85 0xb15ab5 0xb158cd 0xb1264b 0xb72c45 0xb73793 0x75768f 0x759029 0x75a7d6 0x7561b2 0x471f81
#   0x46bd84    runtime/pprof.runtime_goroutineProfileWithLabels+0x24   /usr/local/go/src/runtime/mprof.go:844
#   0xb15ab4    runtime/pprof.writeRuntimeProfile+0xb4          /usr/local/go/src/runtime/pprof/pprof.go:734
#   0xb158cc    runtime/pprof.writeGoroutine+0x4c           /usr/local/go/src/runtime/pprof/pprof.go:694
#   0xb1264a    runtime/pprof.(*Profile).WriteTo+0x14a          /usr/local/go/src/runtime/pprof/pprof.go:329
#   0xb72c44    net/http/pprof.handler.ServeHTTP+0x4a4          /usr/local/go/src/net/http/pprof/pprof.go:259
#   0xb73792    net/http/pprof.Index+0xf2               /usr/local/go/src/net/http/pprof/pprof.go:376
#   0x75768e    net/http.HandlerFunc.ServeHTTP+0x2e         /usr/local/go/src/net/http/server.go:2122
#   0x759028    net/http.(*ServeMux).ServeHTTP+0x148            /usr/local/go/src/net/http/server.go:2500
#   0x75a7d5    net/http.serverHandler.ServeHTTP+0x315          /usr/local/go/src/net/http/server.go:2936
#   0x7561b1    net/http.(*conn).serve+0x611                /usr/local/go/src/net/http/server.go:1995

1 @ 0x43dd76 0x40901d 0x408b18 0xb76f13 0x103ba4b 0xb57d73 0x471f81
#   0xb76f12    redacted ...
#   0x103ba4a   redacted ...
#   0xb57d72    redacted ...

1 @ 0x43dd76 0x40901d 0x408b18 0xb770c9 0x471f81
#   0xb770c8    redacted ...

1 @ 0x43dd76 0x40901d 0x408b18 0xff97f5 0x471f81
#   0xff97f4    redacted ...

1 @ 0x43dd76 0x40901d 0x408b58 0xb55bbc 0x471f81
#   0xb55bbb    redacted ...

1 @ 0x43dd76 0x436397 0x46c229 0x4ebff2 0x4ed3d9 0x4ed3c7 0x649189 0x65ac65 0x750351 0x56c9df 0x56cb3d 0x75631c 0x471f81
#   0x46c228    internal/poll.runtime_pollWait+0x88     /usr/local/go/src/runtime/netpoll.go:306
#   0x4ebff1    internal/poll.(*pollDesc).wait+0x31     /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
#   0x4ed3d8    internal/poll.(*pollDesc).waitRead+0x298    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
#   0x4ed3c6    internal/poll.(*FD).Read+0x286          /usr/local/go/src/internal/poll/fd_unix.go:167
#   0x649188    net.(*netFD).Read+0x28              /usr/local/go/src/net/fd_posix.go:55
#   0x65ac64    net.(*conn).Read+0x44               /usr/local/go/src/net/net.go:183
#   0x750350    net/http.(*connReader).Read+0x170       /usr/local/go/src/net/http/server.go:782
#   0x56c9de    bufio.(*Reader).fill+0xfe           /usr/local/go/src/bufio/bufio.go:106
#   0x56cb3c    bufio.(*Reader).Peek+0x5c           /usr/local/go/src/bufio/bufio.go:144
#   0x75631b    net/http.(*conn).serve+0x77b            /usr/local/go/src/net/http/server.go:2030

1 @ 0x43dd76 0x436397 0x46c229 0x4ebff2 0x4f18fd 0x4f18eb 0x64b315 0x663c25 0x662d1d 0x75ad45 0x1033f8a 0x1033f8b 0x471f81
#   0x46c228    internal/poll.runtime_pollWait+0x88     /usr/local/go/src/runtime/netpoll.go:306
#   0x4ebff1    internal/poll.(*pollDesc).wait+0x31     /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
#   0x4f18fc    internal/poll.(*pollDesc).waitRead+0x2bc    /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
#   0x4f18ea    internal/poll.(*FD).Accept+0x2aa        /usr/local/go/src/internal/poll/fd_unix.go:614
#   0x64b314    net.(*netFD).accept+0x34            /usr/local/go/src/net/fd_unix.go:172
#   0x663c24    net.(*TCPListener).accept+0x24          /usr/local/go/src/net/tcpsock_posix.go:148
#   0x662d1c    net.(*TCPListener).Accept+0x3c          /usr/local/go/src/net/tcpsock.go:297
#   0x75ad44    net/http.(*Server).Serve+0x384          /usr/local/go/src/net/http/server.go:3059
#   0x1033f89   net/http.Serve+0x49             /usr/local/go/src/net/http/server.go:2581
#   0x1033f8a   redacted ...

1 @ 0x43dd76 0x436397 0x46c229 0x4ebff2 0x4f18fd 0x4f18eb 0x64b315 0x663c25 0x662d1d 0x94d2b5 0xb7704e 0x471f81
#   0x46c228    internal/poll.runtime_pollWait+0x88             /usr/local/go/src/runtime/netpoll.go:306
#   0x4ebff1    internal/poll.(*pollDesc).wait+0x31             /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
#   0x4f18fc    internal/poll.(*pollDesc).waitRead+0x2bc            /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
#   0x4f18ea    internal/poll.(*FD).Accept+0x2aa                /usr/local/go/src/internal/poll/fd_unix.go:614
#   0x64b314    net.(*netFD).accept+0x34                    /usr/local/go/src/net/fd_unix.go:172
#   0x663c24    net.(*TCPListener).accept+0x24                  /usr/local/go/src/net/tcpsock_posix.go:148
#   0x662d1c    net.(*TCPListener).Accept+0x3c                  /usr/local/go/src/net/tcpsock.go:297
#   0x94d2b4    google.golang.org/grpc.(*Server).Serve+0x474            /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:824
#   0xb7704d    redacted ...

1 @ 0x43dd76 0x44e3be 0x103a5c5 0x471f81
#   0x103a5c4   redacted ...

1 @ 0x43dd76 0x44e3be 0x51964d 0x471f81
#   0x51964c    database/sql.(*DB).connectionOpener+0x8c    /usr/local/go/src/database/sql/sql.go:1218

1 @ 0x43dd76 0x44e3be 0xb2540d 0x471f81
#   0xb2540c    go.opencensus.io/stats/view.(*worker).start+0xac    /builder/home/go/pkg/mod/go.opencensus.io@v0.24.0/stats/view/worker.go:292

1 @ 0x43dd76 0x44e3be 0xb33c89 0xb32310 0x103bcb1 0x953379 0xb7a2d2 0x953379 0xae3ac5 0x95310f 0x954723 0x955f50 0x94e7d8 0x471f81
#   0xb33c88    google.golang.org/grpc/health.(*Server).Watch+0x2c8         /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/health/server.go:92
#   0xb3230f    google.golang.org/grpc/health/grpc_health_v1._Health_Watch_Handler+0xcf /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/health/grpc_health_v1/health_grpc.pb.go:187
#   0x103bcb0   redacted ...
#   0x953378    google.golang.org/grpc.getChainStreamHandler.func1+0xb8         /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1491
#   0xb7a2d1    redacted ...
#   0x953378    google.golang.org/grpc.getChainStreamHandler.func1+0xb8         /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1491
#   0xae3ac4    redacted ...
#   0x95310e    google.golang.org/grpc.chainStreamInterceptors.func1+0x8e       /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1482
#   0x954722    google.golang.org/grpc.(*Server).processStreamingRPC+0x1362     /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1646
#   0x955f4f    google.golang.org/grpc.(*Server).handleStream+0x9ef         /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1726
#   0x94e7d7    google.golang.org/grpc.(*Server).serveStreams.func1.2+0x97      /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:966

1 @ 0x43dd76 0x44e3be 0xb38c65 0x471f81
#   0xb38c64    redacted ...

1 @ 0x43dd76 0x44f4af 0x44f486 0x46dbe7 0x47e42b 0xb57c6f 0x1033d2b 0x43d947 0x471f81
#   0x46dbe6    sync.runtime_Semacquire+0x26                    /usr/local/go/src/runtime/sema.go:62
#   0x47e42a    sync.(*WaitGroup).Wait+0x4a                 /usr/local/go/src/sync/waitgroup.go:116
#   0xb57c6e    redacted ...
#   0x1033d2a   redacted ...
#   0x43d946    runtime.main+0x206                      /usr/local/go/src/runtime/proc.go:250

1 @ 0x43dd76 0x46ed75 0xb54a32 0x471f81
#   0x46ed74    time.Sleep+0x134                        /usr/local/go/src/runtime/time.go:195
#   0xb54a31    redacted ...

1 @ 0x471f81

1 @ 0x488cd6 0x8970b8 0xb3671d 0xb359d5 0x471f81
#   0x488cd5    syscall.Syscall6+0x35                       /usr/local/go/src/syscall/syscall_linux.go:91
#   0x8970b7    golang.org/x/sys/unix.EpollWait+0x57                /builder/home/go/pkg/mod/golang.org/x/sys@v0.8.0/unix/zsyscall_linux_amd64.go:56
#   0xb3671c    github.com/fsnotify/fsnotify.(*fdPoller).wait+0x7c      /builder/home/go/pkg/mod/github.com/fsnotify/fsnotify@v1.5.1/inotify_poller.go:87
#   0xb359d4    github.com/fsnotify/fsnotify.(*Watcher).readEvents+0x274    /builder/home/go/pkg/mod/github.com/fsnotify/fsnotify@v1.5.1/inotify.go:193
ghost commented 10 months ago

How to fix😂

90wukai commented 8 months ago

@stampy88 Do you have any ideas for fix it?

stampy88 commented 4 months ago

I do not @90wukai. Was hoping @rhysh may.

rhysh commented 4 months ago

Hi @stampy88 , I use net/http less these days than I did in early 2023. I don't have a fix, and I don't have plans to make one.

Maintainer time for http/2 is scarce (see https://dev.golang.org/owners for each package's list). We'll do best if we can make a little of it go a long way. Here's what I think would help:

@stampy88 , you'd initially said you were working on a reproducer. I assume you would have said if you'd finished ... but otherwise, saying a bit about what you tried that didn't work (or which worked only partially) might help.

@90wukai (and the now-deleted Jan 21 poster), it sounds like you're affected too. Even if you don't have a lot of time for digging, it would help to say which Go version you use that has the problem ... and if you're using x/net/http2 directly then the version of that as well.

If you have more time, it could help to say a bit about the impact this has on your programs (on a scale of "kinda annoying every couple months" to "multi-hour system outage almost daily"), and some background on what your systems are "like". Especially if are there ways you suspect that your use of Go is "unusual", which might make this issue appear more frequently than in the general population of Go users.

@adg , thanks for the goroutine profile. I assume you run an up-to-date Go version, and the line numbers match go1.20. But I don't see any singleton goroutines with .../h2_bundle.go code on their stacks which might be holding the lock that the other 7+7+7=21 goroutines are trying to acquire. And the panic in net/http.(*http2pipe).Write seems strange. Hmm.

stampy88 commented 3 months ago

Sorry @rhysh, I was unable to consistently reproduce it and have disabled HTTP 2 for my usage in this legacy app that was having the issue. It was so long ago, I don't recall what I did. I'll try and dig up the code I had that was the basis for my reproducer, but I have a bad feeling I don't have it anymore.