stampy88 opened this issue 1 year ago
CC @neild.
Thanks for splitting this out. Can you clarify what specifically the problem is? It sounds like the number of goroutines is the bug for you ("I expected all go routines that the std library creates to handle the HTTP request to be cleaned up"), but looking at the goroutines stacks I wonder if the larger problem is that the Transport continues to schedule new requests onto the stuck http2 connection even after many requests on that connection have all timed out.
I see goroutines doing the following outbound http/2 work:
16 outbound http/2 connections that exist in the app (rooted at net/http.(*http2ClientConn).readLoop)
11 requests the app is trying to send over http/2 right now (in net/http.(*http2ClientConn).RoundTrip)
1258 + 12 trying to clean up requests that were scheduled onto the stuck connection (in net/http.(*http2clientStream).cleanupWriteRequest)
9+2 in net/http.(*http2clientStream).doRequest calling net/http.(*http2clientStream).writeRequest, with 9 on line 7929 and 2 on line 8042. I think these correspond to the 11 goroutines in net/http.(*http2ClientConn).RoundTrip.
1 more in net/http.(*http2clientStream).doRequest, net/http.(*http2clientStream).writeRequest (line 7986), net/http.(*http2clientStream).encodeAndWriteHeaders, sync.(*Mutex).Lock on cc.wmu. This request holds cc.reqHeaderMu, but does not hold cc.wmu.
1 goroutine trying to write a frame to an http/2 client connection (net/http.(*http2clientStream).doRequest, net/http.(*http2clientStream).cleanupWriteRequest, net/http.(*http2ClientConn).writeStreamReset, bufio.(*Writer).Flush, ..., crypto/tls.(*Conn).Write, ..., internal/poll.(*pollDesc).waitWrite). This request holds cc.wmu.
The following outbound http/1.1 work (I think unrelated, but helpful to classify):
12 outbound http/1.1 connections that exist in the app (rooted at net/http.(*persistConn).writeLoop)
6+3 idle http/1.1 client connections, 6 using https and 3 using http (net/http.(*persistConn).readLoop leading to net/http.(*persistConn).Read)
3 active http/1.1 client connections, reading response body (net/http.(*body).Read leading to net/http.(*persistConn).Read), corresponding to another 3 goroutines in net/http.(*persistConn).readLoop without any Read calls on their stack.
2 outbound http/1.1 requests still waiting for response headers (net/http.(*persistConn).roundTrip).
The following inbound http/2 work (I think unrelated, but helpful to classify):
10 inbound http/2 connections for gRPC, google.golang.org/grpc/internal/transport.(*http2Client).reader
As I understand it:
1. The request that holds cc.wmu and is blocked in crypto/tls.(*Conn).Write remains blocked for a long time because there's a problem with the underlying TCP connection (maybe the network is dropping all packets, or all packets for that peer, or all packets for that five-tuple). This request has been waiting since the kernel's write buffer for the TCP connection filled up / writes to the file descriptor started blocking.
2. The request that holds cc.reqHeaderMu cannot proceed because it's trying to acquire cc.wmu. This request has been waiting since shortly after writes to the file descriptor started blocking.
3. The 2 requests in writeRequest on line 8042 have managed to use cc.reqHeaderMu before their context values timed out. That's interesting .. maybe the TCP connection isn't completely broken, but is instead very, very slow? Or is it possible that these 2 requests didn't have a timeout set, and have been around since the start of the problem?
4. The 9 requests in writeRequest on line 7929 are waiting for cc.reqHeaderMu (or for their context values to expire), but they can't get it because the http/2 connection is busy (items 2 and 1). The requests for these goroutines are very recent / current. The Transport chose this http/2 connection as the right one to use for putting those requests on the wire.
5. The 1258 + 12 goroutines in cleanupWriteRequest (I haven't investigated the difference between them) are each for a request that was scheduled to a (the?) slow connection and have not yet had a chance to use cc.wmu. That gives an indication of how long the crypto/tls.(*Conn).Write call has been blocked (a long time).
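A stripped-down sketch of the lock chain described above, using hypothetical names rather than the real net/http internals (writeMu stands in for cc.wmu, headerMu for cc.reqHeaderMu, and the stalled TCP connection is simulated by the unread end of a net.Pipe):

// Hypothetical illustration of the lock chain; not the actual net/http code.
package main

import (
	"fmt"
	"net"
	"sync"
	"time"
)

func main() {
	var writeMu, headerMu sync.Mutex
	conn, _ := net.Pipe() // the other end is never read, so Write blocks forever

	// Goroutine 1: holds writeMu and is blocked in conn.Write, like the frame
	// writer stuck in crypto/tls.(*Conn).Write on the broken connection.
	go func() {
		writeMu.Lock()
		defer writeMu.Unlock()
		conn.Write(make([]byte, 64<<10)) // never returns
	}()
	time.Sleep(100 * time.Millisecond)

	// Goroutine 2: holds headerMu and waits for writeMu, like the request in
	// encodeAndWriteHeaders.
	go func() {
		headerMu.Lock()
		defer headerMu.Unlock()
		writeMu.Lock() // parked behind goroutine 1
		writeMu.Unlock()
	}()
	time.Sleep(100 * time.Millisecond)

	// Goroutines 3..11: new requests queueing on headerMu, like the goroutines
	// parked in writeRequest; their contexts can expire, but they stay parked.
	for i := 0; i < 9; i++ {
		go func(i int) {
			headerMu.Lock()
			headerMu.Unlock()
			fmt.Println("request", i, "got the header lock") // never reached
		}(i)
	}

	time.Sleep(time.Second)
	fmt.Println("later requests are still parked behind the stalled write")
}

Once the goroutine holding the write lock is stuck in a blocking Write, every later request parks behind one of the two mutexes regardless of its context deadline, which matches the growth in cleanupWriteRequest goroutines described above.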
Hi @rhysh, well, the bug isn't necessarily the goroutines themselves, but rather the potential for these resources (goroutines, memory, etc.) to keep growing because of the problem you stated.
I think I'm seeing a manifestation of the same problem here. HTTP/2 requests where the context deadline has been reached, but the requests remain outstanding, with each goroutine blocked trying to acquire a mutex in abortStream. In this case, the upstream server is not responding. Here are the active goroutines:
goroutine profile: total 45
7 @ 0x43dd76 0x44f4af 0x44f486 0x46dd06 0x47c925 0x732e46 0x732e14 0x737d0e 0x733ae5 0x73363b 0x7678cb 0x74e7f9 0x70a9d7 0x70a25b 0x70c55b 0x101cedd 0x101cecc 0x101d6fd 0x101d41a 0x101f2d3 0x101e28f 0x10484b4 0x1046ae5 0x1038225 0x1042345 0x96a258 0x103bf23 0x950039 0xb79d43 0x950039 0xae3659 0x94fdcf
# 0x46dd05 sync.runtime_SemacquireMutex+0x25 /usr/local/go/src/runtime/sema.go:77
# 0x47c924 sync.(*Mutex).lockSlow+0x164 /usr/local/go/src/sync/mutex.go:171
# 0x732e45 sync.(*Mutex).Lock+0x65 /usr/local/go/src/sync/mutex.go:90
# 0x732e13 net/http.(*http2clientStream).abortStream+0x33 /usr/local/go/src/net/http/h2_bundle.go:7373
# 0x737d0d net/http.(*http2ClientConn).RoundTrip+0x56d /usr/local/go/src/net/http/h2_bundle.go:8248
# 0x733ae4 net/http.(*http2Transport).RoundTripOpt+0x1c4 /usr/local/go/src/net/http/h2_bundle.go:7523
# 0x73363a net/http.(*http2Transport).RoundTrip+0x1a /usr/local/go/src/net/http/h2_bundle.go:7475
# 0x7678ca net/http.(*Transport).roundTrip+0x7ea /usr/local/go/src/net/http/transport.go:601
# 0x74e7f8 net/http.(*Transport).RoundTrip+0x18 /usr/local/go/src/net/http/roundtrip.go:17
# 0x70a9d6 net/http.send+0x5f6 /usr/local/go/src/net/http/client.go:252
# 0x70a25a net/http.(*Client).send+0x9a /usr/local/go/src/net/http/client.go:176
# 0x70c55a net/http.(*Client).do+0x8fa /usr/local/go/src/net/http/client.go:716
# 0x101cedc net/http.(*Client).Do+0x11c /usr/local/go/src/net/http/client.go:582
# 0x101cecb redacted ...
# 0x101d6fc redacted ...
# 0x101d419 redacted ...
# 0x101f2d2 redacted ...
# 0x101e28e redacted ...
# 0x10484b3 redacted ...
# 0x1046ae4 redacted ...
# 0x1038224 redacted ...
# 0x1042344 redacted ...
# 0x96a257 redacted ...
# 0x103bf22 redacted ...
# 0x950038 google.golang.org/grpc.getChainUnaryHandler.func1+0xb8 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164
# 0xb79d42 redacted ...
# 0x950038 google.golang.org/grpc.getChainUnaryHandler.func1+0xb8 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1164
# 0xae3658 redacted ...
# 0x94fdce google.golang.org/grpc.chainUnaryInterceptors.func1+0x8e /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1155
7 @ 0x43dd76 0x44f4af 0x44f486 0x46dd06 0x47c925 0x73944a 0x739413 0x7380ee 0x471f81
# 0x46dd05 sync.runtime_SemacquireMutex+0x25 /usr/local/go/src/runtime/sema.go:77
# 0x47c924 sync.(*Mutex).lockSlow+0x164 /usr/local/go/src/sync/mutex.go:171
# 0x739449 sync.(*Mutex).Lock+0x89 /usr/local/go/src/sync/mutex.go:90
# 0x739412 net/http.(*http2clientStream).cleanupWriteRequest+0x52 /usr/local/go/src/net/http/h2_bundle.go:8476
# 0x7380ed net/http.(*http2clientStream).doRequest+0x2d /usr/local/go/src/net/http/h2_bundle.go:8262
7 @ 0x43dd76 0x44f4af 0x44f486 0x46dd06 0x47c925 0x73ce8a 0x73ce77 0x43aa33 0x45223d 0x45220d 0x72235c 0x73fe34 0x73d874 0x73cacf 0x471f81
# 0x46dd05 sync.runtime_SemacquireMutex+0x25 /usr/local/go/src/runtime/sema.go:77
# 0x47c924 sync.(*Mutex).lockSlow+0x164 /usr/local/go/src/sync/mutex.go:171
# 0x73ce89 sync.(*Mutex).Lock+0x149 /usr/local/go/src/sync/mutex.go:90
# 0x73ce76 net/http.(*http2clientConnReadLoop).cleanup+0x136 /usr/local/go/src/net/http/h2_bundle.go:9125
# 0x43aa32 runtime.gopanic+0x212 /usr/local/go/src/runtime/panic.go:884
# 0x45223c runtime.panicmem+0x37c /usr/local/go/src/runtime/panic.go:260
# 0x45220c runtime.sigpanic+0x34c /usr/local/go/src/runtime/signal_unix.go:841
# 0x72235b net/http.(*http2pipe).Write+0x17b /usr/local/go/src/net/http/h2_bundle.go:3710
# 0x73fe33 net/http.(*http2clientConnReadLoop).processData+0x253 /usr/local/go/src/net/http/h2_bundle.go:9642
# 0x73d873 net/http.(*http2clientConnReadLoop).run+0x3f3 /usr/local/go/src/net/http/h2_bundle.go:9221
# 0x73cace net/http.(*http2ClientConn).readLoop+0x6e /usr/local/go/src/net/http/h2_bundle.go:9082
2 @ 0x43dd76 0x436397 0x46c229 0x4ebff2 0x4ed3d9 0x4ed3c7 0x649189 0x65ac65 0x6a26fd 0x4ac3f8 0x6a28e5 0x69fdd6 0x6a5ccf 0x6a5cd0 0x56cfbb 0x4a35ba 0x8f6c4e 0x8f6c08 0x8f7495 0x920574 0x94e409 0x94dc86 0x471f81
# 0x46c228 internal/poll.runtime_pollWait+0x88 /usr/local/go/src/runtime/netpoll.go:306
# 0x4ebff1 internal/poll.(*pollDesc).wait+0x31 /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
# 0x4ed3d8 internal/poll.(*pollDesc).waitRead+0x298 /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
# 0x4ed3c6 internal/poll.(*FD).Read+0x286 /usr/local/go/src/internal/poll/fd_unix.go:167
# 0x649188 net.(*netFD).Read+0x28 /usr/local/go/src/net/fd_posix.go:55
# 0x65ac64 net.(*conn).Read+0x44 /usr/local/go/src/net/net.go:183
# 0x6a26fc crypto/tls.(*atLeastReader).Read+0x3c /usr/local/go/src/crypto/tls/conn.go:788
# 0x4ac3f7 bytes.(*Buffer).ReadFrom+0x97 /usr/local/go/src/bytes/buffer.go:202
# 0x6a28e4 crypto/tls.(*Conn).readFromUntil+0xe4 /usr/local/go/src/crypto/tls/conn.go:810
# 0x69fdd5 crypto/tls.(*Conn).readRecordOrCCS+0x115 /usr/local/go/src/crypto/tls/conn.go:617
# 0x6a5cce crypto/tls.(*Conn).readRecord+0x16e /usr/local/go/src/crypto/tls/conn.go:583
# 0x6a5ccf crypto/tls.(*Conn).Read+0x16f /usr/local/go/src/crypto/tls/conn.go:1316
# 0x56cfba bufio.(*Reader).Read+0x1ba /usr/local/go/src/bufio/bufio.go:237
# 0x4a35b9 io.ReadAtLeast+0x99 /usr/local/go/src/io/io.go:332
# 0x8f6c4d io.ReadFull+0x6d /usr/local/go/src/io/io.go:351
# 0x8f6c07 golang.org/x/net/http2.readFrameHeader+0x27 /builder/home/go/pkg/mod/golang.org/x/net@v0.10.0/http2/frame.go:237
# 0x8f7494 golang.org/x/net/http2.(*Framer).ReadFrame+0x94 /builder/home/go/pkg/mod/golang.org/x/net@v0.10.0/http2/frame.go:498
# 0x920573 google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams+0x173 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_server.go:637
# 0x94e408 google.golang.org/grpc.(*Server).serveStreams+0x188 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:950
# 0x94dc85 google.golang.org/grpc.(*Server).handleRawConn.func1+0x45 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:892
2 @ 0x43dd76 0x44e3be 0x9052f5 0x905a51 0x91d16e 0x471f81
# 0x9052f4 google.golang.org/grpc/internal/transport.(*controlBuffer).get+0x114 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/controlbuf.go:417
# 0x905a50 google.golang.org/grpc/internal/transport.(*loopyWriter).run+0x90 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/controlbuf.go:549
# 0x91d16d google.golang.org/grpc/internal/transport.NewServerTransport.func2+0xcd /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_server.go:336
2 @ 0x43dd76 0x44e3be 0x924993 0x471f81
# 0x924992 google.golang.org/grpc/internal/transport.(*http2Server).keepalive+0x232 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/internal/transport/http2_server.go:1150
1 @ 0x40e794 0x46e40f 0xff9319 0x471f81
# 0x46e40e os/signal.signal_recv+0x2e /usr/local/go/src/runtime/sigqueue.go:152
# 0xff9318 os/signal.loop+0x18 /usr/local/go/src/os/signal/signal_unix.go:23
1 @ 0x432a76 0x46bd85 0xb15ab5 0xb158cd 0xb1264b 0xb72c45 0xb73793 0x75768f 0x759029 0x75a7d6 0x7561b2 0x471f81
# 0x46bd84 runtime/pprof.runtime_goroutineProfileWithLabels+0x24 /usr/local/go/src/runtime/mprof.go:844
# 0xb15ab4 runtime/pprof.writeRuntimeProfile+0xb4 /usr/local/go/src/runtime/pprof/pprof.go:734
# 0xb158cc runtime/pprof.writeGoroutine+0x4c /usr/local/go/src/runtime/pprof/pprof.go:694
# 0xb1264a runtime/pprof.(*Profile).WriteTo+0x14a /usr/local/go/src/runtime/pprof/pprof.go:329
# 0xb72c44 net/http/pprof.handler.ServeHTTP+0x4a4 /usr/local/go/src/net/http/pprof/pprof.go:259
# 0xb73792 net/http/pprof.Index+0xf2 /usr/local/go/src/net/http/pprof/pprof.go:376
# 0x75768e net/http.HandlerFunc.ServeHTTP+0x2e /usr/local/go/src/net/http/server.go:2122
# 0x759028 net/http.(*ServeMux).ServeHTTP+0x148 /usr/local/go/src/net/http/server.go:2500
# 0x75a7d5 net/http.serverHandler.ServeHTTP+0x315 /usr/local/go/src/net/http/server.go:2936
# 0x7561b1 net/http.(*conn).serve+0x611 /usr/local/go/src/net/http/server.go:1995
1 @ 0x43dd76 0x40901d 0x408b18 0xb76f13 0x103ba4b 0xb57d73 0x471f81
# 0xb76f12 redacted ...
# 0x103ba4a redacted ...
# 0xb57d72 redacted ...
1 @ 0x43dd76 0x40901d 0x408b18 0xb770c9 0x471f81
# 0xb770c8 redacted ...
1 @ 0x43dd76 0x40901d 0x408b18 0xff97f5 0x471f81
# 0xff97f4 redacted ...
1 @ 0x43dd76 0x40901d 0x408b58 0xb55bbc 0x471f81
# 0xb55bbb redacted ...
1 @ 0x43dd76 0x436397 0x46c229 0x4ebff2 0x4ed3d9 0x4ed3c7 0x649189 0x65ac65 0x750351 0x56c9df 0x56cb3d 0x75631c 0x471f81
# 0x46c228 internal/poll.runtime_pollWait+0x88 /usr/local/go/src/runtime/netpoll.go:306
# 0x4ebff1 internal/poll.(*pollDesc).wait+0x31 /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
# 0x4ed3d8 internal/poll.(*pollDesc).waitRead+0x298 /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
# 0x4ed3c6 internal/poll.(*FD).Read+0x286 /usr/local/go/src/internal/poll/fd_unix.go:167
# 0x649188 net.(*netFD).Read+0x28 /usr/local/go/src/net/fd_posix.go:55
# 0x65ac64 net.(*conn).Read+0x44 /usr/local/go/src/net/net.go:183
# 0x750350 net/http.(*connReader).Read+0x170 /usr/local/go/src/net/http/server.go:782
# 0x56c9de bufio.(*Reader).fill+0xfe /usr/local/go/src/bufio/bufio.go:106
# 0x56cb3c bufio.(*Reader).Peek+0x5c /usr/local/go/src/bufio/bufio.go:144
# 0x75631b net/http.(*conn).serve+0x77b /usr/local/go/src/net/http/server.go:2030
1 @ 0x43dd76 0x436397 0x46c229 0x4ebff2 0x4f18fd 0x4f18eb 0x64b315 0x663c25 0x662d1d 0x75ad45 0x1033f8a 0x1033f8b 0x471f81
# 0x46c228 internal/poll.runtime_pollWait+0x88 /usr/local/go/src/runtime/netpoll.go:306
# 0x4ebff1 internal/poll.(*pollDesc).wait+0x31 /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
# 0x4f18fc internal/poll.(*pollDesc).waitRead+0x2bc /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
# 0x4f18ea internal/poll.(*FD).Accept+0x2aa /usr/local/go/src/internal/poll/fd_unix.go:614
# 0x64b314 net.(*netFD).accept+0x34 /usr/local/go/src/net/fd_unix.go:172
# 0x663c24 net.(*TCPListener).accept+0x24 /usr/local/go/src/net/tcpsock_posix.go:148
# 0x662d1c net.(*TCPListener).Accept+0x3c /usr/local/go/src/net/tcpsock.go:297
# 0x75ad44 net/http.(*Server).Serve+0x384 /usr/local/go/src/net/http/server.go:3059
# 0x1033f89 net/http.Serve+0x49 /usr/local/go/src/net/http/server.go:2581
# 0x1033f8a redacted ...
1 @ 0x43dd76 0x436397 0x46c229 0x4ebff2 0x4f18fd 0x4f18eb 0x64b315 0x663c25 0x662d1d 0x94d2b5 0xb7704e 0x471f81
# 0x46c228 internal/poll.runtime_pollWait+0x88 /usr/local/go/src/runtime/netpoll.go:306
# 0x4ebff1 internal/poll.(*pollDesc).wait+0x31 /usr/local/go/src/internal/poll/fd_poll_runtime.go:84
# 0x4f18fc internal/poll.(*pollDesc).waitRead+0x2bc /usr/local/go/src/internal/poll/fd_poll_runtime.go:89
# 0x4f18ea internal/poll.(*FD).Accept+0x2aa /usr/local/go/src/internal/poll/fd_unix.go:614
# 0x64b314 net.(*netFD).accept+0x34 /usr/local/go/src/net/fd_unix.go:172
# 0x663c24 net.(*TCPListener).accept+0x24 /usr/local/go/src/net/tcpsock_posix.go:148
# 0x662d1c net.(*TCPListener).Accept+0x3c /usr/local/go/src/net/tcpsock.go:297
# 0x94d2b4 google.golang.org/grpc.(*Server).Serve+0x474 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:824
# 0xb7704d redacted ...
1 @ 0x43dd76 0x44e3be 0x103a5c5 0x471f81
# 0x103a5c4 redacted ...
1 @ 0x43dd76 0x44e3be 0x51964d 0x471f81
# 0x51964c database/sql.(*DB).connectionOpener+0x8c /usr/local/go/src/database/sql/sql.go:1218
1 @ 0x43dd76 0x44e3be 0xb2540d 0x471f81
# 0xb2540c go.opencensus.io/stats/view.(*worker).start+0xac /builder/home/go/pkg/mod/go.opencensus.io@v0.24.0/stats/view/worker.go:292
1 @ 0x43dd76 0x44e3be 0xb33c89 0xb32310 0x103bcb1 0x953379 0xb7a2d2 0x953379 0xae3ac5 0x95310f 0x954723 0x955f50 0x94e7d8 0x471f81
# 0xb33c88 google.golang.org/grpc/health.(*Server).Watch+0x2c8 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/health/server.go:92
# 0xb3230f google.golang.org/grpc/health/grpc_health_v1._Health_Watch_Handler+0xcf /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/health/grpc_health_v1/health_grpc.pb.go:187
# 0x103bcb0 redacted ...
# 0x953378 google.golang.org/grpc.getChainStreamHandler.func1+0xb8 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1491
# 0xb7a2d1 redacted ...
# 0x953378 google.golang.org/grpc.getChainStreamHandler.func1+0xb8 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1491
# 0xae3ac4 redacted ...
# 0x95310e google.golang.org/grpc.chainStreamInterceptors.func1+0x8e /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1482
# 0x954722 google.golang.org/grpc.(*Server).processStreamingRPC+0x1362 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1646
# 0x955f4f google.golang.org/grpc.(*Server).handleStream+0x9ef /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:1726
# 0x94e7d7 google.golang.org/grpc.(*Server).serveStreams.func1.2+0x97 /builder/home/go/pkg/mod/google.golang.org/grpc@v1.54.0/server.go:966
1 @ 0x43dd76 0x44e3be 0xb38c65 0x471f81
# 0xb38c64 redacted ...
1 @ 0x43dd76 0x44f4af 0x44f486 0x46dbe7 0x47e42b 0xb57c6f 0x1033d2b 0x43d947 0x471f81
# 0x46dbe6 sync.runtime_Semacquire+0x26 /usr/local/go/src/runtime/sema.go:62
# 0x47e42a sync.(*WaitGroup).Wait+0x4a /usr/local/go/src/sync/waitgroup.go:116
# 0xb57c6e redacted ...
# 0x1033d2a redacted ...
# 0x43d946 runtime.main+0x206 /usr/local/go/src/runtime/proc.go:250
1 @ 0x43dd76 0x46ed75 0xb54a32 0x471f81
# 0x46ed74 time.Sleep+0x134 /usr/local/go/src/runtime/time.go:195
# 0xb54a31 redacted ...
1 @ 0x471f81
1 @ 0x488cd6 0x8970b8 0xb3671d 0xb359d5 0x471f81
# 0x488cd5 syscall.Syscall6+0x35 /usr/local/go/src/syscall/syscall_linux.go:91
# 0x8970b7 golang.org/x/sys/unix.EpollWait+0x57 /builder/home/go/pkg/mod/golang.org/x/sys@v0.8.0/unix/zsyscall_linux_amd64.go:56
# 0xb3671c github.com/fsnotify/fsnotify.(*fdPoller).wait+0x7c /builder/home/go/pkg/mod/github.com/fsnotify/fsnotify@v1.5.1/inotify_poller.go:87
# 0xb359d4 github.com/fsnotify/fsnotify.(*Watcher).readEvents+0x274 /builder/home/go/pkg/mod/github.com/fsnotify/fsnotify@v1.5.1/inotify.go:193
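The profile above was served by the net/http/pprof handler that appears in one of the stacks (net/http/pprof.Index). A minimal sketch of exposing that endpoint and capturing the same kind of dump, assuming the listen address is free to choose:

// Minimal sketch: expose the pprof endpoints and fetch a goroutine dump like
// the one shown above. The listen address is an arbitrary choice.
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// Fetch a dump with:
	//   curl 'http://localhost:6060/debug/pprof/goroutine?debug=2'
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}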
How to fix this? 😂
@stampy88 Do you have any ideas for fixing it?
I do not, @90wukai. I was hoping @rhysh might.
Hi @stampy88, I use net/http less these days than I did in early 2023. I don't have a fix, and I don't have plans to make one.
Maintainer time for http/2 is scarce (see https://dev.golang.org/owners for each package's list). We'll do best if we can make a little of it go a long way. Here's what I think would help:
@stampy88, you'd initially said you were working on a reproducer. I assume you would have said if you'd finished ... but otherwise, saying a bit about what you tried that didn't work (or which worked only partially) might help.
@90wukai (and the now-deleted Jan 21 poster), it sounds like you're affected too. Even if you don't have a lot of time for digging, it would help to say which Go version you use that has the problem ... and if you're using x/net/http2 directly then the version of that as well.
If you have more time, it could help to say a bit about the impact this has on your programs (on a scale of "kinda annoying every couple months" to "multi-hour system outage almost daily"), and some background on what your systems are "like". Especially if there are ways you suspect that your use of Go is "unusual", which might make this issue appear more frequently than in the general population of Go users.
@adg, thanks for the goroutine profile. I assume you run an up-to-date Go version, and the line numbers match go1.20. But I don't see any singleton goroutines with .../h2_bundle.go code on their stacks which might be holding the lock that the other 7+7+7=21 goroutines are trying to acquire. And the panic in net/http.(*http2pipe).Write seems strange. Hmm.
Sorry @rhysh, I was unable to consistently reproduce it and have disabled HTTP 2 for my usage in this legacy app that was having the issue. It was so long ago, I don't recall what I did. I'll try and dig up the code I had that was the basis for my reproducer, but I have a bad feeling I don't have it anymore.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Have not attempted yet, as it is hard to reproduce. I am attempting to write a reproducer.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
The application does HTTP POSTs to configured client endpoints whenever an event is received. The HTTP client is configured with a 5-second timeout so that long-running requests don't block an event consumer for too long.
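A hedged sketch of the setup described, assuming a plain http.Client with a 5-second timeout posting JSON events; the endpoint, payload, and helper name are illustrative, not the reporter's actual code:

// Hypothetical sketch of the described setup: an event handler POSTs to a
// configured endpoint with a 5-second client timeout.
package main

import (
	"bytes"
	"context"
	"log"
	"net/http"
	"time"
)

// client carries the 5-second timeout mentioned in the report; HTTP/2 is used
// automatically when the server negotiates it over TLS.
var client = &http.Client{Timeout: 5 * time.Second}

// postEvent is a hypothetical helper, not the reporter's code.
func postEvent(ctx context.Context, endpoint string, payload []byte) error {
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, endpoint, bytes.NewReader(payload))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	resp, err := client.Do(req)
	if err != nil {
		return err // a timeout returns here, but stream goroutines may linger (the bug discussed in this issue)
	}
	defer resp.Body.Close()
	return nil
}

func main() {
	if err := postEvent(context.Background(), "https://example.com/events", []byte(`{"event":"example"}`)); err != nil {
		log.Println("post failed:", err)
	}
}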
What did you expect to see?
When a timeout occurs, I expected all go routines that the std library creates to handle the HTTP request to be cleaned up.
What did you see instead?
The server the app is connecting to supports HTTP 2. During periods where timeouts are occurring, we can see the number of go routines steadily increase until the server it is trying to communicate with starts responding again. See stack traces below:
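For reference, a minimal sketch of how the goroutine growth described above could be observed over time; the sampling interval and logging are arbitrary choices, not how the reporter measured it:

// Hypothetical sketch: periodically log the goroutine count to watch for the
// steady growth described in this report.
package main

import (
	"log"
	"runtime"
	"time"
)

func main() {
	for range time.Tick(30 * time.Second) {
		log.Printf("goroutines=%d", runtime.NumGoroutine())
	}
}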