veshij opened this issue 3 weeks ago
Based on the diagram you provided, the goroutines were reclaimed periodically. That seems like expected behaviour from the Go runtime's perspective.
Sorry, I should have stated that explicitly: the "reclamation" happens on etcd cluster node restart.
Just to confirm, are you saying the goroutine count for each etcd member doesn't decrease until it's restarted?
etcdserver creates a recvLoop goroutine for each watch stream:
https://github.com/etcd-io/etcd/blob/bc5512a40bdc1fc08352789ea16695bb612914c9/server/etcdserver/api/v3rpc/watch.go#L193
Do the clients keep creating streams and never deleting them? What's the client-side behaviour?
Just to confirm, are you saying the goroutine count for each etcd member doesn't decrease until it's restarted?
that's correct.
etcdserver creates a recvLoop goroutine for each watch stream. Do the clients keep creating streams and never deleting them? What's the client-side behaviour?
Streams are created implicitly when new context metadata is passed. I didn't find a way to explicitly close a stream on the client side later, but from briefly looking at the client code it seems it should close the stream once it has no active watchers (though there is a decent chance I misunderstood the implementation). To the best of my knowledge there should be no active watchers left in "inactive" streams on the client side.
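For reference, a minimal sketch of what I mean (the metadata key is made up; as far as I understand, clientv3 keys its internal watch gRPC streams by the outgoing context metadata, so a different metadata value lands on a different HTTP/2 stream):

// Minimal sketch, not production code.
package watchdemo

import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
	"google.golang.org/grpc/metadata"
)

// watchOnStream attaches a "stream-generation" metadata value (a made-up key)
// to the outgoing context, which causes the client to open or reuse a watch
// gRPC stream dedicated to that metadata.
func watchOnStream(cli *clientv3.Client, generation string) clientv3.WatchChan {
	ctx := metadata.AppendToOutgoingContext(context.Background(),
		"stream-generation", generation)
	return cli.Watch(ctx, "/some/prefix", clientv3.WithPrefix())
}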
I didn't find a way to explicitly close a stream on the client side later
You can close the stream by either closing the client or closing the watch stream using watcher.Close(); otherwise, there will be more and more streams and, accordingly, more and more recvLoop goroutines. Please try it out and let me know the results.
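For example, something along these lines (a rough, untested sketch; endpoints and keys are placeholders):

// Sketch of the client-side cleanup options.
package watchdemo

import (
	"context"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func watchAndCleanUp(endpoints []string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	// Option 1: closing the client tears down every watch gRPC stream.
	defer cli.Close()

	// Option 2: cancelling the context ends this particular watcher.
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	wch := cli.Watch(ctx, "/some/key")
	_ = wch // consume events as needed...

	// Option 3: close just the watcher (and all of its streams) while
	// keeping the client around for KV/Lease traffic.
	return cli.Watcher.Close()
}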
The implementation closes the old stream after a new one is created with new context metadata. Let me try to implement a minimal repro for the issue.
Started working on the repro implementation and found a metric:
# name: "etcd_debugging_mvcc_watch_stream_total"
# description: "Total number of watch streams."
# type: "gauge"
etcd_debugging_mvcc_watch_stream_total
The number of watch streams (as reported by the server) remains constant, while the number of goroutines grows over time.
So far I can't repro with a single client in an isolated environment. A typical trace looks like this (which is expected):
1 @ 0x43fe6e 0x450345 0x90dd6a 0x90dc5b 0x90f22c 0x4a29f0 0x90f196 0x90f16d 0x9722c6 0x972b25 0x97331d 0x98a8ea 0xd8b6ad 0xa5d7a6 0xd9b4c2 0xd9af66 0x473181
# 0x90dd69 google.golang.org/grpc/internal/transport.(*recvBufferReader).read+0x89 external/org_golang_google_grpc/internal/transport/transport.go:181
# 0x90dc5a google.golang.org/grpc/internal/transport.(*recvBufferReader).Read+0x15a external/org_golang_google_grpc/internal/transport/transport.go:175
# 0x90f22b google.golang.org/grpc/internal/transport.(*transportReader).Read+0x2b external/org_golang_google_grpc/internal/transport/transport.go:525
# 0x4a29ef io.ReadAtLeast+0x8f GOROOT/src/io/io.go:335
# 0x90f195 io.ReadFull+0x95 GOROOT/src/io/io.go:354
# 0x90f16c google.golang.org/grpc/internal/transport.(*Stream).Read+0x6c external/org_golang_google_grpc/internal/transport/transport.go:509
# 0x9722c5 google.golang.org/grpc.(*parser).recvMsg+0x45 external/org_golang_google_grpc/rpc_util.go:614
# 0x972b24 google.golang.org/grpc.recvAndDecompress+0x84 external/org_golang_google_grpc/rpc_util.go:753
# 0x97331c google.golang.org/grpc.recv+0x7c external/org_golang_google_grpc/rpc_util.go:833
# 0x98a8e9 google.golang.org/grpc.(*serverStream).RecvMsg+0x169 external/org_golang_google_grpc/stream.go:1717
# 0xd8b6ac github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredServerStream).RecvMsg+0x2c external/com_github_grpc_ecosystem_go_grpc_prometheus/server_metrics.go:164
# 0xa5d7a5 go.etcd.io/etcd/api/v3/etcdserverpb.(*watchWatchServer).Recv+0x45 external/io_etcd_go_etcd_api_v3/etcdserverpb/rpc.pb.go:6762
# 0xd9b4c1 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).recvLoop+0x41 external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:242
# 0xd9af65 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*watchServer).Watch.func2+0x45 external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:191
and there are no goroutines stuck sending data to ctrlStream in the case *pb.WatchRequest_CancelRequest: codepath, as described in the issue.
It looks like in other places the code does not block on sending to sws.ctrlStream if sws.closec is closed:
https://github.com/etcd-io/etcd/blob/40b4715ca371c950c1965cd79519f5f0f9a57921/server/etcdserver/api/v3rpc/watch.go#L298-L303
https://github.com/etcd-io/etcd/blob/40b4715ca371c950c1965cd79519f5f0f9a57921/server/etcdserver/api/v3rpc/watch.go#L339-L343
but the codepath I'm seeing in the traces does block:
https://github.com/etcd-io/etcd/blob/40b4715ca371c950c1965cd79519f5f0f9a57921/server/etcdserver/api/v3rpc/watch.go#L349-L360
I wonder if that could be the issue. Maybe this codepath should look like:
diff --git a/server/etcdserver/api/v3rpc/watch.go b/server/etcdserver/api/v3rpc/watch.go
index abb465e21..6781c468a 100644
--- a/server/etcdserver/api/v3rpc/watch.go
+++ b/server/etcdserver/api/v3rpc/watch.go
@@ -347,11 +347,17 @@ func (sws *serverWatchStream) recvLoop() error {
id := uv.CancelRequest.WatchId
err := sws.watchStream.Cancel(mvcc.WatchID(id))
if err == nil {
- sws.ctrlStream <- &pb.WatchResponse{
+ wr := &pb.WatchResponse{
Header: sws.newResponseHeader(sws.watchStream.Rev()),
WatchId: id,
Canceled: true,
}
+ select {
+ case sws.ctrlStream <- wr:
+ case <-sws.closec:
+ return nil
+ }
+
sws.mu.Lock()
delete(sws.progress, mvcc.WatchID(id))
delete(sws.prevKV, mvcc.WatchID(id))
Have you verified that it resolves your performance "issue"? I do not see any real case which may lead to etcdserver being blocked here.
But it won't do any harm to add protection as you pointed out.
I already deployed the change to the staging env; today or tomorrow I'll be able to validate in prod. At first glance it seems to have fixed the issue; there are no goroutines with the following traces:
# 0xd9bb8c go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).recvLoop+0x70c external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:348
# 0xd9af65 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*watchServer).Watch.func2+0x45
Did you see lots of the recvLoop goroutines blocked exactly on the line sws.ctrlStream <- &pb.WatchResponse?
https://github.com/etcd-io/etcd/blob/54db7f053059c7406ca7d2f70339d2c9e5e20c54/server/etcdserver/api/v3rpc/watch.go#L350
Was the count of sendLoop goroutines much, much less than the count of recvLoop goroutines when you saw the goroutine leak?
https://github.com/etcd-io/etcd/blob/54db7f053059c7406ca7d2f70339d2c9e5e20c54/server/etcdserver/api/v3rpc/watch.go#L378
In theory, it seems there is a race condition between sendLoop and recvLoop: when the stream is being closed, recvLoop might receive a WatchRequest_CancelRequest right after sendLoop exits, and then it will be blocked on sending a message to sws.ctrlStream.
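A self-contained toy illustrating that failure mode (this is not the etcd code; the names only mirror it):

package main

import (
	"fmt"
	"runtime"
	"time"
)

// recvLoop hands one "cancel response" to the sender over ctrlStream.
// With useSelect=false it mimics the unguarded send in the cancel path;
// with useSelect=true it mimics the proposed select on closec.
func recvLoop(ctrlStream chan<- string, closec <-chan struct{}, useSelect bool) {
	resp := "canceled"
	if useSelect {
		select {
		case ctrlStream <- resp: // sender is still draining the channel
		case <-closec: // sender is already gone; give up instead of blocking
		}
		return
	}
	ctrlStream <- resp // blocks forever once nobody reads ctrlStream
}

func main() {
	for _, useSelect := range []bool{true, false} {
		before := runtime.NumGoroutine()

		// Unbuffered here; in etcd ctrlStream is buffered, but the effect is
		// the same once the buffer is full and the reader has exited.
		ctrlStream := make(chan string)
		closec := make(chan struct{})
		go recvLoop(ctrlStream, closec, useSelect)

		// Simulate the sender side exiting without draining ctrlStream,
		// with closec closed as part of the stream teardown.
		close(closec)

		time.Sleep(100 * time.Millisecond)
		fmt.Printf("useSelect=%v leaked goroutines=%d\n",
			useSelect, runtime.NumGoroutine()-before)
	}
}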
In the trace I posted initially there are 40k+ such goroutines, and the number is growing over time:
root@pdx3a-rp11-24a:~# curl --cert /srv/grpc-certs/etcd/latest/cert.pem --key /srv/grpc-certs/etcd/latest/key.pem --http1.1 -k -s https://localhost:5000/debug/pprof/goroutine?debug=1 | head -n 10
goroutine profile: total 177241
43832 @ 0x43fe6e 0x40a6a5 0x40a2f7 0xd9bb8d 0xd9af66 0x473181
# 0xd9bb8c go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).recvLoop+0x70c external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:348
# 0xd9af65 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*watchServer).Watch.func2+0x45 external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:191
----------------------
# after a few seconds:
----------------------
43836 @ 0x43fe6e 0x40a6a5 0x40a2f7 0xd9bb8d 0xd9af66 0x473181
# 0xd9bb8c go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).recvLoop+0x70c external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:348
# 0xd9af65 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*watchServer).Watch.func2+0x45
which points to https://github.com/etcd-io/etcd/blob/v3.5.14/server/etcdserver/api/v3rpc/watch.go#L348
And the number of such goroutines does not match the number of sendLoop goroutines running:
49411 @ 0x43fe6e 0x40a6a5 0x40a2f7 0xd9bb8d 0xd9af66 0x473181
# 0xd9bb8c go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).recvLoop+0x70c external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:348
# 0xd9af65 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*watchServer).Watch.func2+0x45 external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:191
10387 @ 0x43fe6e 0x450345 0x90dd6a 0x90dc5b 0x90f22c 0x4a29f0 0x90f196 0x90f16d 0x9722c6 0x972b25 0x97331d 0x98a8ea 0xd8b6ad 0xa5d7a6 0xd9b4c2 0xd9af66 0x473181
# 0x90dd69 google.golang.org/grpc/internal/transport.(*recvBufferReader).read+0x89 external/org_golang_google_grpc/internal/transport/transport.go:181
# 0x90dc5a google.golang.org/grpc/internal/transport.(*recvBufferReader).Read+0x15a external/org_golang_google_grpc/internal/transport/transport.go:175
# 0x90f22b google.golang.org/grpc/internal/transport.(*transportReader).Read+0x2b external/org_golang_google_grpc/internal/transport/transport.go:525
# 0x4a29ef io.ReadAtLeast+0x8f GOROOT/src/io/io.go:335
# 0x90f195 io.ReadFull+0x95 GOROOT/src/io/io.go:354
# 0x90f16c google.golang.org/grpc/internal/transport.(*Stream).Read+0x6c external/org_golang_google_grpc/internal/transport/transport.go:509
# 0x9722c5 google.golang.org/grpc.(*parser).recvMsg+0x45 external/org_golang_google_grpc/rpc_util.go:614
# 0x972b24 google.golang.org/grpc.recvAndDecompress+0x84 external/org_golang_google_grpc/rpc_util.go:753
# 0x97331c google.golang.org/grpc.recv+0x7c external/org_golang_google_grpc/rpc_util.go:833
# 0x98a8e9 google.golang.org/grpc.(*serverStream).RecvMsg+0x169 external/org_golang_google_grpc/stream.go:1717
# 0xd8b6ac github.com/grpc-ecosystem/go-grpc-prometheus.(*monitoredServerStream).RecvMsg+0x2c external/com_github_grpc_ecosystem_go_grpc_prometheus/server_metrics.go:164
# 0xa5d7a5 go.etcd.io/etcd/api/v3/etcdserverpb.(*watchWatchServer).Recv+0x45 external/io_etcd_go_etcd_api_v3/etcdserverpb/rpc.pb.go:6762
# 0xd9b4c1 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).recvLoop+0x41 external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:242
# 0xd9af65 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*watchServer).Watch.func2+0x45 external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:191
10387 @ 0x43fe6e 0x450345 0xd9add1 0xa5d6b1 0xd8c652 0xd909d7 0xd93b82 0xd909d7 0xd90875 0x97cae7 0x97e8e5 0x97772b 0x473181
# 0xd9add0 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*watchServer).Watch+0x390 external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:211
# 0xa5d6b0 go.etcd.io/etcd/api/v3/etcdserverpb._Watch_Watch_Handler+0x90 external/io_etcd_go_etcd_api_v3/etcdserverpb/rpc.pb.go:6743
# 0xd8c651 github.com/grpc-ecosystem/go-grpc-prometheus.init.(*ServerMetrics).StreamServerInterceptor.func4+0xd1 external/com_github_grpc_ecosystem_go_grpc_prometheus/server_metrics.go:121
# 0xd909d6 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.Server.ChainStreamServer.func7.1.1+0x36 external/com_github_grpc_ecosystem_go_grpc_middleware/chain.go:49
# 0xd93b81 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.newStreamInterceptor.func1+0x4a1 external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/interceptor.go:252
# 0xd909d6 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.Server.ChainStreamServer.func7.1.1+0x36 external/com_github_grpc_ecosystem_go_grpc_middleware/chain.go:49
# 0xd90874 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.Server.ChainStreamServer.func7+0xb4 external/com_github_grpc_ecosystem_go_grpc_middleware/chain.go:58
# 0x97cae6 google.golang.org/grpc.(*Server).processStreamingRPC+0x1266 external/org_golang_google_grpc/server.go:1670
# 0x97e8e4 google.golang.org/grpc.(*Server).handleStream+0xfc4 external/org_golang_google_grpc/server.go:1784
# 0x97772a google.golang.org/grpc.(*Server).serveStreams.func2.1+0x8a external/org_golang_google_grpc/server.go:1019
10385 @ 0x43fe6e 0x450345 0xd9c3c5 0xd9b33c 0x473181
# 0xd9c3c4 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).sendLoop+0x244 external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:398
# 0xd9b33b go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*watchServer).Watch.func1+0x1b external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:181
And the number of such goroutines does not match the number of sendLoop goroutines running:
Do you have rough numbers (count of recvLoop: count of sendLoop)?
The first column in goroutine profiles depicts the number of goroutines, in this case 49411:
49411 @ 0x43fe6e 0x40a6a5 0x40a2f7 0xd9bb8d 0xd9af66 0x473181
# 0xd9bb8c go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*serverWatchStream).recvLoop+0x70c external/io_etcd_go_etcd_server_v3/etcdserver/api/v3rpc/watch.go:348
# 0xd9af65 go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*watchServer).Watch.func2+0x45
So there are 10385 sendLoop goroutines, 10387 healthy recvLoop goroutines, and 49411 recvLoop goroutines stuck sending data to the channel, i.e. roughly 49k recvLoops whose corresponding sendLoop has already exited.
The first column in goroutine profiles depicts the number of goroutines, in this case 49411:
I missed it, thanks.
To double confirm, you do not see this difference (counts of recvLoop and sendLoop) after patching the change in https://github.com/etcd-io/etcd/issues/18704#issuecomment-2414403747?
I can confirm no leak after running it for a day:
At a high level the change makes sense; all other usages of ctrlStream also select on closec. The watch server stream tests look insufficient; will look into whether we can improve that.
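One option could be a leak check around the watch stream lifecycle, e.g. with go.uber.org/goleak or an equivalent checker; a sketch with a hypothetical test helper:

package watchleak_test

import (
	"testing"

	"go.uber.org/goleak"
)

// openAndCancelWatchStream is a stand-in for whatever harness the real test
// would use: open a watch gRPC stream against a test server, cancel the
// watch, and close the stream.
func openAndCancelWatchStream(t *testing.T) {
	t.Helper()
	// ... exercise the server's Watch handler here ...
}

// goleak fails the test if any goroutine (e.g. a recvLoop stuck on
// ctrlStream) is still running when the test returns.
func TestWatchCancelDoesNotLeakGoroutines(t *testing.T) {
	defer goleak.VerifyNone(t)
	openAndCancelWatchStream(t)
}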
I can confirm no leak after running it for a day:
Thanks for the feedback. The reason should be the race condition pointed out in https://github.com/etcd-io/etcd/issues/18704#issuecomment-2414776234.
The problem is that it's a little hard to reproduce and verify in an e2e test. One solution that I can think of is to add two metrics (recvLoopCount and sendLoopCount) and verify that the two counts stay equal.
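A minimal sketch of what such gauges could look like (the metric names and registration below are illustrative, not actual etcd metrics):

package watchmetrics

import "github.com/prometheus/client_golang/prometheus"

// Two gauges that should track each other; a growing gap between them would
// indicate leaked recvLoop goroutines.
var (
	recvLoopCount = prometheus.NewGauge(prometheus.GaugeOpts{
		Namespace: "etcd_debugging",
		Subsystem: "grpc",
		Name:      "watch_recv_loop_total",
		Help:      "Total number of currently running watch recvLoop goroutines.",
	})
	sendLoopCount = prometheus.NewGauge(prometheus.GaugeOpts{
		Namespace: "etcd_debugging",
		Subsystem: "grpc",
		Name:      "watch_send_loop_total",
		Help:      "Total number of currently running watch sendLoop goroutines.",
	})
)

func init() {
	prometheus.MustRegister(recvLoopCount, sendLoopCount)
}

// Each loop would then bump its gauge on entry and exit, e.g.:
//
//	recvLoopCount.Inc()
//	defer recvLoopCount.Dec()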
The workaround for this performance issue is to restart the etcd instance.
Reopening to track the backporting effort.
Bug report criteria
What happened?
We run several moderately loaded clusters with the following characteristics:
We had a few performance issues at this scale, thus the client has logic to:
We observe what seems to be a slow goroutine leak on our clusters.
This one in particular seems to be the culprit:
I suspect it's due to streams not being properly closed after all watchers have migrated to a new h/2 stream.
What did you expect to happen?
Steady number of goroutines over time.
How can we reproduce it (as minimally and precisely as possible)?
I'd assume creating and stopping a set of watchers, each with its own context metadata, would trigger the issue.
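A rough sketch of that repro loop (assuming a clientv3 client; the metadata key and timings are arbitrary):

// Repeatedly start a watcher with fresh context metadata (forcing a new
// watch gRPC stream), then cancel it, and observe the server's goroutine
// count over time.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"google.golang.org/grpc/metadata"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	for i := 0; ; i++ {
		// Fresh metadata => the client opens a new underlying watch stream.
		ctx, cancel := context.WithCancel(
			metadata.AppendToOutgoingContext(context.Background(),
				"repro-generation", fmt.Sprint(i)))

		wch := cli.Watch(ctx, "/repro/key")
		time.Sleep(100 * time.Millisecond) // let the watch get established

		cancel()        // cancel the watcher; the watch channel will be closed
		for range wch { // drain until the channel is closed
		}
	}
}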
Anything else we need to know?
No response
Etcd version (please run commands below)
Etcd configuration (command line flags or environment variables)
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
Relevant log output
No response