ethereum / go-ethereum

Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0
47.54k stars 20.13k forks source link

potential go routine leak in rpc package #26040

Open 9iang22 opened 2 years ago

9iang22 commented 2 years ago

System information

OS & Version: Windows 11 & Ubuntu 18.04 Commit hash : develop

When I run the TestClientCancelWebsocket with goleak, there are some go routine leaks, as mentioned below. Actually I think it blocks at this function.

func (h *handler) startCallProc(fn func(*callProc)) {
    h.callWG.Add(1)
    var wg sync.WaitGroup
    go func() {
        ctx, cancel := context.WithCancel(h.rootCtx)
        defer h.callWG.Done()
        defer cancel()
        fn(&callProc{ctx: ctx})
    }()
}

In TestClientCancelWebsocket , the fn is to test block (test_block) so I think it may block forever and the h.callWG.Done will never be called. Although it is caused by a client test, it causes leaks in the server part. I wonder if it is normal for this fn call without a timeout.

Expected behaviour

Pass

Actual behaviour

         Goroutine 35 in state select, with github.com/ethereum/go-ethereum/rpc.(*Client).dispatch on top of the stack:
        goroutine 35 [select]:
        github.com/ethereum/go-ethereum/rpc.(*Client).dispatch(0xc0001b2000, {0x95b808?, 0xc0001a0000})
                /data/rpc/client.go:587 +0x2cc
        created by github.com/ethereum/go-ethereum/rpc.initClient
                /data/rpc/client.go:251 +0x311

         Goroutine 141 in state select, with github.com/ethereum/go-ethereum/rpc.(*Client).Close on top of the stack:
        goroutine 141 [select]:
        github.com/ethereum/go-ethereum/rpc.(*Client).Close(...)
                /data/rpc/client.go:284
        github.com/ethereum/go-ethereum/rpc.(*Server).ServeCodec(0xc000159a10, {0x95b808?, 0xc000270bd0}, 0xc000232700?)
                /data/rpc/server.go:89 +0x1b8
        github.com/ethereum/go-ethereum/rpc.(*Server).WebsocketHandler.func1({0x95a170?, 0xc00026e620?}, 0xc000232700)
                /data/rpc/websocket.go:64 +0xef
        net/http.HandlerFunc.ServeHTTP(0x0?, {0x95a170?, 0xc00026e620?}, 0x46c6ee?)
                /usr/local/go/src/net/http/server.go:2109 +0x2f
        net/http.serverHandler.ServeHTTP({0xc000272fc0?}, {0x95a170, 0xc00026e620}, 0xc000232700)
                /usr/local/go/src/net/http/server.go:2947 +0x30c
        net/http.(*conn).serve(0xc0000015e0, {0x95a6c0, 0xc000272240})
                /usr/local/go/src/net/http/server.go:1991 +0x607
        created by net/http.(*Server).Serve
                /usr/local/go/src/net/http/server.go:3102 +0x4db

         Goroutine 13 in state chan receive, with github.com/ethereum/go-ethereum/rpc.(*Server).ServeCodec on top of the stack:
        goroutine 13 [chan receive]:
        github.com/ethereum/go-ethereum/rpc.(*Server).ServeCodec(0xc000159a10, {0x95b808?, 0xc0002706c0}, 0xc000232400?)
                /data/rpc/server.go:88 +0x165
        github.com/ethereum/go-ethereum/rpc.(*Server).WebsocketHandler.func1({0x95a170?, 0xc00026e2a0?}, 0xc000232400)
                /data/rpc/websocket.go:64 +0xef
        net/http.HandlerFunc.ServeHTTP(0x0?, {0x95a170?, 0xc00026e2a0?}, 0x46c6ee?)
                /usr/local/go/src/net/http/server.go:2109 +0x2f
        net/http.serverHandler.ServeHTTP({0xc000272840?}, {0x95a170, 0xc00026e2a0}, 0xc000232400)
                /usr/local/go/src/net/http/server.go:2947 +0x30c
        net/http.(*conn).serve(0xc000001400, {0x95a6c0, 0xc000272240})
                /usr/local/go/src/net/http/server.go:1991 +0x607
        created by net/http.(*Server).Serve
                /usr/local/go/src/net/http/server.go:3102 +0x4db

         Goroutine 144 in state semacquire, with sync.runtime_Semacquire on top of the stack:
        goroutine 144 [semacquire]:
        sync.runtime_Semacquire(0xc000138ac0?)
                /usr/local/go/src/runtime/sema.go:62 +0x25
        sync.(*WaitGroup).Wait(0x18?)
                /usr/local/go/src/sync/waitgroup.go:139 +0x52
        github.com/ethereum/go-ethereum/rpc.(*handler).startCallProc(0xc000270c60, 0xc000012e58)
                /data/rpc/handler.go:232 +0xd4
        github.com/ethereum/go-ethereum/rpc.(*handler).handleMsg(0xc000270c60, 0xc00025bb20)
                /data/rpc/handler.go:138 +0x85
        github.com/ethereum/go-ethereum/rpc.(*Client).dispatch(0xc000230900, {0x95b808?, 0xc000270bd0})
                /data/rpc/client.go:596 +0x411
        created by github.com/ethereum/go-ethereum/rpc.initClient
                /data/rpc/client.go:251 +0x311

Steps to reproduce the behaviour

run TestClientCancelWebsocket with goleak

func TestClientCancelWebsocket(t *testing.T) { 
    opts := []goleak.Option{
        // TODO: figure the reason and shorten this list
        goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/metrics.(*meterArbiter).tick"),
        goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/core.(*txSenderCacher).cache"),
        goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/consensus/ethash.(*remoteSealer).loop"),
        goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/core/txpool.(*TxPool).loop"),
        goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/core.(*BlockChain).updateFutureBlocks"),
        goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/core/txpool.(*TxPool).scheduleReorgLoop"),
        goleak.IgnoreTopFunction("github.com/ethereum/go-ethereum/rpc.(*testService).Block"),
    }
    defer goleak.VerifyNone(t, opts...)
    defer time.Sleep(3*time.Second)
    testClientCancel("ws", t) 
}

Backtrace

see above
MariusVanDerWijden commented 2 years ago

Could repro, looks like the following diff kinda fixes it. Maybe we should add a timeout to the rpc server/rpc client context.

diff --git a/rpc/handler.go b/rpc/handler.go
index f3052e7eb..f921e30ac 100644
--- a/rpc/handler.go
+++ b/rpc/handler.go
@@ -219,7 +219,7 @@ func (h *handler) cancelServerSubscriptions(err error) {
 func (h *handler) startCallProc(fn func(*callProc)) {
        h.callWG.Add(1)
        go func() {
-               ctx, cancel := context.WithCancel(h.rootCtx)
+               ctx, cancel := context.WithTimeout(h.rootCtx, time.Second)
                defer h.callWG.Done()
                defer cancel()
                fn(&callProc{ctx: ctx})

*Edit To be clear I'm not proposing this as a fix, it should only show where the issue is

ziadghali2 commented 1 year ago

i try to test it but i need to know "goleak" in which branch

9iang22 commented 1 year ago

i try to test it but i need to know "goleak" in which branch

It's a third party leak checker, include by go.uber.org/goleak

tak1827 commented 1 year ago

http is also leaking.

func TestHTTPClientGoroutineLeak(t *testing.T) {
    defer goleak.VerifyNone(t)

    ctx := context.Background()
    // connect to local node
    client, err := DialContext(ctx, "http://localhost:8545")
    if err != nil {
        t.Fatal("can't dial", err)
    }
    defer client.Close()

    if err := client.CallContext(ctx, nil, "eth_chainId"); err != nil {
        t.Fatal(err)
    }
}

// Once run above test, go routine leak detected.
--- FAIL: TestHTTPClientGoroutineLeak (0.45s)
    /Users/tak/Documents/angoya/_sandbox/go-ethereum/rpc/client_test.go:70: found unexpected goroutines:
        [Goroutine 25 in state IO wait, with internal/poll.runtime_pollWait on top of the stack:
        ...

Then I noticed that missing to close idling connections

diff --git a/rpc/http.go b/rpc/http.go
index bbabe15ba..4f45d8d46 100644
--- a/rpc/http.go
+++ b/rpc/http.go
@@ -73,6 +73,7 @@ func (hc *httpConn) readBatch() ([]*jsonrpcMessage, bool, error) {

 func (hc *httpConn) close() {
        hc.closeOnce.Do(func() { close(hc.closeCh) })
+       hc.client.CloseIdleConnections()
 }

And missing httpConn.close()

 diff --git a/rpc/client.go b/rpc/client.go
index a509cb2e0..e508ed304 100644
--- a/rpc/client.go
+++ b/rpc/client.go
@@ -280,6 +280,7 @@ func (c *Client) SupportedModules() (map[string]string, error) {
 // Close closes the client, aborting any in-flight requests.
 func (c *Client) Close() {
        if c.isHTTP {
+               c.writeConn.(*httpConn).close()
                return
        }

Those two changes solve go routine leak.