erigontech / erigon

Ethereum implementation on the efficiency frontier https://erigon.gitbook.io
GNU Lesser General Public License v3.0
3.1k stars 1.09k forks source link

[e3] Stuck at shutdown for more than 5 min, had to Ctrl+C #11252

Open wmitsuda opened 1 month ago

wmitsuda commented 1 month ago

e3, yesterday's main:

INFO[07-21|03:43:52.845] Exiting...
INFO[07-21|03:43:52.845] Exiting Engine...
INFO[07-21|03:43:52.847] RPC server shutting down
INFO[07-21|03:43:52.862] HTTP endpoint closed                     url=[::]:8745
INFO[07-21|03:43:52.863] RPC server shutting down
INFO[07-21|03:43:52.863] HTTP endpoint closed                     url=[::]:8546
INFO[07-21|03:43:52.863] RPC server shutting down
WARN[07-21|03:43:52.952] [snapshots] retire blocks                err="DumpBlocks: context canceled"
WARN[07-21|03:43:54.361] [engine] CurrentHeader                   err="ethereumExecutionModule.CurrentHeader: could not begin database tx context canceled"
INFO[07-21|03:43:54.431] Engine HTTP endpoint close               url=127.0.0.1:8551
WARN[07-21|03:44:03.048] nodeDB.QuerySeeds failed                 err="context canceled"
WARN[07-21|03:44:04.199] [EngineBlockDownloader] Could not insert headers and bodies err="ethereumExecutionModule.InsertBlocks: could not begin transaction: context canceled"
sync_locked:16893 skip update meta
WARN[07-21|03:44:35.357] nodeDB.QuerySeeds failed                 err="context canceled"
WARN[07-21|03:45:10.377] nodeDB.QuerySeeds failed                 err="context canceled"
WARN[07-21|03:45:46.148] nodeDB.QuerySeeds failed                 err="context canceled"
WARN[07-21|03:46:20.395] nodeDB.QuerySeeds failed                 err="context canceled"
WARN[07-21|03:46:56.165] nodeDB.QuerySeeds failed                 err="context canceled"
WARN[07-21|03:47:31.176] nodeDB.QuerySeeds failed                 err="context canceled"
WARN[07-21|03:48:05.424] nodeDB.QuerySeeds failed                 err="context canceled"
^CWARN[07-21|03:50:55.832] Already shutting down, interrupt more to panic. times=9
^CWARN[07-21|03:50:56.046] Already shutting down, interrupt more to panic. times=8
^CWARN[07-21|03:50:56.600] Already shutting down, interrupt more to panic. times=7
^CWARN[07-21|03:50:56.832] Already shutting down, interrupt more to panic. times=6
^CWARN[07-21|03:50:57.019] Already shutting down, interrupt more to panic. times=5
^CWARN[07-21|03:50:57.318] Already shutting down, interrupt more to panic. times=4
^CWARN[07-21|03:50:57.534] Already shutting down, interrupt more to panic. times=3
^CWARN[07-21|03:50:57.804] Already shutting down, interrupt more to panic. times=2
^CWARN[07-21|03:50:58.018] Already shutting down, interrupt more to panic. times=1
^Cpanic: boom

goroutine 1759 [running]:
github.com/erigontech/erigon/turbo/debug.LoudPanic(...)
        github.com/erigontech/erigon/turbo/debug/loudpanic.go:27
github.com/erigontech/erigon/turbo/debug.ListenSignals({0x1077d27a0, 0xc000bb8000}, {0x107801908, 0xc000d9a9c0})
        github.com/erigontech/erigon/turbo/debug/signal.go:55 +0x358
created by github.com/erigontech/erigon/node.StartNode in goroutine 1
        github.com/erigontech/erigon/node/node.go:426 +0xd2

goroutine 1 [chan receive, 1411 minutes]:
github.com/erigontech/erigon/node.(*Node).Wait(...)
        github.com/erigontech/erigon/node/node.go:273
github.com/erigontech/erigon/turbo/node.(*ErigonNode).Serve(0xdc038702f20)
        github.com/erigontech/erigon/turbo/node/node.go:51 +0x6e
main.runErigon(0xc000adb740)
        github.com/erigontech/erigon/cmd/erigon/main.go:103 +0x425
github.com/erigontech/erigon/turbo/app.MakeApp.func1(0xc000adb740)
        github.com/erigontech/erigon/turbo/app/make_app.go:71 +0x17f
github.com/urfave/cli/v2.(*Command).Run(0xc00146f760, 0xc000adb740, {0xc000052588, 0x28, 0x2b})
        github.com/urfave/cli/v2@v2.27.2/command.go:276 +0x97d
github.com/urfave/cli/v2.(*App).RunContext(0xc000c7cc00, {0x1077ebf00, 0x109a3fc40}, {0xc000052588, 0x28, 0x2b})
        github.com/urfave/cli/v2@v2.27.2/app.go:333 +0x58b
github.com/urfave/cli/v2.(*App).Run(...)
        github.com/urfave/cli/v2@v2.27.2/app.go:307
main.main()
        github.com/erigontech/erigon/cmd/erigon/main.go:51 +0x85

goroutine 39 [chan receive, 1411 minutes]:
gopkg.in/natefinch/lumberjack%2ev2.(*Logger).millRun(...)
        gopkg.in/natefinch/lumberjack.v2@v2.2.1/lumberjack.go:379
created by gopkg.in/natefinch/lumberjack%2ev2.(*Logger).mill.func1 in goroutine 1
        gopkg.in/natefinch/lumberjack.v2@v2.2.1/lumberjack.go:390 +0x96

goroutine 151790839 [IO wait]:
internal/poll.runtime_pollWait(0x30d0f9ccd0, 0x77)
        runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0xfc04190b980?, 0x0?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitWrite(...)
        internal/poll/fd_poll_runtime.go:93
internal/poll.(*FD).WaitWrite(...)
        internal/poll/fd_unix.go:683
net.(*netFD).connect(0xfc04190b980, {0x1077ec450, 0xfc0502fde30}, {0xdc038bc36c8?, 0x104af493b?}, {0x1077d0ee0?, 0xfc08c200d80?})
        net/fd_unix.go:141 +0x70b
net.(*netFD).dial(0xfc04190b980, {0x1077ec450, 0xfc0502fde30}, {0x1077f6658?, 0x0?}, {0x1077f6658, 0xfc083474c60}, 0x38bc3798?)
        net/sock_posix.go:124 +0x3bc
net.socket({0x1077ec450, 0xfc0502fde30}, {0x1068cd24f, 0x3}, 0x2, 0x1, 0xfc083474c30?, 0x0, {0x1077f6658, 0x0}, ...)
        net/sock_posix.go:70 +0x29b
net.internetSocket({0x1077ec450, 0xfc0502fde30}, {0x1068cd24f, 0x3}, {0x1077f6658, 0x0}, {0x1077f6658, 0xfc083474c60}, 0x1, 0x0, ...)
        net/ipsock_posix.go:154 +0xf8
net.(*sysDialer).doDialTCPProto(0xfc017b8e1e0, {0x1077ec450, 0xfc0502fde30}, 0x0, 0xfc083474c60, 0x0)
        net/tcpsock_posix.go:85 +0xec
net.(*sysDialer).doDialTCP(...)
        net/tcpsock_posix.go:75
net.(*sysDialer).dialTCP(0xdc038bc3938?, {0x1077ec450?, 0xfc0502fde30?}, 0x104b5c6ba?, 0xdc038bc3960?)
        net/tcpsock_posix.go:71 +0x65
net.(*sysDialer).dialSingle(0xfc017b8e1e0, {0x1077ec450, 0xfc0502fde30}, {0x1077de398, 0xfc083474c60})
        net/dial.go:651 +0x27d
net.(*sysDialer).dialSerial(0xfc017b8e1e0, {0x1077ec450, 0xfc0502fde30}, {0xc000c61e90?, 0x1, 0xc000c61e90?})
        net/dial.go:616 +0x24e
net.(*sysDialer).dialParallel(0xc000c61e80?, {0x1077ec450?, 0xfc0502fde30?}, {0xc000c61e90?, 0xfc0502fde30?, 0x1068cdc63?}, {0x0?, 0x1068cd24f?, 0x1?})
        net/dial.go:517 +0x3b4
net.(*Dialer).DialContext(0xc000cf8620, {0x1077ec3e0, 0xdc0392e3130}, {0x1068cd24f, 0x3}, {0xfc08be8d2f0, 0x13})
        net/dial.go:508 +0x69a
github.com/erigontech/erigon/p2p.tcpDialer.Dial({0x1070e9370?}, {0x1077ec3e0, 0xdc0392e3130}, 0xdc03cce01c0?)
        github.com/erigontech/erigon/p2p/dial.go:71 +0x54
github.com/erigontech/erigon/p2p.(*dialTask).dial(0xfc083474b40, 0xc000175b00, 0xdc02b549d50)
        github.com/erigontech/erigon/p2p/dial.go:561 +0x5a
github.com/erigontech/erigon/p2p.(*dialTask).run(0xfc083474b40, 0xc000175b00)
        github.com/erigontech/erigon/p2p/dial.go:511 +0x4f
github.com/erigontech/erigon/p2p.(*dialScheduler).startDial.func1()
        github.com/erigontech/erigon/p2p/dial.go:476 +0x46
created by github.com/erigontech/erigon/p2p.(*dialScheduler).startDial in goroutine 1706
        github.com/erigontech/erigon/p2p/dial.go:474 +0x396

goroutine 151790760 [select]:
net.(*netFD).connect.func2()
        net/fd_unix.go:118 +0x7a
created by net.(*netFD).connect in goroutine 151790759
        net/fd_unix.go:117 +0x37c

goroutine 1666 [chan receive]:
github.com/erigontech/erigon/consensus/ethash.(*meterArbiter).tick(0x1099cdfe0)
        github.com/erigontech/erigon/consensus/ethash/meter.go:159 +0x7c
created by github.com/erigontech/erigon/consensus/ethash.newHashRateMeter in goroutine 1
        github.com/erigontech/erigon/consensus/ethash/meter.go:35 +0xbf

goroutine 1692 [chan receive, 1411 minutes]:
github.com/erigontech/erigon-lib/direct.(*SubscribeLogsStreamC).Recv(0xc0018aa308?)
        github.com/erigontech/erigon-lib@v0.0.0-00010101000000-000000000000/direct/eth_backend_client.go:185 +0x25
github.com/erigontech/erigon/cmd/rpcdaemon/rpcservices.(*RemoteBackend).SubscribeLogs(0xdc04a49c5f0, {0x1077ec3e0, 0xc00191ee10}, 0xdc03af7dd80, 0xc0018aa308)
        github.com/erigontech/erigon/cmd/rpcdaemon/rpcservices/eth_backend.go:250 +0x1b9
github.com/erigontech/erigon/turbo/rpchelper.New.func2()
        github.com/erigontech/erigon/turbo/rpchelper/filters.go:134 +0x17b
created by github.com/erigontech/erigon/turbo/rpchelper.New in goroutine 1
        github.com/erigontech/erigon/turbo/rpchelper/filters.go:122 +0x633

goroutine 1553 [chan receive, 1411 minutes]:
github.com/erigontech/erigon-lib/direct.(*SubscribeLogsStreamS).Recv(0xdc03871bbd0?)
        github.com/erigontech/erigon-lib@v0.0.0-00010101000000-000000000000/direct/eth_backend_client.go:158 +0x25
github.com/erigontech/erigon/ethdb/privateapi.(*LogsFilterAggregator).subscribeLogs(0xdc03871bbd0, {0x1077fbf78, 0xdc04978d3e0})
        github.com/erigontech/erigon/ethdb/privateapi/logsfilter.go:149 +0xa8
github.com/erigontech/erigon/ethdb/privateapi.(*EthBackendServer).SubscribeLogs(0xdc04a0227d0?, {0x1077fbf78?, 0xdc04978d3e0?})
        github.com/erigontech/erigon/ethdb/privateapi/ethbackend.go:273 +0x2a
github.com/erigontech/erigon-lib/direct.(*EthBackendClientDirect).SubscribeLogs.func1()
        github.com/erigontech/erigon-lib@v0.0.0-00010101000000-000000000000/direct/eth_backend_client.go:125 +0x92
created by github.com/erigontech/erigon-lib/direct.(*EthBackendClientDirect).SubscribeLogs in goroutine 1692
        github.com/erigontech/erigon-lib@v0.0.0-00010101000000-000000000000/direct/eth_backend_client.go:122 +0x11f

goroutine 1629 [syscall]:
os/signal.signal_recv()
        runtime/sigqueue.go:149 +0x25
os/signal.loop()
        os/signal/signal_unix.go:23 +0x13
created by os/signal.Notify.func1.1 in goroutine 1759
        os/signal/signal.go:151 +0x1f

goroutine 1152 [select]:
golang.org/x/time/rate.(*Limiter).wait(0xdc03e936af0, {0x1077ec3e0, 0xdc03e936b40}, 0x1, {0x19f49b55d8576f0?, 0x1077d59e0?, 0x1099cf400?}, 0x1077bdc88)
        golang.org/x/time@v0.5.0/rate/rate.go:285 +0x3fc
golang.org/x/time/rate.(*Limiter).WaitN(0xdc03e936af0, {0x1077ec3e0, 0xdc03e936b40}, 0x1)
        golang.org/x/time@v0.5.0/rate/rate.go:248 +0x50
golang.org/x/time/rate.(*Limiter).Wait(...)
        golang.org/x/time@v0.5.0/rate/rate.go:233
github.com/erigontech/erigon/p2p/dnsdisc.(*Client).resolveEntry(0xdc0386a4b00, {0x1077ec3e0, 0xdc03e936b40}, {0xdc0392e6c40, 0x18}, {0xfc0fa1998cc, 0x1a})
        github.com/erigontech/erigon/p2p/dnsdisc/client.go:179 +0x65
github.com/erigontech/erigon/p2p/dnsdisc.(*subtreeSync).resolveNext(0xfc035bc40a0, {0x1077ec3e0?, 0xdc03e936b40?}, {0xfc0fa1998cc?, 0x10671b9f0?})
        github.com/erigontech/erigon/p2p/dnsdisc/sync.go:264 +0x4f
github.com/erigontech/erigon/p2p/dnsdisc.(*clientTree).syncNextRandomENR(0xdc04a64ad20, {0x1077ec3e0, 0xdc03e936b40})
        github.com/erigontech/erigon/p2p/dnsdisc/sync.go:139 +0x76
github.com/erigontech/erigon/p2p/dnsdisc.(*clientTree).syncRandom(0xdc04a64ad20, {0x1077ec3e0, 0xdc03e936b40})
        github.com/erigontech/erigon/p2p/dnsdisc/sync.go:100 +0x21d
github.com/erigontech/erigon/p2p/dnsdisc.(*randomIterator).nextNode(0xdc0386a4b80)
        github.com/erigontech/erigon/p2p/dnsdisc/client.go:288 +0x45
github.com/erigontech/erigon/p2p/dnsdisc.(*randomIterator).Next(0xdc0386a4b80)
        github.com/erigontech/erigon/p2p/dnsdisc/client.go:267 +0x18
github.com/erigontech/erigon/p2p/enode.(*FairMix).runSource(0xdc04a749440, 0xdc04a7e8ea0, 0xdc04a7aabc0)
        github.com/erigontech/erigon/p2p/enode/iter.go:285 +0xdb
created by github.com/erigontech/erigon/p2p/enode.(*FairMix).AddSource in goroutine 1730
        github.com/erigontech/erigon/p2p/enode/iter.go:185 +0x233

goroutine 1608 [IO wait, 1411 minutes]:
internal/poll.runtime_pollWait(0x10b0d0550, 0x72)
        runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0xdc03f0c9300?, 0x10?, 0x0)
        internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
        internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xdc03f0c9300)
        internal/poll/fd_unix.go:611 +0x2ac
net.(*netFD).accept(0xdc03f0c9300)
        net/fd_unix.go:172 +0x29
net.(*TCPListener).accept(0xdc047090960)
        net/tcpsock_posix.go:159 +0x1e
net.(*TCPListener).Accept(0xdc047090960)
        net/tcpsock.go:327 +0x30
github.com/erigontech/erigon/p2p.(*Server).listenLoop(0xdc04a392008, {0x1077ec3e0, 0xdc0470c8b40})
        github.com/erigontech/erigon/p2p/server.go:939 +0x70a
github.com/erigontech/erigon/p2p.(*Server).setupListening.func2()
        github.com/erigontech/erigon/p2p/server.go:758 +0x6e
created by github.com/erigontech/erigon/p2p.(*Server).setupListening in goroutine 1735
        github.com/erigontech/erigon/p2p/server.go:755 +0x265

goroutine 1701 [select]:
github.com/erigontech/erigon/p2p/discover.(*Table).loop(0xdc04a67a160)
        github.com/erigontech/erigon/p2p/discover/table.go:275 +0x365
created by github.com/erigontech/erigon/p2p/discover.ListenV4 in goroutine 1730
        github.com/erigontech/erigon/p2p/discover/v4_udp.go:190 +0x566

goroutine 1703 [chan receive, 8 minutes]:
github.com/erigontech/erigon/p2p/discover.(*UDPv4).handleReply(0xdc03557a540, {0x47, 0x59, 0x21, 0xbe, 0x4, 0xfc, 0x9e, 0x6c, 0x39, ...}, ...)
        github.com/erigontech/erigon/p2p/discover/v4_udp.go:491 +0x1b9
github.com/erigontech/erigon/p2p/discover.(*UDPv4).verifyPong(0xdc03557a540, 0x10b16fd18?, 0xfc0833b9650, {0x47, 0x59, 0x21, 0xbe, 0x4, 0xfc, 0x9e, ...}, ...)
        github.com/erigontech/erigon/p2p/discover/v4_udp.go:916 +0x15e
github.com/erigontech/erigon/p2p/discover.(*UDPv4).handlePacket(0xdc03557a540, 0xfc0833b9650, {0xdc04a1d0000?, 0x500?, 0x500?})
        github.com/erigontech/erigon/p2p/discover/v4_udp.go:765 +0x339
github.com/erigontech/erigon/p2p/discover.(*UDPv4).readLoop(0xdc03557a540, 0x0)
        github.com/erigontech/erigon/p2p/discover/v4_udp.go:722 +0x1af
created by github.com/erigontech/erigon/p2p/discover.ListenV4 in goroutine 1730
        github.com/erigontech/erigon/p2p/discover/v4_udp.go:194 +0x633

goroutine 1705 [select]:
github.com/erigontech/erigon/p2p/enode.(*FairMix).Next(0xdc04a749440)
        github.com/erigontech/erigon/p2p/enode/iter.go:222 +0x14d
github.com/erigontech/erigon/p2p.(*dialScheduler).readNodes(0xc000175b00, {0x1077e6390, 0xdc04a749440})
        github.com/erigontech/erigon/p2p/dial.go:338 +0x133
created by github.com/erigontech/erigon/p2p.newDialScheduler in goroutine 1730
        github.com/erigontech/erigon/p2p/dial.go:188 +0x44b

goroutine 1706 [select]:
github.com/erigontech/erigon/p2p.(*dialScheduler).loop(0xc000175b00, {0x1077e6390, 0xdc04a749440})
        github.com/erigontech/erigon/p2p/dial.go:254 +0x30f
created by github.com/erigontech/erigon/p2p.newDialScheduler in goroutine 1730
        github.com/erigontech/erigon/p2p/dial.go:189 +0x4b5

goroutine 1716 [semacquire, 8 minutes]:
sync.runtime_Semacquire(0xdc04a74f850?)
        runtime/sema.go:62 +0x25
sync.(*WaitGroup).Wait(0x1089494f8?)
        sync/waitgroup.go:116 +0x48
github.com/erigontech/erigon/p2p/discover.(*UDPv4).Close.func1()
        github.com/erigontech/erigon/p2p/discover/v4_udp.go:230 +0x4a
sync.(*Once).doSlow(0x5?, 0x0?)
        sync/once.go:74 +0xc2
sync.(*Once).Do(...)
        sync/once.go:65
github.com/erigontech/erigon/p2p/discover.(*UDPv4).Close(0xdc04a74fdd0?)
        github.com/erigontech/erigon/p2p/discover/v4_udp.go:227 +0x3c
github.com/erigontech/erigon/p(0xdc0386dae08)
        github.com/erigontech/erigon/p2p/server.go:872 +0x13fc
created by github.com/erigontech/erigon/p2p.(*Server).Start in goroutine 1730
        github.com/erigontech/erigon/p2p/server.go:538 +0x527
^C
AskAlexSharov commented 1 month ago

at

github.com/erigontech/erigon-lib/direct.(*SubscribeLogsStreamS).Recv(0xdc03871bbd0?)
        github.com/erigontech/erigon-lib@v0.0.0-00010101000000-000000000000/direct/eth_backend_client.go:158 +0x25

there is:

    m, ok := <-s.chRecv
    if !ok || m == nil {
        return nil, io.EOF
    }

which doesn't select on ctx.Done()

I see that we already closing channels when subscription breaks:

subscribeLogsRequestChan := make(chan *subscribeLogsRequest, 16384)
    subscribeLogsReplyChan := make(chan *subscribeLogsReply, 16384)
    srv := &SubscribeLogsStreamS{
        chSend: subscribeLogsReplyChan,
        chRecv: subscribeLogsRequestChan,
        ctx:    ctx,
    }
    go func() {
        defer close(subscribeLogsRequestChan)
        defer close(subscribeLogsReplyChan)
        srv.Err(s.server.SubscribeLogs(srv))
    }()

Means:

But problem is here: s.server.SubscribeLogs(srv) -> LogsFilterAggregator) subscribeLogs(server remote.ETHBACKEND_SubscribeLogsServer) -> Recv() -> <-s.chRecv means - server can't finish server.SubscribeLogs because stuck at <-s.chRecv - so we must add "select on ctx.Done()" in Recv()? seems yes. maybe in all erigon-lib/direct classes.

FYI: direct - it's grpc call without serialization (within 1 process).