Open nflaig opened 1 year ago
Reopening until we can confirm this is fixed by upgrading libp2p, see https://github.com/ChainSafe/lodestar/issues/5642#issuecomment-1761433473.
This is still an issue in the latest release as confirmed by seamonkey
Might be resolved in the next node release, see fix https://github.com/nodejs/node/pull/51526 and another related issue https://github.com/tinylibs/tinypool/issues/54
@nflaig I haven't heard of this happening lately, stale?
This is kinda related to https://github.com/ChainSafe/lodestar/issues/6053 as well, I haven't seen this myself in a while which is good because hanging work was a pain to deal with.
Should be observed more closely and can probably close it
we also wanna consider removing the timeout if we don't see it anymore https://github.com/ChainSafe/lodestar/blob/cf722195b5ebabeb703eb973b2a9b6c70bc32d71/packages/beacon-node/src/network/core/networkCore.ts#L269-L272
Describe the bug
The network worker thread is not shutting down in some cases.
Force closing (
kill
) does not help either because worker does receive signal.It is pretty clear based on observed logs that the issue is the network worker.
"terminating network worker" is logged https://github.com/ChainSafe/lodestar/blob/5116493e87a9fe2102ebc7a3ff77677c90da3ddf/packages/beacon-node/src/network/core/networkCoreWorkerHandler.ts#L129
but "terminated network worker" is not https://github.com/ChainSafe/lodestar/blob/5116493e87a9fe2102ebc7a3ff77677c90da3ddf/packages/beacon-node/src/network/core/networkCoreWorkerHandler.ts#L131
I checked the log files, there is nothing that peaks out, likely just the same issue we see on the main thread with libp2p
Can likely be resolved by calling
process.exit
on worker explicitly or force closing it using another approach.Expected behavior
Network worker should shut down cleanly and not hang the main process.
Steps to reproduce
Run Lodestar beacon node with
--network.useWorker true
flag.Additional context
The problem seems to be with libp2p which in the end must take care of closing all connections / removing tcp listeners. There are several closed but also open issues regarding connections not being closed properly.
This comment https://github.com/libp2p/js-libp2p/issues/436#issuecomment-432624680 summarizes open tasks but there was no progress in a while.
Operating system
Linux
Lodestar version or commit hash
ec8153153c83d082cb89fca682d502e709e3f415