ChainSafe / lodestar

🌟 TypeScript Implementation of Ethereum Consensus
https://lodestar.chainsafe.io
Apache License 2.0

Huge network thread latency to/from main thread when subscribing to all subnets #7188

Open twoeths opened 2 weeks ago

twoeths commented 2 weeks ago

Describe the bug

When subscribing to all subnets, the latency between the network thread and the main thread (in both directions) increases significantly.

Screenshot 2024-10-22 at 15 49 01

I don't see any added latency from libp2p, or at most a tiny difference: we still receive gossip blocks on time at the network thread.

Screenshot 2024-10-22 at 15 50 01

The latency above leads to a significant increase in the following topics:

Screenshot 2024-10-22 at 15 51 52 Screenshot 2024-10-22 at 15 52 24

Expected behavior

For the beacon_block topic we don't want this latency, because it causes the block to become head very late and validators to vote for the wrong head, see #7186

Steps to reproduce

No response

Additional context

No response

Operating system

Linux

Lodestar version or commit hash

v1.22.0, unstable

twoeths commented 2 weeks ago

My hypothesis was that attestations queued on the main thread side may cause this, so I queued them at the network thread instead, but it did not help, see #7189

It seems to me that gc caused this (before 23:00 is the version without the subscribe_to_subnet flag)

Screenshot 2024-10-22 at 16 10 07

twoeths commented 1 week ago

Validator performance really depends on the latency of the beacon block. This metric tracks the time from when we first see a beacon block in the network thread until it is validated in the main thread.

Screenshot 2024-10-28 at 09 49 35
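
For anyone who wants to reproduce a metric like this outside of Lodestar, here is a minimal sketch of how the "first seen in network thread" to "validated in main thread" delay could be computed. All names (`BlockLatencyTracker`, `onSeenInNetworkThread`, `onValidatedInMainThread`) are hypothetical and not Lodestar's actual code. It assumes both timestamps come from a clock that is comparable across threads, e.g. `Date.now()`, since `performance.now()` is relative to a per-thread time origin.

```typescript
// Hypothetical sketch: names are illustrative, not Lodestar's API.
type BlockRootHex = string;

class BlockLatencyTracker {
  // First-seen wall-clock timestamp (ms) per block root.
  private readonly seenAtMs = new Map<BlockRootHex, number>();

  // Record only the first time a block is seen; later duplicates are ignored.
  onSeenInNetworkThread(root: BlockRootHex, nowMs: number): void {
    if (!this.seenAtMs.has(root)) {
      this.seenAtMs.set(root, nowMs);
    }
  }

  // Returns the seen-to-validated delay in seconds,
  // or null if the block was never recorded as seen.
  onValidatedInMainThread(root: BlockRootHex, nowMs: number): number | null {
    const seenMs = this.seenAtMs.get(root);
    if (seenMs === undefined) return null;
    this.seenAtMs.delete(root); // avoid unbounded growth
    return (nowMs - seenMs) / 1000;
  }
}
```

In a real two-thread setup the seen timestamp would travel with the message to the main thread instead of living in a shared map; the map just keeps the sketch self-contained.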

It really looks like gc frequently delays beacon blocks when it runs. Because of that, validators vote for the wrong block head/target, and it may cause missed attestations for those blocks.
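
One way to check the gc correlation per block rather than by eye on the dashboard is to compute how much of each block's seen-to-validated window overlaps with gc pauses. A rough sketch is below; `Interval` and `gcOverlapMs` are hypothetical names. It assumes the gc pauses were collected separately (for example via a `PerformanceObserver` from Node's `perf_hooks` observing `gc` entries), that they are expressed on the same clock as the block timestamps, and that the pauses do not overlap each other.

```typescript
// Hypothetical sketch: names are illustrative, not Lodestar's API.
interface Interval {
  startMs: number;
  endMs: number;
}

// Total time (ms) of the block's processing window that was spent inside gc pauses.
// Assumes gcPauses do not overlap each other.
function gcOverlapMs(blockWindow: Interval, gcPauses: Interval[]): number {
  let totalMs = 0;
  for (const gc of gcPauses) {
    const start = Math.max(blockWindow.startMs, gc.startMs);
    const end = Math.min(blockWindow.endMs, gc.endMs);
    if (end > start) {
      totalMs += end - start;
    }
  }
  return totalMs;
}
```

If the overlap is consistently a large share of the delay for late blocks and close to zero for on-time blocks, that would support the gc explanation; if not, the delay is more likely coming from somewhere else.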