ethereum-optimism / optimism


Nodes can't recover from large L2 unsafe head gaps using p2p req resp sync #11779

Open anacrolix opened 1 week ago

anacrolix commented 1 week ago

op-node uses the req resp (request-response) "alt sync" protocol to request the blocks between its current head and the L2 unsafe head when those blocks don't arrive via gossip. When network or service issues stall gossip for more than roughly a minute, incoming gossip is rejected or doesn't contain the blocks needed to catch up. If most of a node's peers are also missing those blocks, the node enters a pathological cycle in which it cannot obtain the blocks it needs. This is particularly bad when the sequencer becomes unavailable, because it continues to produce blocks even though no other nodes are connected. When connectivity resumes, all other nodes are behind.
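To make the "gap" concrete, here is a minimal Go sketch of which block numbers a node would have to fetch over req resp. It is not op-node's code: `missingBlocks`, `localHead`, and `unsafeHead` are hypothetical names, and it assumes the unsafe head block itself already arrived via gossip, so only the blocks strictly in between are missing.

```go
package main

import "fmt"

// missingBlocks returns the block numbers strictly between localHead and
// unsafeHead. Hypothetical helper for illustration only; it assumes the
// unsafe head block itself was already received via gossip.
func missingBlocks(localHead, unsafeHead uint64) []uint64 {
	// No gap if the unsafe head is at, behind, or directly after the local head.
	if unsafeHead <= localHead+1 {
		return nil
	}
	gap := make([]uint64, 0, unsafeHead-localHead-1)
	for n := localHead + 1; n < unsafeHead; n++ {
		gap = append(gap, n)
	}
	return gap
}

func main() {
	// Local head at 100, gossip delivers block 104: 101..103 must be requested.
	fmt.Println(missingBlocks(100, 104)) // prints [101 102 103]
}
```

The longer gossip stalls, the larger this range grows, and every block in it has to come from a peer that actually has it.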

In the req resp arrangement, "client" is the requester, and "server" is the one receiving the message. The current req resp algorithm randomly requests blocks from peers, and has several undesirable properties:

zhiqiangxu commented 6 days ago

This issue can be fixed by using the p2p.sync.onlyreqtostatic flag introduced here.

anacrolix commented 6 days ago

I wondered where that static code came from. You'll be pleased to learn the PR mitigates the need for the flag.

zhiqiangxu commented 6 days ago

Yeah, the ultimate goal is the same: to find trusted nodes to sync from, either manually via p2p.sync.onlyreqtostatic or automatically with your change :)