LiskArchive / lisk-sdk

🔩 Lisk software development kit
https://lisk.com
Apache License 2.0

Failing node synchronization #8432

Closed: has5aan closed this issue 1 year ago

has5aan commented 1 year ago

Expected behavior

Nodes should synchronize.

Actual behavior

Nodes are unable to synchronize, due to the following:

  1. ~~Another issue observed was one of the nodes throwing: [err=Cannot read properties of undefined (reading 'moduleStore')] Failed to generate a block. Perhaps moduleStore is not initialized properly within ABIHandler.initStateMachine.~~ This is addressed under #8460.

  2. ~~One of the nodes throws "New tip of the chain has no preference over the previous tip" before synchronizing. This happens because, when synchronizing, the node doesn't receive peerInfo from the P2P library. To reproduce, run three nodes that generate blocks but are unable to discover each other, let them generate a few rounds, and then relaunch them with fixed peers configured; relaunching them at roughly the same time increases the chances of reproducing the issue. In this scenario, a supposedly correct tip received from a peer can be rejected, because PeerInfo.options for the configured fixed peers, as returned by Network.getConnectedPeers, is incorrect. The concerned node is perhaps not yet initialized at this stage, so a request over WS fails; delaying the invocation of the nodes so that all components can initialize resolves this.~~ "Peer tip does not have preference over current tip." This is expected behavior, as nodeInfo is received from the peers rather than polled.

Steps to reproduce

  1. Run three nodes, called pos-mainchain, sync and sync2, with the validators from dev-validators.json split among them as 34, 34 and 33.
  2. Configure all three so that they are unable to discover each other as peer nodes.
  3. Let the three nodes run independently for a few rounds. The issue was reproduced twice, with each node having generated 40 or 70 rounds.
  4. Relaunch the three nodes with seed peers configured, enabling them to sync and converge on a single chain.
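The 34/34/33 validator split described above can be sketched as follows. This is a minimal illustration only: splitValidators is a hypothetical helper, and placeholder strings stand in for the entries of dev-validators.json, whose actual structure is not assumed here.

```typescript
// Hypothetical helper: split an ordered list of validators into consecutive,
// non-overlapping groups of the given sizes (e.g. 34, 34 and 33).
function splitValidators<T>(validators: readonly T[], sizes: readonly number[]): T[][] {
  const total = sizes.reduce((sum, n) => sum + n, 0);
  if (total !== validators.length) {
    throw new Error(`group sizes sum to ${total}, but there are ${validators.length} validators`);
  }
  const groups: T[][] = [];
  let offset = 0;
  for (const size of sizes) {
    groups.push(validators.slice(offset, offset + size));
    offset += size;
  }
  return groups;
}

// Placeholder identifiers standing in for the validators in dev-validators.json.
const validators = Array.from({ length: 101 }, (_, i) => `validator-${i}`);
const [posMainchain, sync, sync2] = splitValidators(validators, [34, 34, 33]);
console.log(posMainchain.length, sync.length, sync2.length); // 34 34 33
```

Each group would then be assigned to one node, so no validator is active on more than one of the three independent chains.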

Which version(s) does this affect? (Environment, OS, etc...)

Lisk SDK development branch.

has5aan commented 1 year ago

Regarding the "Peer tip does not have preference over current tip." issue:

This is expected, as NodeInfo is received from the connecting node rather than polled; if it is not received, the peer's options field is set to default values. This results in the "Peer tip does not have preference over current tip." error: the selected peer's height and maxHeightPrevoted properties are set to 0, so the check in BlockSynchronizationMechanism.isDifferentChain produces the error.
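To see why the default values trigger the error, the sketch below uses a simplified fork-choice comparison: a candidate tip is preferred if its maxHeightPrevoted is higher, or equal with a greater height. This comparison is an assumption modeled on the behavior described here, not the SDK's BlockSynchronizationMechanism.isDifferentChain; TipInfo, DEFAULT_TIP, peerTip and hasPreference are hypothetical names, and the heights are illustrative.

```typescript
// Hypothetical shape of the tip information a node holds about a peer.
interface TipInfo {
  height: number;
  maxHeightPrevoted: number;
}

// Assumed defaults used when a peer's NodeInfo has not been received.
const DEFAULT_TIP: TipInfo = { height: 0, maxHeightPrevoted: 0 };

// Simplified fork-choice comparison (an assumption, not the SDK's code):
// the candidate is preferred if its maxHeightPrevoted is higher, or if it is
// equal and the candidate's height is greater.
function hasPreference(current: TipInfo, candidate: TipInfo): boolean {
  return (
    candidate.maxHeightPrevoted > current.maxHeightPrevoted ||
    (candidate.maxHeightPrevoted === current.maxHeightPrevoted &&
      candidate.height > current.height)
  );
}

// Build a peer's tip from whatever NodeInfo fields are known, falling back to defaults.
function peerTip(nodeInfo?: Partial<TipInfo>): TipInfo {
  return { ...DEFAULT_TIP, ...nodeInfo };
}

const localTip: TipInfo = { height: 500, maxHeightPrevoted: 430 }; // illustrative values

console.log(hasPreference(localTip, peerTip())); // false: NodeInfo missing, both fields are 0
console.log(hasPreference(localTip, peerTip({ height: 720, maxHeightPrevoted: 650 }))); // true
```

Under this comparison, a peer whose tip falls back to (0, 0) can never be preferred, since no local tip has a negative height or maxHeightPrevoted.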

The same error can also occur when NodeInfo has already been received for every connected peer: because the peer is selected at random, the selected peer may be an already connected one whose height and maxHeightPrevoted are lower than those of the node syncing blocks.
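The random-selection case can be illustrated in the same spirit. In this hypothetical sketch, selectRandomPeer takes an injectable random source so the outcome is deterministic, and hasPreference uses a simplified fork-choice comparison (higher maxHeightPrevoted wins, ties broken by greater height), which is an assumption rather than the SDK's own implementation. Peer IDs and heights are made up.

```typescript
// Hypothetical tip information for a connected peer.
interface PeerTip {
  peerId: string;
  height: number;
  maxHeightPrevoted: number;
}

type Tip = Pick<PeerTip, "height" | "maxHeightPrevoted">;

// Simplified fork-choice comparison (an assumption, not the SDK's code):
// higher maxHeightPrevoted wins; on a tie, the greater height wins.
function hasPreference(current: Tip, candidate: Tip): boolean {
  return (
    candidate.maxHeightPrevoted > current.maxHeightPrevoted ||
    (candidate.maxHeightPrevoted === current.maxHeightPrevoted &&
      candidate.height > current.height)
  );
}

// Pick one connected peer uniformly at random. The random source is injectable
// so that the example below is deterministic.
function selectRandomPeer(
  peers: readonly PeerTip[],
  random: () => number = Math.random,
): PeerTip {
  const peer = peers[Math.floor(random() * peers.length)];
  if (peer === undefined) {
    throw new Error("no connected peers to select from");
  }
  return peer;
}

const localTip: Tip = { height: 500, maxHeightPrevoted: 430 };
const connectedPeers: PeerTip[] = [
  { peerId: "peer-a", height: 480, maxHeightPrevoted: 410 }, // behind the syncing node
  { peerId: "peer-b", height: 720, maxHeightPrevoted: 650 }, // ahead of the syncing node
];

// Forcing random() to return 0 selects peer-a, whose tip has no preference.
const picked = selectRandomPeer(connectedPeers, () => 0);
console.log(picked.peerId, hasPreference(localTip, picked)); // peer-a false
```

Even though peer-b would be a valid sync target, the random draw can land on peer-a, and the check then fails exactly as it does when NodeInfo is missing.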