Zilliqa / zq2

Zilliqa 2.0 code base
Apache License 2.0

A node restarted multiple times whilst others are timing out can jump to a view ahead of others #1806

Open 86667 opened 5 days ago

86667 commented 5 days ago

When a node restarts, it computes the time elapsed since it last wrote a view number update to the table and uses that to determine the minimum number of views that may have timed out in the meantime.

For example, suppose a node has been down for 15 seconds. If consensus has not been achieved in that time, then at least 2 views have passed: the first with a timeout of 5 seconds and the second with a timeout of 10 seconds.
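The calculation above can be sketched roughly as follows. This is a minimal illustration, not the zq2 implementation: `assumed_view_timeout` and `min_views_missed` are hypothetical names, and the schedule (5 s, 10 s, 15 s, ...) is an assumption extrapolated from the two values in the example. The real timeout growth may differ.

```rust
use std::time::Duration;

/// ASSUMPTION: the k-th consecutive timed-out view (k = 0, 1, ...) lasts
/// 5 s * (k + 1). Only the first two values (5 s, 10 s) come from the issue.
fn assumed_view_timeout(k: u32) -> Duration {
    Duration::from_secs(5) * (k + 1)
}

/// Counts how many whole views must have timed out within `elapsed`,
/// given a schedule mapping "views already missed" to the next timeout.
fn min_views_missed(elapsed: Duration, timeout_for: impl Fn(u32) -> Duration) -> u32 {
    let mut remaining = elapsed;
    let mut missed = 0u32;
    loop {
        let timeout = timeout_for(missed);
        // Stop once the next view could not have fully timed out yet.
        // The zero check guards against a schedule that never advances time.
        if timeout.is_zero() || remaining < timeout {
            return missed;
        }
        remaining -= timeout;
        missed += 1;
    }
}
```

With this schedule, `min_views_missed(Duration::from_secs(15), assumed_view_timeout)` yields 2, matching the example: 5 s for the first view plus 10 s for the second.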

To speed up resyncing when all other nodes are timing out, we set our view number to include this minimum missed views value. Setting the view number also updates the timestamp recording when the last view was written, so if we restart again we use the elapsed time since the view jump was written rather than the elapsed time since the original shutdown.

This results in an incorrect calculation of the minimum number of views passed.

A fix would be to use the timestamp at which finalised_view was written, or the timestamp at which the head block was written.
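To make the difference concrete, here is a small simulation of both behaviours. It is a sketch under stated assumptions, not zq2 code: `ViewRecord`, `restart_resetting_timestamp` and `restart_from_anchor` are hypothetical names, times are offsets from an arbitrary origin, and the timeout schedule (5 s, 10 s, 15 s, ...) is assumed from the example in this issue.

```rust
use std::time::Duration;

/// ASSUMPTION: consecutive timeouts grow as 5 s, 10 s, 15 s, ...
fn min_views_missed(elapsed: Duration) -> u64 {
    let mut remaining = elapsed;
    let mut missed = 0u64;
    loop {
        let timeout = Duration::from_secs(5 * (missed + 1));
        if remaining < timeout {
            return missed;
        }
        remaining -= timeout;
        missed += 1;
    }
}

/// Hypothetical persisted record: a view number and when it was written.
#[derive(Clone, Copy)]
struct ViewRecord {
    view: u64,
    written_at: Duration,
}

/// Behaviour described in the issue: the view jump also rewrites the
/// timestamp, so the next restart measures elapsed time from the jump.
fn restart_resetting_timestamp(rec: ViewRecord, now: Duration) -> ViewRecord {
    let missed = min_views_missed(now.saturating_sub(rec.written_at));
    if missed == 0 {
        return rec; // nothing to jump, nothing rewritten
    }
    ViewRecord { view: rec.view + missed, written_at: now }
}

/// Proposed fix: always measure from a timestamp that restarts do not
/// touch (e.g. when finalised_view or the head block was written).
fn restart_from_anchor(anchor: ViewRecord, now: Duration) -> u64 {
    anchor.view + min_views_missed(now.saturating_sub(anchor.written_at))
}
```

Suppose the node last wrote view 10 at t = 0 and restarts at t = 15 s and again at t = 21 s. With the timestamp reset, the first restart jumps to view 12 and the second sees 6 s elapsed, counts one more 5 s timeout, and lands on view 13. Measured from the stable anchor, 21 s covers only the 5 s and 10 s timeouts, so the node stays at view 12, which is where nodes that have been timing out continuously would be under the same assumed schedule.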