TLDR: We need to re-request missed proposals after startup (I believe)
The issue arises when a node goes down before receiving a proposal that the other nodes do receive. The restarted node then ends up one view behind the other nodes, yet all nodes have the same exponential backoff period, because their high_qc views are also offset by one (the timeout appears to depend on the gap between the current view and the high_qc view). As a result, the restarted node never syncs.
Example logs:
The node that went down, with high_qc view 2, has an exponential backoff timeout of 40_000 for view 6:
2024-11-12T12:11:31.636766Z TRACE zilliqa::consensus: 507: Not proceeding with view change. Current view: 6 - time since last: 4575, timeout requires: 40000
The other nodes, with high_qc view 3, have an exponential backoff timeout of 40_000 for view 7:
2024-11-12T12:11:32.122232Z TRACE zilliqa::consensus: 507: Not proceeding with view change. Current view: 7 - time since last: 21155, timeout requires: 40000
Other nodes will always move on to the next view before the node that went down can catch up.
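To illustrate the lockstep described above, here is a minimal sketch rather than the node's actual code. It assumes the timeout depends only on the gap between the current view and the high_qc view, that it doubles per view of gap from a base of 2_500 (chosen only so that a gap of 4 gives the 40_000 seen in the logs), and that the values are milliseconds. `Node`, `view_timeout_ms` and `simulate_lag` are hypothetical names. It also assumes no QC is formed and the restarted node cannot adopt a higher view from NewView messages.

```rust
/// Hypothetical backoff rule: the timeout depends only on how far the
/// current view is ahead of the node's high_qc view. The base value and
/// the doubling are assumptions picked to reproduce 40_000 for a gap of 4.
fn view_timeout_ms(view: u64, high_qc_view: u64) -> u64 {
    const BASE_MS: u64 = 2_500; // assumed, not taken from the real config
    let gap = view.saturating_sub(high_qc_view);
    BASE_MS.saturating_mul(1u64 << gap.min(32))
}

/// Hypothetical, stripped-down view of a node's timer state.
#[derive(Clone, Copy, Debug)]
struct Node {
    view: u64,
    high_qc_view: u64,
    last_view_change_ms: u64,
}

impl Node {
    /// Point in time at which this node's current view times out.
    fn deadline_ms(&self) -> u64 {
        self.last_view_change_ms + view_timeout_ms(self.view, self.high_qc_view)
    }

    /// Move to the next view at the moment the timeout fires.
    fn time_out(&mut self) {
        self.last_view_change_ms = self.deadline_ms();
        self.view += 1;
    }
}

/// Fires `events` timeouts in deadline order and records how many views
/// the healthy node is ahead of the restarted node after each one.
fn simulate_lag(mut restarted: Node, mut healthy: Node, events: usize) -> Vec<u64> {
    let mut lags = Vec::with_capacity(events);
    for _ in 0..events {
        if healthy.deadline_ms() <= restarted.deadline_ms() {
            healthy.time_out();
        } else {
            restarted.time_out();
        }
        lags.push(healthy.view - restarted.view);
    }
    lags
}
```

Starting from the views in the logs (restarted node at view 6 with high_qc view 2, healthy node at view 7 with high_qc view 3, the restarted node's timer starting later), both timeouts are equal at every step, so the healthy node always fires first and the lag never drops to zero under these assumptions.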
What should happen is that once our restarted node becomes leader, it receives the NewView messages for a view higher than its own and updates its view at that point. However, it fails to process those NewView messages because of:
parent not found while determining leader for view
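One possible shape for the proposed fix is to treat a missing parent as a trigger to re-request the block instead of dropping the NewView message. This is only a sketch of the idea: `BlockHash`, `Block`, `NewViewOutcome` and `on_new_view` are hypothetical stand-ins, not the node's real types or functions, and the actual request and retry logic is left out.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a block hash.
type BlockHash = u64;

/// Hypothetical stand-in for a stored block; only the view is needed here.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    view: u64,
}

/// What the handler decides to do with an incoming NewView message.
#[derive(Debug, PartialEq)]
enum NewViewOutcome {
    /// The parent is known locally, so leader selection can go ahead.
    Proceed { parent_view: u64 },
    /// The parent is missing: ask peers for it instead of failing.
    RequestBlock(BlockHash),
}

/// Looks up the block referenced by the message's high_qc and decides
/// whether to proceed or to re-request the missed proposal.
fn on_new_view(store: &HashMap<BlockHash, Block>, high_qc_block: BlockHash) -> NewViewOutcome {
    match store.get(&high_qc_block) {
        Some(block) => NewViewOutcome::Proceed { parent_view: block.view },
        None => NewViewOutcome::RequestBlock(high_qc_block),
    }
}
```

In this sketch the restarted node, which only holds the block for view 2, would get `RequestBlock` for the view 3 block it missed, and would proceed normally once that block has been fetched and stored.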