Open aidan-kwon opened 3 years ago
I also requested the change resolving the issue in https://github.com/ConsenSys/quorum/pull/1134. Klaytn is considering the same patch also in https://github.com/klaytn/klaytn/pull/888.
Thanks @aidan-kwon we are looking into this
@aidan-kwon Thanks a lot for your message and for raising this.
This is a known issue with the current IBFT implementation and they are actually other similar scenarios not covered by this fix.
As I write we are finalizing the implementation of a fresh new version of BFT consensus that will solve those issues (c.f. specs: https://github.com/ConsenSys/quorum-ibft)
For protocol implementation, you can refer to this branch: https://github.com/ConsenSys/quorum/tree/updated_ibft
About the fix you proposed, we prefer not to merge, so we do not expose to the possible introduction of some new subtle problems that could come along with the fix.
In the future, we will recommend you migrating your current IBFT network to the new version of BFT.
Thanks a lot.
@nmvalera Thank you for the kind comment. I will look at the new version of BFT. Thank you!
System information
Geth version:
geth version
OS & Version: Windows/Linux/OSX
Branch, Commit Hash or Release:
git status
Scenario triggering the issue
There are 4 validators. They failed to achieve a consensus in round 0 because of the harsh network environment. At the end of round 0, two of them locked the proposal receiving 4 PREPARE messages. (They will be called
hash-locked
node in the following description.) Other nodes didn't lock the proposal because they received only 2 PREPARE messages in the round for some reason. (They will be callednon-hash-locked
nodes in the following description.)In round 1, a hash-locked proposer sends PREPREPARE message containing the hash-locked proposal to others. Handling PREPREPARE message, 2 hash-locked nodes set their state to
Prepared
and send COMMIT message sine the proposal was already prepared in round 0. The other nodes set their state toPreprepared
and send PREPARE message. As a result, 2 PREPARE and 2 COMMIT messages are broadcasted on the network. (Consider the network is healthy now)Expected behaviour
They should achieve a consensus because they all agreed on the proposal.
Actual behaviour
They could fail to achieve a consensus depends on the message order.
If non-hash-locked nodes receive 2 PREPARE messages before 2 COMMIT messages, they fail to achieve a consensus. If non-hash-locked nodes receive a PREPARE message after 1 or 2 COMMIT messages, they will achieve a consensus.
What is the problem?
The following diagram describes parts of Istanbul BFT code related to hash-lock. I think the red-colored part should exist, but it is missing in the current code. (I cannot find a document describing the consensus process for the hash-locked proposal)
In the normal IBFT consensus, a node changes its state step by step:
AcceptRequest
->Preprepared
->Prepared
->Committed
. If a validator receives a valid PREPREPARE message, its state is changed fromAcceptRequest
toPreparepared
. If a validator receives more than 2f PREPARE messages, its state is changed fromPreprepared
toPrepared
. If a validator receives more than 2f COMMIT messages, its state is changed fromPrepared
toCommitted
.When a hash-locked proposal is handled, however, hash-locked nodes skip
Prepared
state and sending a PREPARE message. Therefore, non-hash-locked nodes cannot receive more than 2f PREPARE messages when there are more than f hash-locked nodes exist. Fortunately, non-hash-locked nodes can change their state toPrepared
since they count COMMIT messages as well as PREPARE messages. However, the change is only triggered by PREPARE message. If they receive only COMMIT messages after 2f messages, they cannot change toPrepared
state even though 2f+1 Prepare or Commit messages are received.The problem can be solved if the red-colored logic in the following diagram is introduced.
Steps to reproduce the behaviour
To reproduce the hash-lock scenario, I used a code-patched version containing the following changes.