Consensys / quorum

A permissioned implementation of Ethereum supporting data privacy
https://www.goquorum.com/
GNU Lesser General Public License v3.0
4.68k stars 1.29k forks source link

Istanbul BFT could fail to achieve a consensus on a hash-locked proposal #1133

Open aidan-kwon opened 3 years ago

aidan-kwon commented 3 years ago

System information

Geth version: geth version

OS & Version: Windows/Linux/OSX

Branch, Commit Hash or Release: git status

Scenario triggering the issue

There are 4 validators. They failed to achieve a consensus in round 0 because of the harsh network environment. At the end of round 0, two of them locked the proposal receiving 4 PREPARE messages. (They will be called hash-locked node in the following description.) Other nodes didn't lock the proposal because they received only 2 PREPARE messages in the round for some reason. (They will be called non-hash-locked nodes in the following description.)

In round 1, a hash-locked proposer sends PREPREPARE message containing the hash-locked proposal to others. Handling PREPREPARE message, 2 hash-locked nodes set their state to Prepared and send COMMIT message sine the proposal was already prepared in round 0. The other nodes set their state to Preprepared and send PREPARE message. As a result, 2 PREPARE and 2 COMMIT messages are broadcasted on the network. (Consider the network is healthy now)

Expected behaviour

They should achieve a consensus because they all agreed on the proposal.

Actual behaviour

They could fail to achieve a consensus depends on the message order.

If non-hash-locked nodes receive 2 PREPARE messages before 2 COMMIT messages, they fail to achieve a consensus. If non-hash-locked nodes receive a PREPARE message after 1 or 2 COMMIT messages, they will achieve a consensus.

What is the problem?

The following diagram describes parts of Istanbul BFT code related to hash-lock. I think the red-colored part should exist, but it is missing in the current code. (I cannot find a document describing the consensus process for the hash-locked proposal)

In the normal IBFT consensus, a node changes its state step by step: AcceptRequest -> Preprepared -> Prepared -> Committed. If a validator receives a valid PREPREPARE message, its state is changed from AcceptRequest to Preparepared. If a validator receives more than 2f PREPARE messages, its state is changed from Preprepared to Prepared. If a validator receives more than 2f COMMIT messages, its state is changed from Prepared to Committed.

When a hash-locked proposal is handled, however, hash-locked nodes skip Prepared state and sending a PREPARE message. Therefore, non-hash-locked nodes cannot receive more than 2f PREPARE messages when there are more than f hash-locked nodes exist. Fortunately, non-hash-locked nodes can change their state to Prepared since they count COMMIT messages as well as PREPARE messages. However, the change is only triggered by PREPARE message. If they receive only COMMIT messages after 2f messages, they cannot change to Prepared state even though 2f+1 Prepare or Commit messages are received.

The problem can be solved if the red-colored logic in the following diagram is introduced.

image

Steps to reproduce the behaviour

To reproduce the hash-lock scenario, I used a code-patched version containing the following changes.

aidan-kwon commented 3 years ago

I also requested the change resolving the issue in https://github.com/ConsenSys/quorum/pull/1134. Klaytn is considering the same patch also in https://github.com/klaytn/klaytn/pull/888.

nmvalera commented 3 years ago

Thanks @aidan-kwon we are looking into this

nmvalera commented 3 years ago

@aidan-kwon Thanks a lot for your message and for raising this.

This is a known issue with the current IBFT implementation and they are actually other similar scenarios not covered by this fix.

As I write we are finalizing the implementation of a fresh new version of BFT consensus that will solve those issues (c.f. specs: https://github.com/ConsenSys/quorum-ibft)

For protocol implementation, you can refer to this branch: https://github.com/ConsenSys/quorum/tree/updated_ibft

About the fix you proposed, we prefer not to merge, so we do not expose to the possible introduction of some new subtle problems that could come along with the fix.

In the future, we will recommend you migrating your current IBFT network to the new version of BFT.

Thanks a lot.

aidan-kwon commented 3 years ago

@nmvalera Thank you for the kind comment. I will look at the new version of BFT. Thank you!