Consensys / quorum

A permissioned implementation of Ethereum supporting data privacy
https://www.goquorum.com/
GNU Lesser General Public License v3.0

Chain Stopped Producing Blocks when Validator was Removed #1728

Open yohanelly95 opened 2 months ago

yohanelly95 commented 2 months ago

System information

Geth version: v1.10.3
Quorum version: v24.4.1
OS & Version: Linux

Expected behaviour

I used a simple contract that adds and removes validators from a list. The Quorum chain reads the validator set from this contract by calling getValidators().

I added a new validator to the chain successfully, so the chain now had 6 validators (the minimum requirement is 4). When I removed the recently added validator from the contract, the chain should have continued to produce blocks as long as the minimum number of validators were running and proposing blocks.

Actual behaviour

After I removed the validator, leaving 5 active validators, the chain stopped producing blocks completely and could not recover, even though 5 validators were running.
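For reference, the logs in this report print "confirmation Formula used ceil(2N/3)" together with quorum=4. A minimal sketch of that arithmetic, assuming the formula is applied to the validator count N exactly as the log message states (the function name `quorumSize` is mine, not taken from the GoQuorum source):

```go
package main

import "fmt"

// quorumSize returns ceil(2N/3) using integer arithmetic.
// Hypothetical helper for illustration; not copied from GoQuorum.
func quorumSize(n int) int {
	return (2*n + 2) / 3
}

func main() {
	// Validator counts that appear in this report: 4 (minimum), 5 and 6.
	for _, n := range []int{4, 5, 6} {
		fmt.Printf("validators=%d quorum=%d\n", n, quorumSize(n))
	}
}
```

Under this formula both 5 and 6 validators give a quorum of 4, so 5 running validators should be enough whichever validator set a node believes is current.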

The errors and warnings from the node logs are attached below.

ERROR[08-30|07:54:37.042] QBFT: invalid PREPARE message digest     address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared      msg.code=19 msg.source=0x6F1013A68bcfc9dA74FB83408CA7Db32e7731D75 msg.round=4 msg.sequence=11484
INFO [08-30|07:54:37.036] QBFT: Correctly decoded SignedRoundChangePayload p="RoundChange {seq=11484, round=4, pr=<nil>, pv=0x0000000000000000000000000000000000000000000000000000000000000000}"
INFO [08-30|07:54:37.036] QBFT: Correctly decoded SignedRoundChangePayload p="RoundChange {seq=11484, round=4, pr=<nil>, pv=0x0000000000000000000000000000000000000000000000000000000000000000}"
INFO [08-30|07:54:37.036] QBFT: Correctly decoded SignedRoundChangePayload p="RoundChange {seq=11484, round=4, pr=<nil>, pv=0x0000000000000000000000000000000000000000000000000000000000000000}"
INFO [08-30|07:54:37.036] QBFT: Correctly decoded SignedRoundChangePayload p="RoundChange {seq=11484, round=4, pr=<nil>, pv=0x0000000000000000000000000000000000000000000000000000000000000000}"
ERROR[08-30|07:54:37.036] QBFT: Error List() Signed Payload        err="rlp: end of list"
TRACE[08-30|07:54:37.037] QBFT: confirmation Formula used ceil(2N/3) address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared
INFO [08-30|07:54:37.037] QBFT: handle PREPARE message             address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared      msg.code=19 msg.source=0xc16E2a9092e92210cE55F05023D1c45d01689037 msg.round=4 msg.sequence=11484 prepares.count=0 quorum=4
ERROR[08-30|07:54:37.037] QBFT: invalid PREPARE message digest     address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared      msg.code=19 msg.source=0xc16E2a9092e92210cE55F05023D1c45d01689037 msg.round=4 msg.sequence=11484
DEBUG[08-30|07:54:37.038] consensus message was handled by consensus engine id=9c30927cd6e5da5a conn=inbound            msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[08-30|07:54:37.039] consensus message was handled by consensus engine id=04d9d8ed628abf20 conn=inbound            msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[08-30|07:54:37.039] consensus message was handled by consensus engine id=19bb06c402bc4098 conn=inbound            msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[08-30|07:54:37.039] consensus message was handled by consensus engine id=19bb06c402bc4098 conn=inbound            msg=19 quorumConsensusProtocolName=istanbul err=nil
TRACE[08-30|07:54:37.040] QBFT: confirmation Formula used ceil(2N/3) address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared
INFO [08-30|07:54:37.040] QBFT: handle PREPARE message             address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared      msg.code=19 msg.source=0xFDd781c436c2FB360F7458AF64ACEAF5Ee13e6e5 msg.round=4 msg.sequence=11484 prepares.count=0 quorum=4
TRACE[08-30|07:54:37.040] QBFT: confirmation Formula used ceil(2N/3) address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared
TRACE[08-30|07:54:37.040] QBFT: confirmation Formula used ceil(2N/3) address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared
DEBUG[08-30|07:54:37.040] QBFT: accepted PREPARE messages          address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared      msg.code=19 msg.source=0xFDd781c436c2FB360F7458AF64ACEAF5Ee13e6e5 msg.round=4 msg.sequence=11484 prepares.count=1 quorum=4
DEBUG[08-30|07:54:37.040] consensus message was handled by consensus engine id=9c30927cd6e5da5a conn=inbound            msg=19 quorumConsensusProtocolName=istanbul err=nil
TRACE[08-30|07:54:37.041] QBFT: confirmation Formula used ceil(2N/3) address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared
INFO [08-30|07:54:37.041] QBFT: handle PREPARE message             address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared      msg.code=19 msg.source=0xDa06ecA6f65f05b573820BF781819dC4D41f6a3d msg.round=4 msg.sequence=11484 prepares.count=1 quorum=4
ERROR[08-30|07:54:37.041] QBFT: invalid PREPARE message digest     address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared      msg.code=19 msg.source=0xDa06ecA6f65f05b573820BF781819dC4D41f6a3d msg.round=4 msg.sequence=11484
DEBUG[08-30|07:54:37.041] consensus message was handled by consensus engine id=19bb06c402bc4098 conn=inbound            msg=19 quorumConsensusProtocolName=istanbul err=nil
TRACE[08-30|07:54:37.041] QBFT: confirmation Formula used ceil(2N/3) address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared
INFO [08-30|07:54:37.041] QBFT: handle PREPARE message             address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared      msg.code=19 msg.source=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B msg.round=4 msg.sequence=11484 prepares.count=1 quorum=4
TRACE[08-30|07:54:37.041] QBFT: confirmation Formula used ceil(2N/3) address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared
TRACE[08-30|07:54:37.041] QBFT: confirmation Formula used ceil(2N/3) address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared
DEBUG[08-30|07:54:37.041] QBFT: accepted PREPARE messages          address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared      msg.code=19 msg.source=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B msg.round=4 msg.sequence=11484 prepares.count=2 quorum=4
TRACE[08-30|07:54:37.042] QBFT: confirmation Formula used ceil(2N/3) address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared
INFO [08-30|07:54:37.042] QBFT: handle PREPARE message             address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared      msg.code=19 msg.source=0x6F1013A68bcfc9dA74FB83408CA7Db32e7731D75 msg.round=4 msg.sequence=11484 prepares.count=2 quorum=4
ERROR[08-30|07:54:37.042] QBFT: invalid PREPARE message digest     address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=4 current.sequence=11484 state=Preprepared      msg.code=19 msg.source=0x6F1013A68bcfc9dA74FB83408CA7Db32e7731D75 msg.round=4 msg.sequence=11484

and finally

WARN [08-30|08:02:37.046] QBFT: ignore PRE-PREPARE message from non proposer address=0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B current.round=6 current.sequence=11484 state="Accept request" msg.code=18 msg.source=0xDa06ecA6f65f05b573820BF781819dC4D41f6a3d msg.round=6 msg.sequence=11484 proposal.number=11484 proposal.hash=0x836e3924a7dc9c97af4c523055ff8bb6d52331fff0bc702232f92dce15ce00b2 proposer=0x8522537600244d9d45C39947191a1Eec1fB19A70

The removed validator was 0x8522537600244d9d45C39947191a1Eec1fB19A70, which was the one due to propose the current block (or the one that had proposed the last block). I am not sure how to handle this case when relying on a smart contract for validator selection.
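One way to read the WARN line above: the local node still expects the removed validator (proposer=0x8522...) to propose, while a peer working from the updated set has already sent a PRE-PREPARE. The sketch below is a simplified round-robin model of that disagreement, not the actual GoQuorum selection code. The labels A to F, the ordering, and the function `nextProposer` are all assumptions for illustration, and this is a hypothesis consistent with the log rather than a confirmed diagnosis.

```go
package main

import "fmt"

// nextProposer is a simplified round-robin model: start one slot after the
// previous proposer and advance by the round number.
// Assumes validators is non-empty. If lastProposer is not in the list,
// the offset falls back to 0.
func nextProposer(validators []string, lastProposer string, round int) string {
	offset := 0
	for i, v := range validators {
		if v == lastProposer {
			offset = i
			break
		}
	}
	return validators[(offset+round+1)%len(validators)]
}

func main() {
	// "F" stands in for the removed validator (hypothetical label).
	staleView := []string{"A", "B", "C", "D", "E", "F"} // node that still counts F
	updatedView := []string{"A", "B", "C", "D", "E"}    // node that already dropped F

	for round := 0; round < 3; round++ {
		fmt.Printf("round=%d stale expects %s, updated expects %s\n",
			round,
			nextProposer(staleView, "E", round),
			nextProposer(updatedView, "E", round))
	}
}
```

In this model the stale view waits for F at round 0 while the updated view expects A, so each side would reject the other's PRE-PREPARE as coming from a non-proposer.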

Steps to reproduce the behaviour

Use a smart contract to add/remove validators. Add a new validator and, after it has produced some blocks, remove it.

yohanelly95 commented 2 months ago

@rodion-lim-partior I also tried first stopping the new validator node (making sure it was not the block proposer for that round) and then removing it from the smart contract. This did not stop the chain and worked as expected. But in a real-world scenario, we should be able to simply remove a validator from the smart contract, and as long as we have >=4 validators the chain should continue to produce blocks.

What am I missing here?

yohanelly95 commented 2 months ago

On a new Quorum chain, I added a new validator to the smart contract and started its node, called Node-5. I kept the validator in the validator set but stopped the node. When I restarted the node, it never synced to the latest block and showed the following errors in its logs:

ERROR[09-02|09:17:43.107] BFT: header author is not a validator    snap.number=25600 snap.hash=0xe9e0b055bf6ed520882297c9ef4ce63b9cc71d204d9db304c9c00f9f436f6d98 snap.epoch=30000 snap.validators="[0x0D638cdc26D8AE3325bF4EBa49992e28b0f4Af9B 0x6F1013A68bcfc9dA74FB83408CA7Db32e7731D75 0xc16E2a9092e92210cE55F05023D1c45d01689037 0xDa06ecA6f65f05b573820BF781819dC4D41f6a3d 0xFDd781c436c2FB360F7458AF64ACEAF5Ee13e6e5]" snap.votes=[] header.number=25969 header.hash=0x3da41b809a4bee8dd8ff7db9a4c0d5956762362bc19206dbecf4fd2c6496df70 header.author=0x8522537600244d9d45C39947191a1Eec1fB19A70 Validators="&{validators:[0xc0036d5170 0xc0036d5188 0xc0036d51a0 0xc0036d51b8 0xc0036d51d0] policy:0xc005713c50 proposer:0xc0036d5170 validatorMu:{w:{state:0 sema:0} writerSem:0 readerSem:0 readerCount:{_:{} v:0} readerWait:{_:{} v:0}} selector:0xe07f00}" Author=0x8522537600244d9d45C39947191a1Eec1fB19A70
ERROR[09-02|09:17:43.107] Failed to prepare header for mining      err=unauthorized
DEBUG[09-02|09:17:46.248] Recalculated downloader QoS values       rtt=20s confidence=0.748 ttl=1m0s
DEBUG[09-02|09:17:47.002] consensus message was handled by consensus engine id=2ce0a77723e47249 conn=staticdial         msg=18 quorumConsensusProtocolName=istanbul err=nil
ERROR[09-02|09:17:47.002] QBFT: invalid message signature          address=0x8522537600244d9d45C39947191a1Eec1fB19A70 current.round=0 current.sequence=26010 state="Accept request" msg.code=18 msg.source=0x0000000000000000000000000000000000000000 msg.round=0 msg.sequence=26094 err="unauthorized address"
DEBUG[09-02|09:17:47.005] consensus message was handled by consensus engine id=2ce0a77723e47249 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
ERROR[09-02|09:17:47.005] QBFT: invalid message signature          address=0x8522537600244d9d45C39947191a1Eec1fB19A70 current.round=0 current.sequence=26010 state="Accept request" msg.code=19 msg.source=0x0000000000000000000000000000000000000000 msg.round=0 msg.sequence=26094 err="unauthorized address"
DEBUG[09-02|09:17:47.007] consensus message was handled by consensus engine id=2ce0a77723e47249 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
ERROR[09-02|09:17:47.007] QBFT: invalid message signature          address=0x8522537600244d9d45C39947191a1Eec1fB19A70 current.round=0 current.sequence=26010 state="Accept request" msg.code=19 msg.source=0x0000000000000000000000000000000000000000 msg.round=0 msg.sequence=26094 err="unauthorized address"
DEBUG[09-02|09:17:47.007] consensus message was handled by consensus engine id=04d9d8ed628abf20 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
ERROR[09-02|09:17:47.007] QBFT: invalid message signature          address=0x8522537600244d9d45C39947191a1Eec1fB19A70 current.round=0 current.sequence=26010 state="Accept request" msg.code=19 msg.source=0x0000000000000000000000000000000000000000 msg.round=0 msg.sequence=26094 err="unauthorized address"
DEBUG[09-02|09:17:47.007] consensus message was handled by consensus engine id=04d9d8ed628abf20 conn=staticdial         msg=18 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.007] consensus message was handled by consensus engine id=9c30927cd6e5da5a conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.007] consensus message was handled by consensus engine id=9c30927cd6e5da5a conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.007] consensus message was handled by consensus engine id=04d9d8ed628abf20 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.009] consensus message was handled by consensus engine id=2ce0a77723e47249 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.009] consensus message was handled by consensus engine id=04d9d8ed628abf20 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.009] consensus message was handled by consensus engine id=9c30927cd6e5da5a conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.009] consensus message was handled by consensus engine id=9c30927cd6e5da5a conn=staticdial         msg=18 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.010] consensus message was handled by consensus engine id=149229e9db0fe5d1 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
ERROR[09-02|09:17:47.010] QBFT: invalid message signature          address=0x8522537600244d9d45C39947191a1Eec1fB19A70 current.round=0 current.sequence=26010 state="Accept request" msg.code=19 msg.source=0x0000000000000000000000000000000000000000 msg.round=0 msg.sequence=26094 err="unauthorized address"
DEBUG[09-02|09:17:47.010] consensus message was handled by consensus engine id=149229e9db0fe5d1 conn=staticdial         msg=18 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.011] consensus message was handled by consensus engine id=149229e9db0fe5d1 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.011] consensus message was handled by consensus engine id=149229e9db0fe5d1 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.012] consensus message was handled by consensus engine id=9c30927cd6e5da5a conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
ERROR[09-02|09:17:47.012] QBFT: invalid message signature          address=0x8522537600244d9d45C39947191a1Eec1fB19A70 current.round=0 current.sequence=26010 state="Accept request" msg.code=20 msg.source=0x0000000000000000000000000000000000000000 msg.round=0 msg.sequence=26094 err="unauthorized address"
DEBUG[09-02|09:17:47.013] consensus message was handled by consensus engine id=2ce0a77723e47249 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.013] consensus message was handled by consensus engine id=9c30927cd6e5da5a conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.015] consensus message was handled by consensus engine id=2ce0a77723e47249 conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
ERROR[09-02|09:17:47.015] QBFT: invalid message signature          address=0x8522537600244d9d45C39947191a1Eec1fB19A70 current.round=0 current.sequence=26010 state="Accept request" msg.code=20 msg.source=0x0000000000000000000000000000000000000000 msg.round=0 msg.sequence=26094 err="unauthorized address"
DEBUG[09-02|09:17:47.016] consensus message was handled by consensus engine id=2ce0a77723e47249 conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.016] consensus message was handled by consensus engine id=04d9d8ed628abf20 conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
ERROR[09-02|09:17:47.017] QBFT: invalid message signature          address=0x8522537600244d9d45C39947191a1Eec1fB19A70 current.round=0 current.sequence=26010 state="Accept request" msg.code=20 msg.source=0x0000000000000000000000000000000000000000 msg.round=0 msg.sequence=26094 err="unauthorized address"
DEBUG[09-02|09:17:47.017] consensus message was handled by consensus engine id=149229e9db0fe5d1 conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.017] consensus message was handled by consensus engine id=149229e9db0fe5d1 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
ERROR[09-02|09:17:47.018] QBFT: invalid message signature          address=0x8522537600244d9d45C39947191a1Eec1fB19A70 current.round=0 current.sequence=26010 state="Accept request" msg.code=20 msg.source=0x0000000000000000000000000000000000000000 msg.round=0 msg.sequence=26094 err="unauthorized address"
DEBUG[09-02|09:17:47.018] consensus message was handled by consensus engine id=04d9d8ed628abf20 conn=staticdial         msg=19 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.020] consensus message was handled by consensus engine id=149229e9db0fe5d1 conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.021] consensus message was handled by consensus engine id=04d9d8ed628abf20 conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.021] consensus message was handled by consensus engine id=149229e9db0fe5d1 conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.022] consensus message was handled by consensus engine id=04d9d8ed628abf20 conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.023] consensus message was handled by consensus engine id=9c30927cd6e5da5a conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.024] consensus message was handled by consensus engine id=9c30927cd6e5da5a conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.024] consensus message was handled by consensus engine id=149229e9db0fe5d1 conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.025] consensus message was handled by consensus engine id=2ce0a77723e47249 conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
DEBUG[09-02|09:17:47.026] consensus message was handled by consensus engine id=9c30927cd6e5da5a conn=staticdial         msg=20 quorumConsensusProtocolName=istanbul err=nil
ERROR[09-02|09:17:47.026] QBFT: invalid message signature          address=0x8522537600244d9d45C39947191a1Eec1fB19A70 current.round=0 current.sequence=26010 state="Accept request" msg.code=20 msg.source=0x0000000000000000000000000000000000000000 msg.round=0 msg.sequence=26094 err="unauthorized address"

This happens even though Node-5 is part of the current validator set. Once the node is stopped and restarted, it is never able to join the network again.

To resolve this:

I had to delete the chaindata and lightchaindata and resync the node from genesis.

yohanelly95 commented 2 months ago

Experiment 3:

Added a validator to the validator set (smart contract), then let it propose a few blocks. Removed the validator from the smart contract when I was sure next.isProposer=false. The chain continued producing blocks. The removed validator, Node-5, no longer appeared in the current validator set and was not eligible for block rewards, but it still kept up with the latest block, i.e. it behaved as a non-validator node.

My question is: how do we handle adding or removing a validator through a smart contract regardless of whether it is the current or next block proposer? A validator could, for example, join the network by staking X amount and leave by unstaking X amount. But if it unstakes while it is the block proposer, the entire chain halts and produces no more blocks. @rodion-lim-partior any insights here?
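Until there is a proper answer, the only removal path that worked in my experiments can be written down as an off-chain pre-check. This is a rough sketch of a workaround, not a fix. `checkRemoval`, its parameters and the error values are hypothetical names, and how the two proposer flags are obtained (for example from the next.isProposer value mentioned in Experiment 3) is left out. The check is also racy, since the proposer can change between the check and the block that includes the removal transaction.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error values for this sketch.
var (
	errTooFewValidators = errors.New("removal would drop below the minimum validator count")
	errIsProposer       = errors.New("candidate is the current or next proposer")
)

// checkRemoval reports whether removing one validator looks safe according
// to the behaviour observed in this issue. It returns nil when removal may go ahead.
func checkRemoval(validatorCount, minValidators int, isCurrentProposer, isNextProposer bool) error {
	if validatorCount-1 < minValidators {
		return errTooFewValidators
	}
	if isCurrentProposer || isNextProposer {
		return errIsProposer
	}
	return nil
}

func main() {
	// 6 validators, minimum 4, candidate is not proposing: removal allowed.
	fmt.Println(checkRemoval(6, 4, false, false))
	// Candidate is the next proposer: the case that halted the chain here.
	fmt.Println(checkRemoval(6, 4, false, true))
	// Already at the minimum: removal refused.
	fmt.Println(checkRemoval(4, 4, false, false))
}
```

A staking contract cannot rely on every caller running such a check, so the underlying question still stands.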