cita-cloud / consensus_raft

The raft consensus component for CITA Cloud.
Apache License 2.0

Removing multiple validators at once stops the chain #97

Closed rink1969 closed 3 months ago

rink1969 commented 3 months ago

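When a single transaction removes multiple consensus nodes at once, raft splits the operation into several configchange operations. A removed node stops its raft thread immediately after receiving the reconfigure message, so these configchange operations may not have finished going through consensus yet, and consensus then cannot make progress.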
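The stall can be illustrated with quorum arithmetic. This is only a sketch, assuming a plain majority quorum over the current voter set; `quorum` and `can_commit` are hypothetical helper names, not functions from consensus_raft or raft-rs, and the three-voter scenario is an illustrative assumption rather than the exact cluster from this issue.

```rust
/// Majority quorum for a raft voter set of the given size.
/// Assumption: simple majority, no joint-consensus bookkeeping.
pub fn quorum(voters: usize) -> usize {
    voters / 2 + 1
}

/// Returns true if an entry (for example a configchange) proposed under a
/// voter set of size `voters` can still commit when only `alive` of those
/// voters are still running their raft thread.
pub fn can_commit(voters: usize, alive: usize) -> bool {
    alive >= quorum(voters)
}
```

For example, with three voters and two of them removed in one transaction, `can_commit(3, 1)` is false if both removed nodes stop raft right away: the remaining node cannot commit the first configchange on its own. If the removed nodes keep running a little longer, `can_commit(3, 3)` holds and the configchanges can be committed one after another.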

rink1969 commented 3 months ago

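The fix: when a node finds that it is no longer a consensus node, it does not stop the raft thread right away. Instead, it delays the abort by one block.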
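The delayed abort can be sketched as a small state machine. This is a minimal illustration under assumptions, not the actual consensus_raft code: `DeferredAbort`, `Action`, and `on_reconfigure` are hypothetical names, and validators are represented as plain `u64` ids.

```rust
/// What the caller should do with the raft thread after a reconfigure.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    KeepRunning,
    Abort,
}

/// Hypothetical helper: remembers whether the previous reconfigure already
/// found this node missing from the validator list.
#[derive(Debug, Default)]
pub struct DeferredAbort {
    pending: bool,
}

impl DeferredAbort {
    /// Called once per reconfigure message (one per block).
    pub fn on_reconfigure(&mut self, self_id: u64, validators: &[u64]) -> Action {
        if validators.contains(&self_id) {
            // Still (or again) a validator: clear any pending abort.
            self.pending = false;
            return Action::KeepRunning;
        }
        if self.pending {
            // Second consecutive block without us in the list: abort now.
            Action::Abort
        } else {
            // First block without us: keep raft alive for one more block so
            // outstanding configchange entries can still be committed.
            self.pending = true;
            Action::KeepRunning
        }
    }
}
```

With this sketch, the first reconfigure that omits the node returns `Action::KeepRunning` and the next one returns `Action::Abort`, which corresponds to an abort delayed by one block.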

rink1969 commented 3 months ago

Behavior after the fix (log entries, newest first):

Timestamp | Message
-- | --
2024-03-18 04:11:32.904 | Mar 17 20:11:32.904 INFO abort raft, module: consensus:263
2024-03-18 04:11:32.904 | Mar 17 20:11:32.904 INFO I'm not in the validators list, module: consensus:255
2024-03-18 04:11:32.904 | Mar 17 20:11:32.904 INFO get reconfigure from controller, height is 168, module: consensus:223
2024-03-18 04:11:31.408 | Mar 17 20:11:31.408 INFO persisted snapshot index: 177, tag: storage, module: consensus::storage:274
2024-03-18 04:11:31.408 | Mar 17 20:11:31.407 INFO apply config change `ConfChangeV2 { transition: Auto, changes: [], context: [] }`; now config state is: ConfState { voters: [2615194328577306464], learners: [], voters_outgoing: [], learners_next: [], auto_leave: false }, module: consensus::peer:613
2024-03-18 04:11:31.408 | Mar 17 20:11:31.407 INFO switched to configuration, config: Configuration { voters: Configuration { incoming: Configuration { voters: {2615194328577306464} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }, raft_id: 6386566500878302331, module: raft::raft:2660
2024-03-18 04:11:29.903 | Mar 17 20:11:29.903 INFO I'm not in the validators list, module: consensus:255
2024-03-18 04:11:29.903 | Mar 17 20:11:29.903 INFO get reconfigure from controller, height is 167, module: consensus:223

Before the fix, raft was aborted at 04:11:29, so the later `switched to configuration` and `apply config change` entries never appeared.