baidu / braft

An industrial-grade C++ implementation of RAFT consensus algorithm based on brpc, widely used inside Baidu to build highly-available distributed systems.
Apache License 2.0

raft arbiter support #321

Open dragonylffly opened 3 years ago

dragonylffly commented 3 years ago

This commit implements a Raft arbiter. An arbiter does not hold a copy of the data and only participates in elections. The concept is similar to the MongoDB arbiter (https://docs.mongodb.com/manual/core/replica-set-arbiter). The motivation is cost-sensitive deployments where three data copies are not affordable but an odd number of votes is still needed to break ties.

dragonylffly commented 3 years ago

@PFZheng

PFZheng commented 3 years ago

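This mechanism looks like the same thing as a witness, and the implementation has some problems:

  1. The way the patch distinguishes the arbiter is too crude.
  2. The arbiter cannot simply block every leader election. This node may hold the only copy of some log entries (in a 3-replica group, the arbiter's data can be newer than the other follower's). The correct approach is to let it become leader when it has to, and then run a round of transfer leader.
  3. Snapshots need special handling: the log that is retained must connect with the logs of all valid nodes (otherwise, after becoming leader, it cannot keep the log continuous, which causes data loss).

MongoDB does not implement standard Raft, so some of its design cannot be copied as-is.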
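
To make points 2 and 3 concrete, here is a minimal sketch of the two decisions a witness would have to make once it has become leader: which peer to hand leadership to, and how much of its own log it may safely discard. The `Peer` struct and both function names are hypothetical and are not part of braft or of this patch. The sketch assumes 1-based log indexes and that `match_index` is the last entry known to be replicated on a peer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical view of a peer as seen by a witness that was elected leader.
struct Peer {
    std::string id;
    bool is_witness;      // witnesses hold log entries but no state machine data
    int64_t match_index;  // last log index known to be replicated on this peer
};

// Point 2: leadership should go to a data-bearing peer that has caught up
// with the witness's log. Returns the index into `peers`, or -1 if no peer
// is ready yet (the witness must keep replicating before it can step down).
int pick_transfer_target(const std::vector<Peer>& peers,
                         int64_t leader_last_index) {
    assert(leader_last_index >= 0);
    for (size_t i = 0; i < peers.size(); ++i) {
        if (!peers[i].is_witness && peers[i].match_index >= leader_last_index) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Point 3: the witness may only discard entries that every data-bearing peer
// already has, so that its remaining log still connects to each peer's log.
// Returns the highest index that is safe to discard, or 0 if nothing is.
int64_t max_safe_truncate_index(const std::vector<Peer>& peers) {
    bool found = false;
    int64_t safe = 0;
    for (const Peer& p : peers) {
        if (p.is_witness) {
            continue;
        }
        safe = found ? std::min(safe, p.match_index) : p.match_index;
        found = true;
    }
    return found ? safe : 0;
}
```

With two data peers at `match_index` 10 and 7 and a witness whose last index is 10, the first peer is an eligible transfer target, and the witness must keep every entry after index 7 so the lagging peer can still be caught up.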

dragonylffly commented 3 years ago

The current implementation seems to be the simplest way to support two data copies with minimal modification to braft, and it covers most situations. The one uncovered case is when the leader goes down and the candidate's log is shorter than the arbiter's. In that case no leader is elected. However, this situation (the leader going down while the candidate is missing data) should rarely happen. When it does, the cluster manager has to choose between consistency and availability. If consistency is preferred, the cluster must wait for the original leader to restart. If availability is preferred, the manager can let the candidate be elected by removing the arbiter from the cluster, which a script can do automatically. Allowing the arbiter to be elected leader (temporarily) would require careful and tricky modifications to braft, which would be error-prone. In addition, since the arbiter has no snapshot (otherwise it would violate the two-data-copies premise), the only situation this would improve is the one where the data missing from the candidate is covered by the arbiter's log.
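
The stalled-election case described above follows directly from Raft's standard vote rule, which can be sketched as below. This is a simplified illustration and not braft's code: `LogPos` and both function names are hypothetical, and the sketch ignores term bumps and already-cast votes, looking only at the log comparison.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Position of the last entry in a node's log (hypothetical type).
struct LogPos {
    int64_t term;
    int64_t index;
};

// Standard Raft rule: a voter grants its vote only if the candidate's log is
// at least as up-to-date as its own (compare last term, then last index).
bool candidate_log_is_up_to_date(const LogPos& candidate, const LogPos& voter) {
    if (candidate.term != voter.term) {
        return candidate.term > voter.term;
    }
    return candidate.index >= voter.index;
}

// Counts the votes the candidate can collect from the reachable voters
// (plus its own) and checks them against a majority of the full cluster.
bool can_win_election(const LogPos& candidate,
                      const std::vector<LogPos>& reachable_voters,
                      int cluster_size) {
    assert(cluster_size > 0);
    int votes = 1;  // the candidate votes for itself
    for (const LogPos& voter : reachable_voters) {
        if (candidate_log_is_up_to_date(candidate, voter)) {
            ++votes;
        }
    }
    return votes > cluster_size / 2;
}
```

In a three-node group with the leader down, a follower whose log ends at (term 2, index 8) cannot win against an arbiter whose log ends at (term 2, index 10): the arbiter refuses its vote, and one vote out of three is not a majority. If both logs end at the same position, the follower gets the arbiter's vote and is elected.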

dragonylffly commented 3 years ago

@PFZheng

fyyfyx commented 3 years ago

@PFZheng Are there plans to merge the witness feature?

GOGOYAO commented 2 years ago

Is there any progress on this?