etcd-io / raft

Raft library for maintaining a replicated state machine
Apache License 2.0

Discussion: follower panics due to a regression in the commit index. #197

Open shaj13 opened 4 months ago

shaj13 commented 4 months ago

It seems there is some confusion about why PR #25 was introduced. To clarify, I opened it for discussion rather than as a final solution.

The issue arises when a follower loses its state (for example, after a disk failure) and restarts with a fresh log at index 0. Meanwhile, the leader still tracks the follower at the last index it recorded for it, say index 10, and keeps sending heartbeats whose commit index is based on that value. The follower is then asked to commit past the end of its own log, which causes it to panic at https://github.com/etcd-io/raft/blob/main/log.go#L320. Currently, the only way out is manual intervention: removing the follower and re-adding it to the cluster as a new member.
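To make the failure concrete, here is a minimal, self-contained Go model of it. It assumes, as in etcd raft, that the leader caps the commit index it sends in a heartbeat at the follower's recorded Match index, and that the follower's `commitTo` panics when asked to commit past its last index. The types and helpers (`raftLog`, `heartbeatCommit`, `deliver`) are simplified, hypothetical stand-ins, not the library's actual internals.

```go
package main

import "fmt"

// raftLog is a hypothetical, heavily simplified stand-in for the follower's
// log: it only tracks the last index and the commit index.
type raftLog struct {
	lastIndex uint64
	committed uint64
}

// commitTo models the invariant behind the panic: a node must never commit
// past the end of its own log, so it panics instead.
func (l *raftLog) commitTo(tocommit uint64) {
	if l.committed < tocommit {
		if l.lastIndex < tocommit {
			panic(fmt.Sprintf("tocommit(%d) is out of range [lastIndex(%d)]", tocommit, l.lastIndex))
		}
		l.committed = tocommit
	}
}

// heartbeatCommit models the commit index the leader puts in a heartbeat:
// its own commit index, capped at the follower's recorded Match index.
func heartbeatCommit(leaderCommitted, followerMatch uint64) uint64 {
	if followerMatch < leaderCommitted {
		return followerMatch
	}
	return leaderCommitted
}

// deliver applies a heartbeat's commit index and reports whether it panicked.
func deliver(l *raftLog, commit uint64) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	l.commitTo(commit)
	return false
}

func main() {
	// Before the disk failure, the leader recorded Match=10 for this follower.
	commit := heartbeatCommit(12, 10)
	// The follower restarts with an empty log: lastIndex=0, committed=0.
	follower := &raftLog{}
	fmt.Println("commit sent:", commit, "panicked:", deliver(follower, commit))
}
```

Running it prints `commit sent: 10 panicked: true`: the leader's stale view of the follower is enough to trigger the panic, even though the follower did nothing wrong after restarting.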

A potential solution is for the follower to reject a heartbeat whose commit index is higher than its last index, and to return a hint index instead. The leader can then decrease the follower's progress, and the system can resume as expected. This approach is already implemented for MsgApp: https://github.com/etcd-io/raft/blob/main/raft.go#L1382.
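A rough sketch of the proposed heartbeat rejection follows. It assumes that heartbeat responses would carry reject and hint fields analogous to the `Reject` and `RejectHint` fields MsgApp responses use. The `message`, `follower`, and `progress` types and both handler names are hypothetical and simplified; this is not etcd raft's API.

```go
package main

import "fmt"

// message is a hypothetical, trimmed-down heartbeat / heartbeat response.
type message struct {
	Commit     uint64 // commit index sent by the leader (heartbeat only)
	Reject     bool   // follower could not honour Commit
	RejectHint uint64 // follower's last index, so the leader can back off
}

// follower is a hypothetical view of the follower's log state.
type follower struct {
	lastIndex, committed uint64
}

// handleHeartbeat rejects a commit index beyond the local log instead of
// panicking, and returns the follower's last index as a hint.
func (f *follower) handleHeartbeat(m message) message {
	if m.Commit > f.lastIndex {
		return message{Reject: true, RejectHint: f.lastIndex}
	}
	if m.Commit > f.committed {
		f.committed = m.Commit
	}
	return message{}
}

// progress is a hypothetical view the leader keeps of one follower.
type progress struct {
	Match, Next uint64
}

// handleHeartbeatResp lowers the follower's progress on rejection, so the
// leader resumes replication from the hinted index.
func (pr *progress) handleHeartbeatResp(m message) {
	if !m.Reject {
		return
	}
	if m.RejectHint < pr.Match {
		pr.Match = m.RejectHint
	}
	if m.RejectHint+1 < pr.Next {
		pr.Next = m.RejectHint + 1
	}
}

func main() {
	pr := &progress{Match: 10, Next: 11} // leader's stale view
	f := &follower{}                      // restarted with an empty log
	resp := f.handleHeartbeat(message{Commit: pr.Match})
	pr.handleHeartbeatResp(resp)
	fmt.Println(resp.Reject, pr.Match, pr.Next)
}
```

Running it prints `true 0 1`. In this sketch the leader simply lowers `Match` and `Next` from the hint; the real leader would also need to move the follower back into a probing state and pick between appends and a snapshot.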

The event log suggested in https://github.com/etcd-io/raft/pull/25#issuecomment-1449055381 is useful for other cases in which this panic can occur during other operations. However, it does not resolve this issue, because there is still no way to reconcile the follower's progress on the leader node. The better solution is a rejection hint, similar to the one used for MsgApp.

The same approach as in #25 can be taken, but instead of recovering from the panic, the code at https://github.com/etcd-io/raft/blob/main/log.go#L320 would return an error, which is turned into a panic only where needed. When handling a heartbeat, the follower captures the error and checks its reason; based on that, it responds with a rejection. Any error that is not handled still panics. This approach is more idiomatic because it reuses Go's error handling, and in the future the event log could be adopted for handling other operations that panic.

This would fix the regression and allow etcd raft to run in memory without durable state. That is useful for applications that only need an in-memory replicated Raft log, such as Docker Swarm Secrets, where members can restart and follow the leader again to re-replicate encryption or other security data that is never written to disk.

cc: @ahrtr @pav-kv