eBay / NuRaft

C++ implementation of Raft core logic as a replication library
Apache License 2.0
1.02k stars 240 forks source link

Please don't ignore election safety invariant #531

Closed andrewjstone closed 1 month ago

andrewjstone commented 2 months ago

Hi, I was reading through the code to see what optimizations were provided around determining the actual matching log index of a follower that may have more uncommitted log entries than a leader in a manner similar to etcd's implementation when I stumbled across this code.

This code ignores the fact that there are two leaders for the same term which violates the critical election safety property shown in figure 3 of the raft paper. If this occurs it means that 2 different leaders received a quorum of votes and that those two leaders could potentially commit different log entries to the same log index at different followers. While I have no reason to suspect that this bug actually exists in this code base, it seems to me that allowing the protocol to silently continue and possibly corrupt data indefinitely is worse than crashing and making the system unavailable. Personally as a user/operator I'd much rather have an opportunity to see and fix a serious bug while the system was down, then work with corrupted data and potentially perform incorrect operations unwittingly.

I'd strongly suggest panicing here, even in production. Ideally and most likely, this bug doesn't actually exist, and it won't ever be hit in testing or production. But if it is ever seen, someone will at least know it happened.

Thank you, Andrew

greensky00 commented 2 months ago

Thanks for bringing this issue. You're right, this can be a critical issue if it comes from a member in the same group, and it would be important for the application to have the option to be notified.

The reason why we (eBay) has ignored this is because of our deployment. We operate tens of thousands of Raft groups concurrently, and under certain conditions, a malfunctioning server might send heartbeats to members of a different Raft group. In such a case, what that Raft group should do is to ignore the message rather than to panic, as there is essentially no real issue.

Therefore, rather than mandating a panic, I will introduce a callback mechanism to allow the application to determine the appropriate action in these situations.

andrewjstone commented 2 months ago

Thanks for bringing this issue. You're right, this can be a critical issue if it comes from a member in the same group, and it would be important for the application to have the option to be notified.

The reason why we (eBay) has ignored this is because of our deployment. We operate tens of thousands of Raft groups concurrently, and under certain conditions, a malfunctioning server might send heartbeats to members of a different Raft group. In such a case, what that Raft group should do is to ignore the message rather than to panic, as there is essentially no real issue.

Gotcha. That makes sense. The way I've handled this in the past is to add a separate "cluster id" that gets sent with each message so that these can get discarded. That does add a touch of overhead though, and your solution works.

Therefore, rather than mandating a panic, I will introduce a callback mechanism to allow the application to determine the appropriate action in these situations.

Thank you! I think this will be sufficient. I'm wondering if instead of a callback this could maybe be a configuration instead though, since it should be a one time setting only.

greensky00 commented 1 month ago

I prefer callback, as it will be more flexible and extensible than config. Callback can do things differently according to the contents of the request.

The fix has been merged: https://github.com/eBay/NuRaft/pull/537

andrewjstone commented 1 month ago

Thank you for adding the callback.