eBay / NuRaft

C++ implementation of Raft core logic as a replication library
Apache License 2.0
993 stars 236 forks source link

Out of order call to state_machine::create_snapshot() when manually triggering a snapshot #478

Open adotsch opened 9 months ago

adotsch commented 9 months ago

Hi,

I only create manual snapshots in my application calling raft_server::create_snapshot(). I have seen state_machine::create_snapshot() called with a snapshot index that is smaller(!) than the last committed index, i.e. when the state machine was already ahead of the point it should have created a snapshot for. I guess this is a bug. I have my workarounds but it would be nice to have this fixed so that I can clean up my code.

Regards, Andras

greensky00 commented 9 months ago

Hi @adotsch, Thanks for bringing this issue.

Creating snapshot is done by a separate background commit thread according to the current "state machine's commit index", so the snapshot index can be lagging behind the last commit index of log store. State machine is always catching up the Raft's commit index.

Nonetheless, the current behavior is obviously a bug. If there is more recent snapshot that is manually created, auto creation should skip any log index smaller than that. I will provide the fix for it soon.

adotsch commented 9 months ago

Correct me if I am wrong, but I think you just added an extra check to ignore manual create_snapshot() calls when inconvenient, i.e. create_snapshot() will work when we are lucky, but won't be reliable. Could you rather trigger an event in the commit thread to create a snapshot correctly and reliably or just use a new/existing mutex to avoid the race condition?

greensky00 commented 8 months ago

@adotsch create_snapshot() is a best effort API and does not guarantee the success of creation, as multi-thread racing is not the only cause of failure. Also we can't use a mutex as snapshot creation is asynchronous task. Basically applications need to do retry upon the result of the API call.

If what you want is to schedule the snapshot creation on next earliest available log index, then I'd rather add a new API, for instance schedule_snapshot_creation(). But still, being a universal library, below case may happen:

Please let me know if it is ok for your case.

adotsch commented 8 months ago

Thanks for your response! This is exactly what I do, I retry creating the snapshot if it failed. It just doesn't feel right. Yes, I think there should be a schedule_snapshot_creation() API that creates a snapshot at the earliest possible index. I don't think create_snapshot() in it's current form is a very good API, it requires a lot of logic at the user's end to get it working due to it's (best effort) nature.