Closed kishorekrd closed 2 years ago
Hi @kishorekrd
Unfortunately, our log store implementation is not open-sourced. We use Jungle to implement the log store, and you can refer to the example below: https://github.com/eBay/Jungle/blob/master/examples/example_log_store_mode.cc
Hi @greensky00, does that mean the current implementation of NuRaft on GitHub does not include log persistence (writing to disk)? Raft requires log persistence to survive a power failure.
I noticed there is a logger class that can call "write" to write to disk.
@Steamgjk
You can implement your own log_store that calls fsync for each log append. We merely provide an example log store that is not durable.
logger is not a Raft log store; it is for debug logging.
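A minimal sketch of the "fsync for each log append" idea, assuming a POSIX system. DurableLogFile, LogRecord and the record layout are hypothetical illustrations, not NuRaft's log_store interface; a real implementation would wrap something like this inside its own log_store subclass.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <unistd.h>
#include <vector>

// Hypothetical record layout (native byte order):
// [8-byte index][4-byte payload length][payload bytes]. Not NuRaft's format.
struct LogRecord {
    uint64_t index;
    std::string payload;
};

class DurableLogFile {
public:
    explicit DurableLogFile(const std::string& path) : path_(path) {
        fd_ = ::open(path.c_str(), O_CREAT | O_RDWR | O_APPEND, 0644);
        if (fd_ < 0) throw std::runtime_error("open failed");
    }
    ~DurableLogFile() { if (fd_ >= 0) ::close(fd_); }
    DurableLogFile(const DurableLogFile&) = delete;
    DurableLogFile& operator=(const DurableLogFile&) = delete;

    // Writes one record and forces it to stable storage before returning,
    // i.e. one fsync per appended log entry.
    void append(uint64_t index, const std::string& payload) {
        uint32_t len = static_cast<uint32_t>(payload.size());
        std::string buf(sizeof(index) + sizeof(len), '\0');
        std::memcpy(&buf[0], &index, sizeof(index));
        std::memcpy(&buf[sizeof(index)], &len, sizeof(len));
        buf += payload;
        writeAll(buf);
        if (::fsync(fd_) != 0) throw std::runtime_error("fsync failed");
    }

    // Reads back every complete record (e.g. during recovery after a crash).
    // A torn record at the tail, left by an interrupted write, is ignored.
    std::vector<LogRecord> readAll() const {
        std::vector<LogRecord> out;
        int rfd = ::open(path_.c_str(), O_RDONLY);
        if (rfd < 0) return out;
        std::string data;
        char chunk[4096];
        ssize_t n;
        while ((n = ::read(rfd, chunk, sizeof(chunk))) > 0)
            data.append(chunk, static_cast<size_t>(n));
        ::close(rfd);
        const size_t hdr = sizeof(uint64_t) + sizeof(uint32_t);
        size_t pos = 0;
        while (pos + hdr <= data.size()) {
            LogRecord rec;
            uint32_t len;
            std::memcpy(&rec.index, data.data() + pos, sizeof(uint64_t));
            std::memcpy(&len, data.data() + pos + sizeof(uint64_t), sizeof(uint32_t));
            if (pos + hdr + len > data.size()) break;  // torn tail record
            rec.payload.assign(data.data() + pos + hdr, len);
            out.push_back(rec);
            pos += hdr + len;
        }
        return out;
    }

private:
    // write() may write fewer bytes than requested, so loop until done.
    void writeAll(const std::string& buf) {
        size_t off = 0;
        while (off < buf.size()) {
            ssize_t w = ::write(fd_, buf.data() + off, buf.size() - off);
            if (w < 0) throw std::runtime_error("write failed");
            off += static_cast<size_t>(w);
        }
    }

    std::string path_;
    int fd_ = -1;
};
```

This is durable but slow under load, since every entry pays for its own fsync.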
Hi @greensky00, I am now writing each appended log record to disk in the "ulong append(ptr<log_entry>& entry);" call. Here is a scenario:
1. Append persisted 10 log records.
2. Commit was called for 9 log records.
3. 9 log records were processed.
In this case, if the system crashes, I will see 10 log records in the backup. How do I know that the 10th log record is not committed? What other information do I need to persist along with each appended log record to bring the system back to the same state as before the crash? In the example state_mgr.h, I see calls like save_config() and save_state() that write to disk. When does Raft call these methods?
I am not a contributor to NuRaft, so my understanding may be wrong (to be confirmed by @greensky00).
@kishorekrd The last committed index should be persisted in the state machine and retrieved via this API:
https://github.com/eBay/NuRaft/blob/789cc75869a6914d4c13aab6c2d5b48dba198f68/include/libnuraft/state_machine.hxx#L273-L283
Or, if you really want to know the committed index at the moment the server receives the request, you may use this callback function to persist it:
https://github.com/eBay/NuRaft/blob/789cc75869a6914d4c13aab6c2d5b48dba198f68/src/handle_append_entries.cxx#L600-L601
But I wonder why you are tracking the last committed index yourself. It is normal for the log store to contain uncommitted logs at the end; they will soon be committed or discarded after the first communication with the existing leader, as @Steamgjk mentioned.
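A minimal sketch of a state machine that persists its last committed index together with its data, assuming a hypothetical key-value state machine (KvStateMachine) rather than NuRaft's actual state_machine class. Writing both into one temp file and renaming it over the previous one keeps the saved index consistent with the saved data. Keys and values are assumed not to contain newlines.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <map>
#include <string>
#include <utility>

// Hypothetical key-value state machine: the applied data and the index of
// the last applied log are saved together, so after a restart
// last_commit_index() matches exactly the state that was loaded.
class KvStateMachine {
public:
    // Apply one committed log entry (the job of a commit() callback).
    void commit(uint64_t log_idx, const std::string& key, const std::string& value) {
        data_[key] = value;
        last_commit_idx_ = log_idx;
    }

    // The value a state_machine::last_commit_index() override would return.
    uint64_t last_commit_index() const { return last_commit_idx_; }

    // Write index + data to a temp file, then rename it over the old file,
    // so a crash leaves either the old or the new version, never a mix.
    // A production version would also fsync the file and its directory.
    bool save(const std::string& path) const {
        const std::string tmp = path + ".tmp";
        {
            std::ofstream out(tmp, std::ios::trunc);
            if (!out) return false;
            out << last_commit_idx_ << '\n';
            for (const auto& kv : data_) out << kv.first << '\n' << kv.second << '\n';
            if (!out.flush()) return false;
        }  // file closed here, before the rename
        return std::rename(tmp.c_str(), path.c_str()) == 0;
    }

    // Restore both the data and the matching last committed index.
    bool load(const std::string& path) {
        std::ifstream in(path);
        if (!in) return false;
        uint64_t idx = 0;
        if (!(in >> idx)) return false;
        in.ignore(1);  // skip the newline after the index
        std::map<std::string, std::string> loaded;
        std::string k, v;
        while (std::getline(in, k) && std::getline(in, v)) loaded[k] = v;
        data_ = std::move(loaded);
        last_commit_idx_ = idx;
        return true;
    }

    std::string get(const std::string& key) const {
        auto it = data_.find(key);
        return it == data_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> data_;
    uint64_t last_commit_idx_ = 0;
};
```

Because the index is never written separately from the data it describes, there is no window in which one is on disk without the other.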
Also, please note that it is preferable to call fsync in the end_of_append_batch API,
https://github.com/eBay/NuRaft/blob/789cc75869a6914d4c13aab6c2d5b48dba198f68/include/libnuraft/log_store.hxx#L79-L86
instead of calling it in each append, since that is very inefficient: NuRaft sends multiple logs in a batch.
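A sketch of why batching helps, using a simplified, hypothetical BatchedLogStore. Its method names mirror append(), end_of_append_batch() and flush(), but the signatures are simplified and do not use NuRaft types. The sync_fn callback stands in for the real fsync, and entries written by append() are treated as not yet durable until the next sync.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class BatchedLogStore {
public:
    // sync_fn stands in for the real durability call (e.g. fsync on the log file).
    explicit BatchedLogStore(std::function<void()> sync_fn) : sync_(std::move(sync_fn)) {}

    // Called once per log entry: write only, no fsync.
    // pending_ models data sitting in the OS page cache.
    uint64_t append(const std::string& entry) {
        pending_.push_back(entry);
        return next_index_++;
    }

    // Called once after a whole append_entries batch: one sync covers it all.
    void end_of_append_batch(uint64_t /*start*/, uint64_t /*cnt*/) { flush(); }

    // Explicit durability point; skips the sync if nothing is pending.
    bool flush() {
        if (pending_.empty()) return true;
        sync_();
        durable_.insert(durable_.end(), pending_.begin(), pending_.end());
        pending_.clear();
        return true;
    }

    std::size_t durable_count() const { return durable_.size(); }

private:
    std::function<void()> sync_;
    std::vector<std::string> pending_;
    std::vector<std::string> durable_;
    uint64_t next_index_ = 1;
};
```

With five appends in one batch, only one sync is issued, where fsync-per-append would issue five.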
Hi @greensky00,
In my example, 10 log records are appended but only 9 are committed. That means I processed 9 committed log records in my state machine. Now the system crashes or reboots. At recovery time, I will see 10 log records in the append log backup. To recreate my state machine, I will first process the latest snapshot and then process the log records from the append backup. If I don't know up to which index the log records were committed in the previous session, how can I restore the state machine to the same state as before the crash/reboot?
What is the difference between flush() and end_of_append_batch()?
Thanks
> For recreating my state machine, I will first process the latest snapshot and then have to process the log records from the append backup. If I don't know up to what index the log records are committed in the previous session, how can I restore the state machine to the same state as before the crash/reboot?
As I mentioned above, your state machine should remember the last committed index and return it via the state_machine::last_commit_index() API call.
Also, is there any reason why you are doing this yourself? As long as state_machine::last_commit_index() returns the correct index, you don't need to care about this; NuRaft does it automatically.
flush(): invoked when an explicit fsync call is needed (right after a membership change log is generated).
end_of_append_batch(): invoked after each append_entries request. You can call flush() here for persistence.
Hi @greensky00, sorry, maybe I am missing some details here. After a system reboot, at recovery time, I think I need to restore state_machine::last_commit_index() so that Raft will treat the remaining log entries in the log store as uncommitted. The snapshot will have the last_commit_index at the time the snapshot was created, and the log store will have all the log entries with their index numbers. But how and where do I recover the last_commit_index as of the last reboot? Do I need to write it to disk on every commit? Currently I am writing only the snapshot and the log store to disk.
@kishorekrd
During NuRaft initialization, last_commit_index should not be the last index number right before the reboot. It should be the last applied index of the current state machine, so that Raft can replay the log from the "last state" of the state machine.
Since you sync the data to disk at every snapshot creation, when NuRaft restarts, the last state of the state machine is the snapshot; hence you should replay the log from the index of the snapshot. Then the first state_machine::last_commit_index() call should return the snapshot's index.
For example, let's say we create a snapshot every 3 log appends.
commit log 1 -> state machine: log 1 / snapshot: empty
commit log 2 -> state machine: log 2 / snapshot: empty
commit log 3 -> state machine: log 3 / snapshot: log 3
commit log 4 -> state machine: log 4 / snapshot: log 3
--- crash & restart ---
After the restart, state machine: log 4 will be lost and you will load snapshot: log 3, so the current state machine is state machine: log 3. In such a case, your last_commit_index should be 3.
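The example above can be modeled as a toy simulation. Node and kSnapshotInterval are hypothetical names; the in-memory applied_idx plays the role of the state machine, snapshot_idx and log_store are the only things assumed to survive the crash, and the replay range is assumed to end at the last appended log (in practice it is bounded by the commit index learned from the leader).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical: take a snapshot every 3 commits, as in the example.
constexpr uint64_t kSnapshotInterval = 3;

struct Node {
    std::vector<uint64_t> log_store;  // durable: every appended log index
    uint64_t snapshot_idx = 0;        // durable: index covered by the last snapshot
    uint64_t applied_idx = 0;         // in memory only: lost on crash

    void append_and_commit(uint64_t idx) {
        log_store.push_back(idx);
        applied_idx = idx;                                      // state machine: log idx
        if (idx % kSnapshotInterval == 0) snapshot_idx = idx;   // snapshot: log idx
    }

    // After a restart the state machine is whatever the snapshot holds,
    // so last_commit_index() must report snapshot_idx, not the pre-crash value.
    void crash_and_restart() { applied_idx = snapshot_idx; }

    uint64_t last_commit_index() const { return applied_idx; }

    // Log indexes that must be re-applied on top of the loaded snapshot.
    std::vector<uint64_t> logs_to_replay() const {
        std::vector<uint64_t> out;
        for (uint64_t idx : log_store)
            if (idx > last_commit_index()) out.push_back(idx);
        return out;
    }
};
```

Running the example's four commits and a crash leaves last_commit_index() at 3, with only log 4 left to replay from the log store.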
ClickHouse has a persistent log store implementation: https://github.com/ClickHouse/ClickHouse/blob/master/src/Coordination/KeeperLogStore.h
Hi,
Is there any reference implementation of a persistent log store for NuRaft that writes log records and snapshots to disk?
Thanks