Closed eloff closed 7 years ago
it's up to the log store, in this implementation, fs_log_store flush the data every time it got changed, I understand that saving the state is one of the key for the implementations, however, fsync is not the only way, it's all up to the log store and state mgr implementation, you can even have your own device to hold the logs and state, we don't enforce any log store or state mgr implementation here
To be clear, I'm only talking about fs_log_store here. The state manager can do whatever it wants, and it doesn't really need fsync if it can restore from the log - in fact it could be memory only.
fs_log_store on the other hand does not call fsync, so it's no more durable or permanent than memory. The C++ stdlib flush() does not call fsync. Entries stored to fs_log_store can live in the OS buffer cache and storage device buffers for indeterminate amounts of time. On a hard system failure, you have no guarantee about what made it to disk and what did not. You cannot base any correct implementation of Raft on such a log. See https://stackoverflow.com/questions/23358324/raft-consensus-protocol-should-entries-be-durable-before-commiting for why log entries must be made durable on the storage device(s).
Okay, if it's about fs_log_store, I am not worry about it, there could be sophisticated implementation of it, you can start trying cornerstone with fs_log_store, but you can have your own Linux specific impl in your production. For this implementation, my original intent is to provide a core implementation of the algorithm, but ship with some sample interface implementations, but it's open to extended with more sophisticated implementations, again fs_log_store is not part of the core, it may have other issues not just fsync.
I can't imagine why one would provide a sample log store implementation that does not work. If you want to do that, it's your business, but please mark it with big red letters in the documentation DO NOT USE IN PRODUCTION, YOU WILL LOSE DATA. Or rename it to e.g. fs_eat_my_data_log_store(), because it's identical to running e.g. MySQL with libeatmydata to skip fysnc.
Or just add fsync() or equivalent to the flush method.
README.md updated
I had a look through the code, but no raft implementation could be correct if doesn't call fsync or equivalent at the appropriate places. Please forgive me if I missed it, but does this code call fsync?