eBay / NuRaft

C++ implementation of Raft core logic as a replication library
Apache License 2.0

Use `pthread_exit` instead of `exit` to allow proper process shutdown. #535

Closed szmyd closed 2 months ago

szmyd commented 2 months ago

When NuRaft encounters an unrecoverable state, it calls exit(3) directly, for example:

 // handle_commit.cxx
 bool raft_server::commit_in_bg_exec(size_t timeout_ms) {
 ...
         ptr<log_entry> le = log_store_->entry_at(index_to_commit);
         if (!le)
         {
             // LCOV_EXCL_START
             p_ft( "failed to get log entry with idx %" PRIu64 "", index_to_commit );
             ctx_->state_mgr_->system_exit(raft_err::N19_bad_log_idx_for_term);
             ::exit(-1);
             // LCOV_EXCL_STOP
         }

commit_in_bg_exec runs in its own thread, separate from the main thread of a process that may have other threads, or even other nuraft::raft_server instances, running. This forceful termination gives the process no chance to reach a "non-random" state; it behaves as if every thread had terminated abnormally.

The request is to call pthread_exit(3) instead, after the call to state_mgr::system_exit has announced that the server is going to stop processing. The main process can then decide whether to terminate as well and join all other threads normally, continue after abandoning this Raft service, or abend itself.
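To illustrate the difference, here is a minimal sketch of the requested pattern, not NuRaft code. `announce_system_exit`, `commit_worker`, `run_worker_and_join` and the error value `19` are hypothetical stand-ins: the first plays the role of `state_mgr::system_exit`, the second the role of the background commit thread. The worker announces the failure and then ends only its own thread, so the caller can still join it and choose what to do next.

```cpp
#include <pthread.h>
#include <atomic>
#include <cassert>

// Hypothetical placeholder error value; not the real raft_err enum value.
constexpr int kBadLogIdx = 19;

// Hypothetical stand-in for state_mgr::system_exit(): records why the
// worker is stopping so the owning process can inspect it later.
static std::atomic<int> g_announced_error{0};

void announce_system_exit(int code) {
    g_announced_error.store(code);
}

// Stand-in for the background commit thread. `arg` points to a bool that
// simulates "the log entry could not be found".
void* commit_worker(void* arg) {
    const bool entry_missing = *static_cast<bool*>(arg);
    if (entry_missing) {
        announce_system_exit(kBadLogIdx);
        // Ends only this thread; ::exit(-1) here would end the whole process.
        pthread_exit(nullptr);
    }
    return nullptr;
}

// Starts the worker, joins it, and returns the announced error code
// (0 if the worker finished normally, -1 if the thread could not be created).
int run_worker_and_join(bool entry_missing) {
    g_announced_error.store(0);
    pthread_t tid;
    if (pthread_create(&tid, nullptr, commit_worker, &entry_missing) != 0) {
        return -1;
    }
    const int rc = pthread_join(tid, nullptr);
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
    return g_announced_error.load();
}
```

With this shape, `run_worker_and_join(true)` returns to its caller with the announced error instead of taking the process down, which leaves the shutdown policy (join the remaining threads, carry on without this service, or abort) to the application.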