Closed ntsh-gpt closed 3 years ago
Thanks, we'll investigate.
This looks like a bug that indeed should be investigated. However, it should be greatly mitigated if you implement a graceful shutdown in your code, by catching whatever signal you are sending to the process, then calling raft_close
and waiting for its callback to fire before exiting the process. That's exactly what is done in the example code.
@freeekanayaka I am following the same approach as mentioned in the example code. However I would like to point out exact case for the issue,
Working :
struct A { int64_t x; int64_t y; }; struct A data; raft_buffer buf; buf.len = sizeof(struct A); buf.base = raft_malloc(buf.len); // assign data values to buf.base raft_apply(&raft, req, &buf, 1, serverApplyCb)
Not working :
FILE* f = fopen("path"); buf.len = ftell(f) ; fseek(f, 0, SEEK_SET); buf.base = raft_malloc(buf.len); fread(buf.base,1,buf.len,f); fclose(f); raft_apply(&raft, req, &buf, 1, serverApplyCb)
Note : I tried memcpy a char array to buf.base, still same issue.
I believe I'm also hitting this bug. The idea of calling raft_close is useful, thanks, but this is a bit worrisome, and calls into question whether I should rely on this implementation. Raft is supposed to help with durability of log entries. That means that if my server dies due to power failure, I should be able to gracefully restore my machine. What is raft_close doing that can't be done after a raft_apply to make sure that the open segment can be read again? I'm happy to contribute to make things work, but I want to make sure that such a use case (power failure) is supposed to be supported.
I believe I'm also hitting this bug. The idea of calling raft_close is useful, thanks, but this is a bit worrisome, and calls into question whether I should rely on this implementation. Raft is supposed to help with durability of log entries. That means that if my server dies due to power failure, I should be able to gracefully restore my machine. What is raft_close doing that can't be done after a raft_apply to make sure that the open segment can be read again? I'm happy to contribute to make things work, but I want to make sure that such a use case (power failure) is supposed to be supported.
We do have Jepsen tests that simulate hard process crashes (SIGKILL) and recovery after that, probably we should run those tests more times and see if we can reproduce this particular bug (or just try what @ntsh-gpt as pointed).
As I said, this really looks like a bug that should be fixed, so the suggestion of using raft_close
is just a mitigation, I'm not meaning that it is supposed to be a required procedure in order for the implementation to work.
@MathieuBordere Will try out with #184.
@ntsh-gpt I deleted that comment, was premature, #184 won't solve it.
@ntsh-gpt I deleted that comment, was premature, #184 won't solve it.
Yeah, #184 should only be relevant for snapshots. Closed segments should alread be atomically created (by renaming an open segment).
@ntsh-gpt Are you hitting the same issue as in https://github.com/canonical/raft/issues/189, if yes, can I close?
Closing for inactivity, assuming that the fix above took care of it. If not, let us know and we'll re-open.
When I manually stop the raft server using interrupt signal, and then restart I see following error
raft_start(): io: load closed segment 0000000000000002-0000000000000013: entries batch 2 starting at byte 197: entries count 19005183 in preamble is too high.
base of raft_buffer passed in raft_apply() call is a dynamically allocated memory of size 157 bytes.
Any help or pointers are appreciated.