Closed ppamorim closed 2 years ago
I believe the panic happens here: https://github.com/datafuselabs/openraft/blob/d62183517be4b6e38360cc107d10f2488123641b/openraft/src/raft.rs#L439
This is to check whether replication to the newly added learner has become up to date.
But here it tries to use matched.leader_id to build a LogId, while matched is an uninitialized value (0,0). In such a case, it should avoid creating a LogId and just compare the index instead.
I will remove this LogId::new() to fix this issue.
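The index-only comparison described above could be sketched roughly as follows. This is not openraft's actual code: `Matched`, `is_caught_up`, and the `max_lag` threshold are hypothetical stand-ins used only to illustrate the idea of checking progress without constructing a log id from an uninitialized value.

```rust
/// Simplified stand-in for a follower's replication progress.
/// Hypothetical type for illustration; `Default` yields the uninitialized (0,0) value.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct Matched {
    pub term: u64,
    pub index: u64,
}

/// Returns true when the learner is at most `max_lag` entries behind the
/// leader's last log index. Only indexes are compared, so an uninitialized
/// `matched` needs no special handling. `max_lag` is an assumed parameter.
pub fn is_caught_up(matched: Matched, last_log_index: u64, max_lag: u64) -> bool {
    // saturating_sub avoids underflow if matched.index is ahead of last_log_index.
    last_log_index.saturating_sub(matched.index) <= max_lag
}
```

With these assumptions, `is_caught_up(Matched::default(), 100, 10)` is false because the learner has replicated nothing yet, while `is_caught_up(Matched { term: 1, index: 95 }, 100, 10)` is true.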
Thank you man! 😆
When calling app.raft.add_learner(node_id, Some(node), true), the application crashes with an error.
Payload:
POST - /add-learner [8117459876464397197, "0.0.0.0:8101"]
Error:
Full log: https://gist.github.com/ppamorim/520c04892db587790e07cadf2b30c6e4
Openraft is in sync with the main branch.
EDIT 1:
It seems to be non-deterministic: if I call this a couple of seconds after the nodes start and retry multiple times, the error doesn't happen.
This bug is visible on this branch: https://shorturl.at/cgOU0 (shortened to prevent indexing by search engines)
To reproduce: run ./script/start-cluster.sh and check n0.log.
EDIT 2:
By adding a loop inside the request to add the learner, the issue seems to go away, but it's a workaround. It feels like the version of openraft on the main branch is not ready right after its initial setup. Interesting.
As:
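A retry loop of the kind described in EDIT 2 might look roughly like the sketch below. It is not the code from the branch: `retry`, `max_attempts`, and `delay` are hypothetical names, and it assumes the failing call surfaces as an `Err` that the handler can observe rather than aborting the process.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Calls `op` until it returns `Ok` or `max_attempts` calls have been made,
/// sleeping for `delay` after each failed attempt. `op` is always called at
/// least once. The last error is returned if every attempt fails.
pub fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: give up and hand the error back to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise wait a little and try again.
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}
```

In the handler, the closure passed to `retry` would wrap the call that adds the learner. As the post says, this only hides the startup race; it does not fix it.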