(DotNext.Net.Cluster) Leader does not receive IRaftCluster.LeaderChanged event when downgrading to follower

RyanTT commented 2 years ago

Hello,

I am currently trying out version 4.2.0-beta.1 and have come across an issue with writing log entries in Raft and the events associated with it. Steps to reproduce:

1. Have a single node start a standalone cluster (node 1)
2. Add a second member with AddMember (node 2)
3. Kill the process of node 2
4. Start the process of node 2 again

After step 3, my logs suggest that the leader steps down to follower but does not fire the IRaftCluster.LeaderChanged event. On node 1, IRaftCluster.Members still reports the local node 1 (Remote == false) as leader, but attempting to write to the log now results in:

      System.InvalidOperationException: The local cluster member is not a leader
         at DotNext.Threading.Tasks.ValueTaskCompletionSource`1.GetResult(Int16 token) in /_/src/DotNext.Threading/Threading/Tasks/ValueTaskCompletionSource.T.cs:line 272
         at DotNext.Threading.Tasks.ValueTaskCompletionSource`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token) in /_/src/DotNext.Threading/Threading/Tasks/ValueTaskCompletionSource.T.cs:line 279
         at DotNext.Net.Cluster.Consensus.Raft.LeaderState.ReplicationCallback.Invoke() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/LeaderState.Replication.cs:line 177
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 818

IClusterMember.MemberStatusChanged will correctly fire and set node 2 to Unavailable during step 3. This behavior was not present before the upgrade to 4.2.0-beta.1.

After step 4, IClusterMember.MemberStatusChanged correctly fires on node 1 and marks node 2 as available again. However, node 1 is still unable to write to the log, as it seemingly is no longer the leader. Only some time after this step does node 1 fire IRaftCluster.LeaderChanged, first with no leader and then again with a leader.
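
To make the observed mismatch concrete, here is a rough model of the behavior described above. It is only a sketch: FakeCluster, StepDownSilently, Elect and WriteAsync are hypothetical stand-ins written for this illustration, not DotNext types or methods. Only the exception message is taken from the real stack trace.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the cluster; NOT the DotNext API.
// It only mirrors the behavior described in this report.
public sealed class FakeCluster
{
    // Argument is the new leader's id, or null when there is no leader.
    public event Action<string?>? LeaderChanged;

    // What the member list reports as leader.
    public string? ReportedLeader { get; private set; } = "node1";

    // Whether the local node (node1) can actually replicate entries.
    public bool ActuallyLeader { get; private set; } = true;

    // Models the suspected bug: the state changes but no event is raised.
    public void StepDownSilently() => ActuallyLeader = false;

    // Models a regular election result, which does raise the event.
    public void Elect(string? leader)
    {
        ReportedLeader = leader;
        ActuallyLeader = leader == "node1";
        LeaderChanged?.Invoke(leader);
    }

    // Fails with the same message as in the stack trace when not leader.
    public Task WriteAsync(string entry) => ActuallyLeader
        ? Task.CompletedTask
        : Task.FromException(new InvalidOperationException(
            "The local cluster member is not a leader"));
}

public static class Program
{
    public static async Task Main()
    {
        var cluster = new FakeCluster();
        cluster.LeaderChanged += leader =>
            Console.WriteLine($"LeaderChanged: {leader ?? "NO LEADER"}");

        // Step 3: node 2 is killed; node 1 steps down without an event.
        cluster.StepDownSilently();

        try
        {
            await cluster.WriteAsync("entry");
        }
        catch (InvalidOperationException e)
        {
            // The reported leader is still node1, yet the write fails.
            Console.WriteLine(
                $"Write failed, reported leader = {cluster.ReportedLeader}: {e.Message}");
        }

        // Some time after step 4: the events finally arrive.
        cluster.Elect(null);
        cluster.Elect("node1");
        await cluster.WriteAsync("entry"); // succeeds again
    }
}
```

In this model, the window between StepDownSilently and the first Elect call corresponds to the period in which node 1 rejects writes while still being reported as leader.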