dotnet / dotNext

Next generation API for .NET
https://dotnet.github.io/dotNext/
MIT License
1.56k stars 119 forks source link

RaftNode example eventually stops making progress when saving 10 times x second #185

Closed freddyrios closed 10 months ago

freddyrios commented 11 months ago

This is about the issues being discussed here https://github.com/dotnet/dotNext/issues/184#issuecomment-1660673297.

The issue has been reproduced both in linux arm 32 bit in raspberry os (debian based) and linux debian 64 bit in wsl. At least one reproduction had all 3 nodes running, but due to the investigation of a different issue the cluster was often running with 2 out of the 3 nodes.

There are 2 observations surrounding the issue:

  1. there are often leader changes, so the term can easily reach 200 if left running for 1-3 hours
  2. eventually progress stops. When this happens there are leader change events reported.

Presumably the frequent leader changes make the issue easier to reproduce, as there is a chance it only happens during leader changes.

The issue has been reproduced in the 14.13, 14.12.3 and 4.4.1 and in all cases it has taken under a day to get to the failure. The specific settings + entries + behavior + even console output shown over ssh could play a part in making the issue more likely to reproduce, as we have systems that run for months with 4.4.1 writing entries to raft 10 times x second.

In 14.13 and 14.12.3 the issue was observed with similar output to https://github.com/dotnet/dotNext/issues/184#issuecomment-1660235577.

I only made one run in 4.4.1 where progress did stop but the output was different. While in the later versions things suddenly stop after the leader change and potentially a single TaskCanceledException (not sure if it always gets into the screen), in the 4.4.1 run there were a various TaskCanceledException that happened before the stop + some "fail" log entries with "EndOfStreamException: Attempted to read past the end of the stream" by the leader.

I share the detailed 4.4.1 in case the issue was still the same and the old version logs extra info. Of course there is a risk the old version failure was not the same type of failure and is irrelevant for the issue present in the latest versions.

Leader:

Saving value 2446000 generated by the leader node
Accepting value 2446000
Saving value 2446500 generated by the leader node
Accepting value 2446500
Saving value 2447000 generated by the leader node
Accepting value 2447000
Saving value 2447500 generated by the leader node
Accepting value 2447500
Saving value 2448000 generated by the leader node
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\DataModifier.cs:line 32
Consensus cannot be reached
Term of local cluster member is 50. Election timeout 00:00:00.1500000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 53. Election timeout 00:00:00.1880000
Consensus cannot be reached
Term of local cluster member is 53. Election timeout 00:00:00.1880000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 53. Election timeout 00:00:00.1880000
Accepting value 2448000
Accepting value 2448000
Accepting value 2448500
Accepting value 2449000
Building snapshot
fail: DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer[74028]
      Failed to process request from 192.168.100.8:59192
      System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at DotNext.IO.FileReader.ReadBlockAsync(Memory`1 output, CancellationToken token) in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 404
         at DotNext.IO.FileReader.<>c__DisplayClass1_0`1.<<ReadAsync>g__ReadSlowAsync|0>d.MoveNext() in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 71
      --- End of stack trace from previous location ---
         at RaftNode.SimplePersistentState.SimpleSnapshotBuilder.ApplyAsync(LogEntry entry) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\SimplePersistentState.cs:line 22
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.BuildSnapshotAsync(Int32 sessionId, Int64 upperBoundIndex, SnapshotBuilder builder, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 107
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.ForceSequentialCompactionAsync(Int32 sessionId, Int64 upperBoundIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 608
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.<>c__DisplayClass23_0`1.<<AppendAndCommitAsync>g__CommitAndCompactSequentiallyAsync|0>d.MoveNext() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 380
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.AppendAndCommitAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, Int64 commitIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 429
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId sender, Int64 senderTerm, ILogEntryProducer`1 entries, Int64 prevLogIndex, Int64 prevLogTerm, Int64 commitIndex, IClusterConfiguration config, Boolean applyConfig, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 488
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.AppendEntriesAsync(ProtocolStream protocol, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 251
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.HandleConnection(Socket remoteClient) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 133
Consensus cannot be reached
Term of local cluster member is 53. Election timeout 00:00:00.1880000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 54. Election timeout 00:00:00.1880000
Saving value 2449500 generated by the leader node
Accepting value 2449500
Accepting value 2449500
Building snapshot
Consensus cannot be reached
Term of local cluster member is 54. Election timeout 00:00:00.1880000
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\DataModifier.cs:line 32
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 55. Election timeout 00:00:00.1880000
Accepting value 2449500
Building snapshot
fail: DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer[74028]
      Failed to process request from 192.168.100.8:59212
      System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at DotNext.IO.FileReader.ReadBlockAsync(Memory`1 output, CancellationToken token) in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 404
         at DotNext.IO.FileReader.<>c__DisplayClass1_0`1.<<ReadAsync>g__ReadSlowAsync|0>d.MoveNext() in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 71
      --- End of stack trace from previous location ---
         at RaftNode.SimplePersistentState.SimpleSnapshotBuilder.ApplyAsync(LogEntry entry) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\SimplePersistentState.cs:line 22
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.BuildSnapshotAsync(Int32 sessionId, Int64 upperBoundIndex, SnapshotBuilder builder, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 107
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.ForceSequentialCompactionAsync(Int32 sessionId, Int64 upperBoundIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 608
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.<>c__DisplayClass23_0`1.<<AppendAndCommitAsync>g__CommitAndCompactSequentiallyAsync|0>d.MoveNext() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 380
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.AppendAndCommitAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, Int64 commitIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 429
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId sender, Int64 senderTerm, ILogEntryProducer`1 entries, Int64 prevLogIndex, Int64 prevLogTerm, Int64 commitIndex, IClusterConfiguration config, Boolean applyConfig, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 488
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.AppendEntriesAsync(ProtocolStream protocol, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 251
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.HandleConnection(Socket remoteClient) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 133
Consensus cannot be reached
Term of local cluster member is 55. Election timeout 00:00:00.1880000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 56. Election timeout 00:00:00.1880000
Saving value 2450000 generated by the leader node
Accepting value 2450000
Accepting value 2450000
Building snapshot
Consensus cannot be reached
Term of local cluster member is 56. Election timeout 00:00:00.1880000
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\DataModifier.cs:line 32
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 57. Election timeout 00:00:00.1880000
Accepting value 2450000
Building snapshot
fail: DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer[74028]
      Failed to process request from 192.168.100.8:59224
      System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at DotNext.IO.FileReader.ReadBlockAsync(Memory`1 output, CancellationToken token) in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 404
         at DotNext.IO.FileReader.<>c__DisplayClass1_0`1.<<ReadAsync>g__ReadSlowAsync|0>d.MoveNext() in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 71
      --- End of stack trace from previous location ---
         at RaftNode.SimplePersistentState.SimpleSnapshotBuilder.ApplyAsync(LogEntry entry) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\SimplePersistentState.cs:line 22
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.BuildSnapshotAsync(Int32 sessionId, Int64 upperBoundIndex, SnapshotBuilder builder, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 107
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.ForceSequentialCompactionAsync(Int32 sessionId, Int64 upperBoundIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 608
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.<>c__DisplayClass23_0`1.<<AppendAndCommitAsync>g__CommitAndCompactSequentiallyAsync|0>d.MoveNext() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 380
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.AppendAndCommitAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, Int64 commitIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 429
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId sender, Int64 senderTerm, ILogEntryProducer`1 entries, Int64 prevLogIndex, Int64 prevLogTerm, Int64 commitIndex, IClusterConfiguration config, Boolean applyConfig, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 488
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.AppendEntriesAsync(ProtocolStream protocol, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 251
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.HandleConnection(Socket remoteClient) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 133
Consensus cannot be reached
Term of local cluster member is 57. Election timeout 00:00:00.1880000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 58. Election timeout 00:00:00.1880000
Saving value 2450500 generated by the leader node
Accepting value 2450500
Accepting value 2450500
Building snapshot
Consensus cannot be reached
Term of local cluster member is 58. Election timeout 00:00:00.1880000
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\DataModifier.cs:line 32
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 59. Election timeout 00:00:00.1880000
Accepting value 2450500
Building snapshot
fail: DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer[74028]
      Failed to process request from 192.168.100.8:59236
      System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at DotNext.IO.FileReader.ReadBlockAsync(Memory`1 output, CancellationToken token) in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 404
         at DotNext.IO.FileReader.<>c__DisplayClass1_0`1.<<ReadAsync>g__ReadSlowAsync|0>d.MoveNext() in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 71
      --- End of stack trace from previous location ---
         at RaftNode.SimplePersistentState.SimpleSnapshotBuilder.ApplyAsync(LogEntry entry) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\SimplePersistentState.cs:line 22
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.BuildSnapshotAsync(Int32 sessionId, Int64 upperBoundIndex, SnapshotBuilder builder, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 107
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.ForceSequentialCompactionAsync(Int32 sessionId, Int64 upperBoundIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 608
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.<>c__DisplayClass23_0`1.<<AppendAndCommitAsync>g__CommitAndCompactSequentiallyAsync|0>d.MoveNext() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 380
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.AppendAndCommitAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, Int64 commitIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 429
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId sender, Int64 senderTerm, ILogEntryProducer`1 entries, Int64 prevLogIndex, Int64 prevLogTerm, Int64 commitIndex, IClusterConfiguration config, Boolean applyConfig, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 488
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.AppendEntriesAsync(ProtocolStream protocol, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 251
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.HandleConnection(Socket remoteClient) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 133
Consensus cannot be reached
Term of local cluster member is 59. Election timeout 00:00:00.1880000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 60. Election timeout 00:00:00.1880000
Saving value 2451000 generated by the leader node
Accepting value 2451000
Accepting value 2451000
Building snapshot
Consensus cannot be reached
Term of local cluster member is 60. Election timeout 00:00:00.1880000
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\DataModifier.cs:line 32
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 61. Election timeout 00:00:00.1880000
Accepting value 2451000
Building snapshot
fail: DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer[74028]
      Failed to process request from 192.168.100.8:59248
      System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at DotNext.IO.FileReader.ReadBlockAsync(Memory`1 output, CancellationToken token) in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 404
         at DotNext.IO.FileReader.<>c__DisplayClass1_0`1.<<ReadAsync>g__ReadSlowAsync|0>d.MoveNext() in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 71
      --- End of stack trace from previous location ---
         at RaftNode.SimplePersistentState.SimpleSnapshotBuilder.ApplyAsync(LogEntry entry) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\SimplePersistentState.cs:line 22
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.BuildSnapshotAsync(Int32 sessionId, Int64 upperBoundIndex, SnapshotBuilder builder, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 107
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.ForceSequentialCompactionAsync(Int32 sessionId, Int64 upperBoundIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 608
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.<>c__DisplayClass23_0`1.<<AppendAndCommitAsync>g__CommitAndCompactSequentiallyAsync|0>d.MoveNext() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 380
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.AppendAndCommitAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, Int64 commitIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 429
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId sender, Int64 senderTerm, ILogEntryProducer`1 entries, Int64 prevLogIndex, Int64 prevLogTerm, Int64 commitIndex, IClusterConfiguration config, Boolean applyConfig, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 488
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.AppendEntriesAsync(ProtocolStream protocol, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 251
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.HandleConnection(Socket remoteClient) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 133
Consensus cannot be reached
Term of local cluster member is 61. Election timeout 00:00:00.1880000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 62. Election timeout 00:00:00.1880000
Saving value 2451500 generated by the leader node
Accepting value 2451500
Accepting value 2451500
Building snapshot
Consensus cannot be reached
Term of local cluster member is 62. Election timeout 00:00:00.1880000
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\DataModifier.cs:line 32
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 63. Election timeout 00:00:00.1880000
Accepting value 2451500
Building snapshot
fail: DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer[74028]
      Failed to process request from 192.168.100.8:59260
      System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at DotNext.IO.FileReader.ReadBlockAsync(Memory`1 output, CancellationToken token) in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 404
         at DotNext.IO.FileReader.<>c__DisplayClass1_0`1.<<ReadAsync>g__ReadSlowAsync|0>d.MoveNext() in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 71
      --- End of stack trace from previous location ---
         at RaftNode.SimplePersistentState.SimpleSnapshotBuilder.ApplyAsync(LogEntry entry) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\SimplePersistentState.cs:line 22
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.BuildSnapshotAsync(Int32 sessionId, Int64 upperBoundIndex, SnapshotBuilder builder, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 107
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.ForceSequentialCompactionAsync(Int32 sessionId, Int64 upperBoundIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 608
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.<>c__DisplayClass23_0`1.<<AppendAndCommitAsync>g__CommitAndCompactSequentiallyAsync|0>d.MoveNext() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 380
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.AppendAndCommitAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, Int64 commitIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 429
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId sender, Int64 senderTerm, ILogEntryProducer`1 entries, Int64 prevLogIndex, Int64 prevLogTerm, Int64 commitIndex, IClusterConfiguration config, Boolean applyConfig, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 488
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.AppendEntriesAsync(ProtocolStream protocol, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 251
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.HandleConnection(Socket remoteClient) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 133
Consensus cannot be reached
Term of local cluster member is 63. Election timeout 00:00:00.1880000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 64. Election timeout 00:00:00.1880000
Saving value 2452000 generated by the leader node
Accepting value 2452000
Accepting value 2452000
Building snapshot
Consensus cannot be reached
Term of local cluster member is 64. Election timeout 00:00:00.1880000
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\DataModifier.cs:line 32
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 65. Election timeout 00:00:00.1880000
Accepting value 2452000
Building snapshot
fail: DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer[74028]
      Failed to process request from 192.168.100.8:59272
      System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at DotNext.IO.FileReader.ReadBlockAsync(Memory`1 output, CancellationToken token) in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 404
         at DotNext.IO.FileReader.<>c__DisplayClass1_0`1.<<ReadAsync>g__ReadSlowAsync|0>d.MoveNext() in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 71
      --- End of stack trace from previous location ---
         at RaftNode.SimplePersistentState.SimpleSnapshotBuilder.ApplyAsync(LogEntry entry) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\SimplePersistentState.cs:line 22
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.BuildSnapshotAsync(Int32 sessionId, Int64 upperBoundIndex, SnapshotBuilder builder, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 107
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.ForceSequentialCompactionAsync(Int32 sessionId, Int64 upperBoundIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 608
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.<>c__DisplayClass23_0`1.<<AppendAndCommitAsync>g__CommitAndCompactSequentiallyAsync|0>d.MoveNext() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 380
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.AppendAndCommitAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, Int64 commitIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 429
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId sender, Int64 senderTerm, ILogEntryProducer`1 entries, Int64 prevLogIndex, Int64 prevLogTerm, Int64 commitIndex, IClusterConfiguration config, Boolean applyConfig, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 488
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.AppendEntriesAsync(ProtocolStream protocol, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 251
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.HandleConnection(Socket remoteClient) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 133
Consensus cannot be reached
Term of local cluster member is 65. Election timeout 00:00:00.1880000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 66. Election timeout 00:00:00.1880000
Saving value 2452500 generated by the leader node
Accepting value 2452500
Accepting value 2452500
Building snapshot
Consensus cannot be reached
Term of local cluster member is 66. Election timeout 00:00:00.1880000
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\DataModifier.cs:line 32
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 67. Election timeout 00:00:00.1880000
Accepting value 2452500
Building snapshot
fail: DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer[74028]
      Failed to process request from 192.168.100.8:59284
      System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at DotNext.IO.FileReader.ReadBlockAsync(Memory`1 output, CancellationToken token) in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 404
         at DotNext.IO.FileReader.<>c__DisplayClass1_0`1.<<ReadAsync>g__ReadSlowAsync|0>d.MoveNext() in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 71
      --- End of stack trace from previous location ---
         at RaftNode.SimplePersistentState.SimpleSnapshotBuilder.ApplyAsync(LogEntry entry) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\SimplePersistentState.cs:line 22
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.BuildSnapshotAsync(Int32 sessionId, Int64 upperBoundIndex, SnapshotBuilder builder, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 107
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.ForceSequentialCompactionAsync(Int32 sessionId, Int64 upperBoundIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 608
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.<>c__DisplayClass23_0`1.<<AppendAndCommitAsync>g__CommitAndCompactSequentiallyAsync|0>d.MoveNext() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 380
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.AppendAndCommitAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, Int64 commitIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 429
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId sender, Int64 senderTerm, ILogEntryProducer`1 entries, Int64 prevLogIndex, Int64 prevLogTerm, Int64 commitIndex, IClusterConfiguration config, Boolean applyConfig, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 488
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.AppendEntriesAsync(ProtocolStream protocol, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 251
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.HandleConnection(Socket remoteClient) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 133
Consensus cannot be reached
Term of local cluster member is 67. Election timeout 00:00:00.1880000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 68. Election timeout 00:00:00.1880000
Saving value 2453000 generated by the leader node
Accepting value 2453000
Accepting value 2453000
Building snapshot
Consensus cannot be reached
Term of local cluster member is 68. Election timeout 00:00:00.1880000
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\DataModifier.cs:line 32
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 69. Election timeout 00:00:00.1880000
Accepting value 2453000
Building snapshot
fail: DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer[74028]
      Failed to process request from 192.168.100.8:59296
      System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at DotNext.IO.FileReader.ReadBlockAsync(Memory`1 output, CancellationToken token) in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 404
         at DotNext.IO.FileReader.<>c__DisplayClass1_0`1.<<ReadAsync>g__ReadSlowAsync|0>d.MoveNext() in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 71
      --- End of stack trace from previous location ---
         at RaftNode.SimplePersistentState.SimpleSnapshotBuilder.ApplyAsync(LogEntry entry) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\SimplePersistentState.cs:line 22
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.BuildSnapshotAsync(Int32 sessionId, Int64 upperBoundIndex, SnapshotBuilder builder, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 107
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.ForceSequentialCompactionAsync(Int32 sessionId, Int64 upperBoundIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 608
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.<>c__DisplayClass23_0`1.<<AppendAndCommitAsync>g__CommitAndCompactSequentiallyAsync|0>d.MoveNext() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 380
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.AppendAndCommitAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, Int64 commitIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 429
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId sender, Int64 senderTerm, ILogEntryProducer`1 entries, Int64 prevLogIndex, Int64 prevLogTerm, Int64 commitIndex, IClusterConfiguration config, Boolean applyConfig, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 488
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.AppendEntriesAsync(ProtocolStream protocol, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 251
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.HandleConnection(Socket remoteClient) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 133
Consensus cannot be reached
Term of local cluster member is 69. Election timeout 00:00:00.1880000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 70. Election timeout 00:00:00.1880000
Saving value 2453500 generated by the leader node
Accepting value 2453500
Accepting value 2453500
Building snapshot
Consensus cannot be reached
Term of local cluster member is 70. Election timeout 00:00:00.1880000
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\DataModifier.cs:line 32
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 71. Election timeout 00:00:00.1880000
Accepting value 2453500
Building snapshot
fail: DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer[74028]
      Failed to process request from 192.168.100.8:59308
      System.IO.EndOfStreamException: Attempted to read past the end of the stream.
         at DotNext.IO.FileReader.ReadBlockAsync(Memory`1 output, CancellationToken token) in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 404
         at DotNext.IO.FileReader.<>c__DisplayClass1_0`1.<<ReadAsync>g__ReadSlowAsync|0>d.MoveNext() in /_/src/DotNext.IO/IO/FileReader.Binary.cs:line 71
      --- End of stack trace from previous location ---
         at RaftNode.SimplePersistentState.SimpleSnapshotBuilder.ApplyAsync(LogEntry entry) in C:\Users\fredd\source\repos\dotNext\src\Examples\RaftNode\SimplePersistentState.cs:line 22
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.BuildSnapshotAsync(Int32 sessionId, Int64 upperBoundIndex, SnapshotBuilder builder, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 107
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.ForceSequentialCompactionAsync(Int32 sessionId, Int64 upperBoundIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 608
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.<>c__DisplayClass23_0`1.<<AppendAndCommitAsync>g__CommitAndCompactSequentiallyAsync|0>d.MoveNext() in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 380
      --- End of stack trace from previous location ---
         at DotNext.Net.Cluster.Consensus.Raft.MemoryBasedStateMachine.AppendAndCommitAsync[TEntry](ILogEntryProducer`1 entries, Int64 startIndex, Boolean skipCommitted, Int64 commitIndex, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/MemoryBasedStateMachine.cs:line 429
         at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.AppendEntriesAsync[TEntry](ClusterMemberId sender, Int64 senderTerm, ILogEntryProducer`1 entries, Int64 prevLogIndex, Int64 prevLogTerm, Int64 commitIndex, IClusterConfiguration config, Boolean applyConfig, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 488
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.AppendEntriesAsync(ProtocolStream protocol, CancellationToken token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 251
         at DotNext.Net.Cluster.Consensus.Raft.Tcp.TcpServer.HandleConnection(Socket remoteClient) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/Tcp/TcpServer.cs:line 133
Consensus cannot be reached
Term of local cluster member is 71. Election timeout 00:00:00.1880000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 72. Election timeout 00:00:00.1880000
Accepting value 2454000
Building snapshot
Saving value 2454500 generated by the leader node

Follower

Accepting value 2446000                                                                           
Accepting value 2446500
Accepting value 2447000
Accepting value 2447500
Consensus cannot be reached
Term of local cluster member is 50. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 53. Election timeout 00:00:00.2550000
Saving value 2448000 generated by the leader node
Accepting value 2448000
Accepting value 2448000
Saving value 2448500 generated by the leader node
Accepting value 2448500
Saving value 2449000 generated by the leader node
Accepting value 2449000
Building snapshot
Saving value 2449500 generated by the leader node
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToke
n token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dot
Next\src\Examples\RaftNode\DataModifier.cs:line 32
Consensus cannot be reached
Term of local cluster member is 53. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.1:3262                               
Term of local cluster member is 54. Election timeout 00:00:00.2550000
Consensus cannot be reached
Term of local cluster member is 54. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 55. Election timeout 00:00:00.2550000
Saving value 2449500 generated by the leader node
Accepting value 2449500
Accepting value 2449500
Accepting value 2449500
Saving value 2450000 generated by the leader node
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToke
n token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dot
Next\src\Examples\RaftNode\DataModifier.cs:line 32
Consensus cannot be reached
Term of local cluster member is 55. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 56. Election timeout 00:00:00.2550000
Consensus cannot be reached
Term of local cluster member is 56. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 57. Election timeout 00:00:00.2550000
Saving value 2450000 generated by the leader node                                                 
Accepting value 2450000
Accepting value 2450000
Accepting value 2450000
Saving value 2450500 generated by the leader node
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToke
n token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dot
Next\src\Examples\RaftNode\DataModifier.cs:line 32
Consensus cannot be reached
Term of local cluster member is 57. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 58. Election timeout 00:00:00.2550000
Consensus cannot be reached
Term of local cluster member is 58. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 59. Election timeout 00:00:00.2550000
Saving value 2450500 generated by the leader node
Accepting value 2450500
Accepting value 2450500
Accepting value 2450500
Saving value 2451000 generated by the leader node
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, Cancell[95/1924]
n token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dot
Next\src\Examples\RaftNode\DataModifier.cs:line 32
Consensus cannot be reached
Term of local cluster member is 59. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 60. Election timeout 00:00:00.2550000
Consensus cannot be reached
Term of local cluster member is 60. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 61. Election timeout 00:00:00.2550000
Saving value 2451000 generated by the leader node
Accepting value 2451000
Accepting value 2451000
Accepting value 2451000
Saving value 2451500 generated by the leader node
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToke
n token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dot
Next\src\Examples\RaftNode\DataModifier.cs:line 32
Consensus cannot be reached
Term of local cluster member is 61. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.1:3262                                
Term of local cluster member is 62. Election timeout 00:00:00.2550000
Consensus cannot be reached
Term of local cluster member is 62. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 63. Election timeout 00:00:00.2550000
Saving value 2451500 generated by the leader node
Accepting value 2451500
Accepting value 2451500
Accepting value 2451500
Saving value 2452000 generated by the leader node
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToke
n token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dot
Next\src\Examples\RaftNode\DataModifier.cs:line 32
Consensus cannot be reached
Term of local cluster member is 63. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 64. Election timeout 00:00:00.2550000
Consensus cannot be reached
Term of local cluster member is 64. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 65. Election timeout 00:00:00.2550000
Saving value 2452000 generated by the leader node                                                  
Accepting value 2452000
Accepting value 2452000
Accepting value 2452000
Saving value 2452500 generated by the leader node
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToke
n token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dot
Next\src\Examples\RaftNode\DataModifier.cs:line 32
Consensus cannot be reached
Term of local cluster member is 65. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 66. Election timeout 00:00:00.2550000
Consensus cannot be reached
Term of local cluster member is 66. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 67. Election timeout 00:00:00.2550000
Saving value 2452500 generated by the leader node
Accepting value 2452500
Accepting value 2452500
Accepting value 2452500
Saving value 2453000 generated by the leader node
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, Cancell[23/1924]
n token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dot
Next\src\Examples\RaftNode\DataModifier.cs:line 32
Consensus cannot be reached
Term of local cluster member is 67. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 68. Election timeout 00:00:00.2550000
Consensus cannot be reached
Term of local cluster member is 68. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 69. Election timeout 00:00:00.2550000
Saving value 2453000 generated by the leader node
Accepting value 2453000
Accepting value 2453000
Accepting value 2453000
Saving value 2453500 generated by the leader node
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToke
n token) in /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dot
Next\src\Examples\RaftNode\DataModifier.cs:line 32
Consensus cannot be reached
Term of local cluster member is 69. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 70. Election timeout 00:00:00.2550000
Consensus cannot be reached
Term of local cluster member is 70. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.8:3262
Term of local cluster member is 71. Election timeout 00:00:00.2550000
Saving value 2453500 generated by the leader node
Accepting value 2453500
Accepting value 2453500
Accepting value 2453500
Saving value 2454000 generated by the leader node
Unexpected error System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at DotNext.Net.Cluster.Consensus.Raft.RaftCluster`1.ReplicateAsync[TEntry](TEntry entry, CancellationToken token) in
 /_/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/RaftCluster.cs:line 898
   at RaftNode.DataModifier.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\fredd\source\repos\dotNext\src\Ex
amples\RaftNode\DataModifier.cs:line 32
Consensus cannot be reached
Term of local cluster member is 71. Election timeout 00:00:00.2550000
New cluster leader is elected. Leader address is 192.168.100.1:3262
Term of local cluster member is 72. Election timeout 00:00:00.2550000
Consensus cannot be reached
Term of local cluster member is 72. Election timeout 00:00:00.2550000
sakno commented 11 months ago

Currently, the only reason for OperationCanceledException is a token cancellation. According to sources of DataModifier, we have:

var source = CancellationTokenSource.CreateLinkedTokenSource(stoppingToken, leadershipToken);

stoppingToken is not a reason I think, because the token represents the lifetime of the hosted service. Inside of ReplicateAsync implementation, passed token is linked with another token associated with RaftCluster lifetime. Moreover, the node doesn't print Saving value {0} generated by the leader node. According to sources, this is possible only if the node is no longer a leader.

freddyrios commented 11 months ago

Ok, looking at the 4.4.1 output again I can see that all this failure repetition is actuall leader changes, as you can see it reporting a different leader ip. Since the leadership token would be signaled that explains the canceled exceptions.

Is there a chance the difference in logging output of 4.4.1 vs later versions to be related to the fixes for #168? After all, those changes can cause less intermediate leader changes from being reported.

I find it likely that those previous leader changes are just the usual behavior and only the last change when it gets stuck is some different case being hit which is the bit that might be the same in all versions.

freddyrios commented 11 months ago

If we focus instead on the last outputs:

So not exactly the same last logged actions. Should I do a capture on the latest version with more detailed logging, maybe that could clarify what goes on when it stops?

sakno commented 11 months ago

Is there a chance the difference in logging output of 4.4.1 vs later versions

It was minor change for ForceReplicationAsync method, Task replaced by ValueTask to remove some memory allocations on the hot path (ReplicateAsync is called very frequently for every change). So logging itself is not changed, but stacktrace.

sakno commented 11 months ago

Should I do a capture on the latest version with more detailed logging

Yes please. It's good know what happening with Raft internals exactly at this moment.

freddyrios commented 11 months ago

Attached the log files. Its easy to spot when it stops making progress, as it is the last time Accepting/Saving messages show in all the logs.

raftnodetest.zip

In short, after the last leader election the followers (where the last "Member is downgrading" until "Incoming configuration with" keeps repeating):

2023-08-03 11:12:54.7453|INFO|RaftNode.ClusterConfigurator|New cluster leader is elected. Leader address is 192.168.100.10:3262
2023-08-03 11:12:54.7453|INFO|RaftNode.ClusterConfigurator|Term of local cluster member is 82. Election timeout 00:00:00.2470000
2023-08-03 11:12:54.8398|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member is downgrading to follower state with term 82
2023-08-03 11:12:54.8398|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Election timeout is refreshed
2023-08-03 11:12:54.8398|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member is downgraded to follower state with term 82
2023-08-03 11:12:54.8612|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Incoming configuration with 484292758073552492 fingerprint from leader. Local fingerprint is 484292758073552492, apply False
2023-08-03 11:12:54.8686|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member is downgrading to follower state with term 82
2023-08-03 11:12:54.8686|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Election timeout is refreshed
2023-08-03 11:12:54.8686|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member is downgraded to follower state with term 82
2023-08-03 11:12:54.8803|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Incoming configuration with 484292758073552492 fingerprint from leader. Local fingerprint is 484292758073552492, apply True
2023-08-03 11:12:54.9611|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member is downgrading to follower state with term 82
2023-08-03 11:12:54.9611|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Election timeout is refreshed
2023-08-03 11:12:54.9611|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member is downgraded to follower state with term 82
2023-08-03 11:12:54.9611|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Incoming configuration with 484292758073552492 fingerprint from leader. Local fingerprint is 484292758073552492, apply True

while the leader (where the last "Replication of member ... started" until "Local changes are not committed" keeps repeating):

2023-08-03 11:12:54.7414|INFO|RaftNode.ClusterConfigurator|New cluster leader is elected. Leader address is 192.168.100.10:3262
2023-08-03 11:12:54.7414|INFO|RaftNode.ClusterConfigurator|Term of local cluster member is 82. Election timeout 00:00:00.1870000
2023-08-03 11:12:54.7414|INFO|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Transition to Leader state has completed with term 82
2023-08-03 11:12:54.7414|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of member 192.168.100.1:3262 started at log index 12075
2023-08-03 11:12:54.7414|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of 192.168.100.1:3262 contains 0 entries. Preceding entry has index 12074 and term 82. Local fingerprint is 0, remote is 484292758073552492, apply False
2023-08-03 11:12:54.7414|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of member 192.168.100.8:3262 started at log index 12075
2023-08-03 11:12:54.7414|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of 192.168.100.8:3262 contains 0 entries. Preceding entry has index 12074 and term 82. Local fingerprint is 0, remote is 484292758073552492, apply False
2023-08-03 11:12:54.7414|WARN|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of 192.168.100.8:3262 is failed. Retry replication from entry 12074
2023-08-03 11:12:54.7414|WARN|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of 192.168.100.1:3262 is failed. Retry replication from entry 12074
2023-08-03 11:12:54.7414|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Local changes are not committed. Quorum is 3, last committed entry is 12072
2023-08-03 11:12:54.8366|INFO|RaftNode.DataModifier|Saving value 5996500 generated by the leader node
2023-08-03 11:12:54.8366|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of member 192.168.100.8:3262 started at log index 12074
2023-08-03 11:12:54.8366|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of member 192.168.100.1:3262 started at log index 12074
2023-08-03 11:12:54.8366|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of 192.168.100.1:3262 contains 2 entries. Preceding entry has index 12073 and term 80. Local fingerprint is 0, remote is 484292758073552492, apply False
2023-08-03 11:12:54.8366|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of 192.168.100.8:3262 contains 2 entries. Preceding entry has index 12073 and term 80. Local fingerprint is 0, remote is 484292758073552492, apply False
2023-08-03 11:12:54.8461|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member 192.168.100.8:3262 is replicated successfully starting from index 12074
2023-08-03 11:12:54.8613|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member 192.168.100.1:3262 is replicated successfully starting from index 12074
2023-08-03 11:12:54.8613|INFO|RaftNode.SimplePersistentState|Accepting value 5996500
2023-08-03 11:12:54.8613|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|All changes at index 12073 are committed. The number of committed entries is 2
2023-08-03 11:12:54.8613|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of member 192.168.100.8:3262 started at log index 12076
2023-08-03 11:12:54.8613|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of 192.168.100.8:3262 contains 0 entries. Preceding entry has index 12075 and term 82. Local fingerprint is 484292758073552492, remote is 484292758073552492, apply True
2023-08-03 11:12:54.8613|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of member 192.168.100.1:3262 started at log index 12076
2023-08-03 11:12:54.8613|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of 192.168.100.1:3262 contains 0 entries. Preceding entry has index 12075 and term 82. Local fingerprint is 484292758073552492, remote is 484292758073552492, apply True
2023-08-03 11:12:54.8613|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member 192.168.100.8:3262 is replicated successfully starting from index 12076
2023-08-03 11:12:54.8804|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member 192.168.100.1:3262 is replicated successfully starting from index 12076
2023-08-03 11:12:54.8804|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Local changes are not committed. Quorum is 3, last committed entry is 12074
2023-08-03 11:12:54.9590|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of member 192.168.100.1:3262 started at log index 12076
2023-08-03 11:12:54.9590|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of member 192.168.100.8:3262 started at log index 12076
2023-08-03 11:12:54.9590|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of 192.168.100.1:3262 contains 0 entries. Preceding entry has index 12075 and term 82. Local fingerprint is 484292758073552492, remote is 484292758073552492, apply True
2023-08-03 11:12:54.9590|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Replication of 192.168.100.8:3262 contains 0 entries. Preceding entry has index 12075 and term 82. Local fingerprint is 484292758073552492, remote is 484292758073552492, apply True
2023-08-03 11:12:54.9624|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member 192.168.100.8:3262 is replicated successfully starting from index 12076
2023-08-03 11:12:54.9624|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Member 192.168.100.1:3262 is replicated successfully starting from index 12076
2023-08-03 11:12:54.9624|DEBUG|DotNext.Net.Cluster.Consensus.Raft.RaftCluster|Local changes are not committed. Quorum is 3, last committed entry is 12074
sakno commented 10 months ago

The root cause is a concurrency between a thread running replication method and DataModifier threading appending new entry. Now it is fixed (I hope so). Also, I removed some memory allocations where possible for acceptable price (from code readability point of view).

A few thoughts about memory growth: heartbeat loop uses SustainedLowLatency GC latency mode during heartbeat, but not between them: https://github.com/dotnet/dotNext/blob/e29b972ecddd2610a45ecd402f78755c79adc977/src/cluster/DotNext.Net.Cluster/Net/Cluster/Consensus/Raft/LeaderState.cs#L218-L224

It is necessary to prevent GC intrusion during heartbeat to avoid false positive timeouts. Between heartbeats GC is free to reclaim unused object. The implementation automatically reduced heartbeat period if GC happened in between.

freddyrios commented 10 months ago

I noticed you added it to the latest release, but I am doing the run with the latest in the develop branch given there was another commit that mentioned a concurrencyfix (an hour and a half so far).

I will let it run it for at least 1 day and report back.

sakno commented 10 months ago

there was another commit that mentioned a concurrencyfix

It was not related to the current issue. When commit relates directly to issue, I prefer to add issue number to the commit message.

freddyrios commented 10 months ago

looks good, the run is still going strong well past the point of all the previous failures running the example in that environment.

freddyrios commented 10 months ago

reopening as by mistake I got confused and thought the fix was already in release

sakno commented 10 months ago

I'll publish a new release this week.