Joining then leaving a cluster triggers infinite candidate loop

cdavernas commented 2 years ago

What happens?

Given a cluster made out of one coldStart node, given that the cluster is afterwards dynamically joined by another node, and given that a node then leaves the cluster, leaving the other alone, the remaining node seems stuck in a candidate loop.

What is expected?

The remaining node elects itself as the leader.

Additional info

Logs (the section repeats itself until app shutdown):

info: DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster[74002]
      Transition to Candidate state started
warn: DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster[75001]
      Cluster member http://localhost:5126/ is unavailable
      System.TimeoutException: The operation was canceled.
       ---> System.Threading.Tasks.TaskCanceledException: The operation was canceled.
       ---> System.TimeoutException: A connection could not be established within the configured ConnectTimeout.
         --- End of inner exception stack trace ---
         at System.Net.Http.HttpConnectionPool.CreateConnectTimeoutException(OperationCanceledException oce)
         at System.Net.Http.HttpConnectionPool.AddHttp11ConnectionAsync(HttpRequestMessage request)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
         at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
         at System.Runtime.CompilerServices.TaskAwaiter.<>c.<OutputWaitEtwEvents>b__12_0(Action innerContinuation, Task innerTask)
         at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining)
         at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
         at System.Threading.Tasks.Task.CancellationCleanupLogic()
         at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetException(Exception exception, Task`1& taskField)
         at System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder`1.SetException(Exception exception)
         at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
         at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
         at System.Runtime.CompilerServices.TaskAwaiter.<>c.<OutputWaitEtwEvents>b__12_0(Action innerContinuation, Task innerTask)
         at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining)
         at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
         at System.Threading.Tasks.Task.CancellationCleanupLogic()
         at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetException(Exception exception, Task`1& taskField)
         at System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder`1.SetException(Exception exception)
         at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
         at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
         at System.Runtime.CompilerServices.TaskAwaiter.<>c.<OutputWaitEtwEvents>b__12_0(Action innerContinuation, Task innerTask)
         at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining)
         at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
         at System.Threading.Tasks.Task.CancellationCleanupLogic()
         at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetException(Exception exception, Task`1& taskField)
         at System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder`1.SetException(Exception exception)
         at System.Net.Http.HttpConnectionPool.ConnectToTcpHostAsync(String host, Int32 port, HttpRequestMessage initialRequest, Boolean async, CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
         at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
         at System.Runtime.CompilerServices.TaskAwaiter.<>c.<OutputWaitEtwEvents>b__12_0(Action innerContinuation, Task innerTask)
         at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(Action action, Boolean allowInlining)
         at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
         at System.Threading.Tasks.Task.CancellationCleanupLogic()
         at System.Threading.Tasks.Task.TrySetCanceled(CancellationToken tokenToRecord, Object cancellationException)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetException(Exception exception, Task`1& taskField)
         at System.Runtime.CompilerServices.AsyncValueTaskMethodBuilder.SetException(Exception exception)
         at System.Net.Sockets.Socket.<ConnectAsync>g__WaitForConnectWithCancellation|277_0(AwaitableSocketAsyncEventArgs saea, ValueTask connectTask, CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
         at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.InvokeContinuation(Action`1 continuation, Object state, Boolean forceAsync, Boolean requiresExecutionContextFlow)
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.OnCompleted(SocketAsyncEventArgs _)
         at System.Net.Sockets.SocketAsyncEventArgs.<DnsConnectAsync>g__Core|112_0(MultiConnectSocketAsyncEventArgs internalArgs, Task`1 addressesTask, Int32 port, SocketType socketType, ProtocolType protocolType, CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
         at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
         at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
         at System.Threading.Tasks.Sources.ManualResetValueTaskSourceCore`1.SetResult(TResult result)
         at System.Net.Sockets.SocketAsyncEventArgs.MultiConnectSocketAsyncEventArgs.OnCompleted(SocketAsyncEventArgs e)
         at System.Net.Sockets.SocketAsyncEventArgs.ExecutionCallback(Object state)
         at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
         at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
         at System.Net.Sockets.SocketAsyncEventArgs.HandleCompletionPortCallbackError(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* nativeOverlapped)
         at System.Threading._IOCompletionCallback.PerformIOCompletionCallback(UInt32 errorCode, UInt32 numBytes, NativeOverlapped* pNativeOverlapped)
      --- End of stack trace from previous location ---
         at System.Threading.Tasks.TaskCompletionSourceWithCancellation`1.WaitWithCancellationAsync(CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.GetHttp11ConnectionAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpConnectionPool.SendWithVersionDetectionAndRetryAsync(HttpRequestMessage request, Boolean async, Boolean doRequestAuth, CancellationToken cancellationToken)
         at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
         --- End of inner exception stack trace ---

sakno commented 2 years ago

@cdavernas, that is the correct behavior. The node cannot recognize itself.

Given a cluster made out of one coldStart node

That is wrong. If the configuration is empty, the first node in the cluster must be booted with coldStart=true. Other nodes must be promoted via the IRaftHttpCluster.AddMemberAsync method. Here is the documentation for node bootstrapping

cdavernas commented 2 years ago

@cdavernas, that is the correct behavior. The node cannot recognize itself.

Hmm, all right, that does make sense.

However, how would you proceed, then, to combine both Raft and HyParView to achieve something like the following:

  public class PeerLifetime
      : IPeerLifetime
  {

      public PeerLifetime(ILogger<PeerLifetime> logger, IRaftHttpCluster cluster)
      {
          this.Logger = logger;
          this.Cluster = cluster;
      }

      protected ILogger Logger { get; }

      protected IRaftHttpCluster Cluster { get; }

      public virtual void OnStart(PeerController controller)
      {
          controller.PeerDiscovered += this.OnPeerDiscoveredAsync;
          controller.PeerGone += this.OnPeerGoneAsync;
      }

      public virtual void OnStop(PeerController controller)
      {
          controller.PeerDiscovered -= this.OnPeerDiscoveredAsync;
          controller.PeerGone -= this.OnPeerGoneAsync;
      }

      protected virtual async void OnPeerDiscoveredAsync(PeerController controller, PeerEventArgs args)
      {
          try
          {
              await this.Cluster.AddMemberAsync(ClusterMemberId.FromEndPoint(args.PeerAddress), (HttpEndPoint)args.PeerAddress);
          }
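          catch (Exception ex)
          {
              // Log the failure instead of silently swallowing it.
              this.Logger.LogError(ex, "Failed to add peer {PeerAddress} to the cluster", args.PeerAddress);
          }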
          Console.WriteLine($"Peer {args.PeerAddress} has been discovered by the current node");
      }

      protected virtual async void OnPeerGoneAsync(PeerController controller, PeerEventArgs args)
      {
          try
          {
              await this.Cluster.RemoveMemberAsync((HttpEndPoint)args.PeerAddress);
          }
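          catch (Exception ex)
          {
              // Log the failure instead of silently swallowing it.
              this.Logger.LogError(ex, "Failed to remove peer {PeerAddress} from the cluster", args.PeerAddress);
          }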
          Console.WriteLine($"Peer {args.PeerAddress} is no longer visible by the current node");
      }

  }

Is achieving something similar even possible?

That is wrong. If the configuration is empty, the first node in the cluster must be booted with coldStart=true

Yeah, that's actually what I meant I'm doing.

sakno commented 2 years ago

You don't need HyParView for member discovery. Raft has a built-in mechanism for that through member announcement. The application must implement an announcement mechanism (an HTTP endpoint, for example) and invoke IRaftHttpCluster.AddMemberAsync on the leader node. Membership changes can be performed by the leader node only.

sakno commented 2 years ago

Related #108 (if you're using persistent configuration).

davhdavh commented 1 year ago

I am seeing a similar problem in the https://github.com/davhdavh/Raft3DockerClusterExample example. Starting the cluster and then gracefully shutting down 1 non-leader node causes a massive spam of error messages. In this case I shut down 4097, while 4098 is the leader:

raft3dockerclusterexample_7097 | info: Microsoft.Hosting.Lifetime[0]
raft3dockerclusterexample_7097 | Application is shutting down...
raft3dockerclusterexample_7098 | warn: DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster[75001]
raft3dockerclusterexample_7098 | Cluster member https://raft3dockerclusterexample_7097/ is unavailable
raft3dockerclusterexample_7098 | System.TimeoutException: A connection could not be established within the configured ConnectTimeout.
raft3dockerclusterexample_7097 exited with code 0
raft3dockerclusterexample_7097 exited with code 0
raft3dockerclusterexample_7098 | warn: DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster[75001]
raft3dockerclusterexample_7098 | Cluster member https://raft3dockerclusterexample_7097/ is unavailable
raft3dockerclusterexample_7098 | System.TimeoutException: A connection could not be established within the configured ConnectTimeout.

The last 3 lines are repeated infinitely VERY VERY rapidly until 4097 is started again, at which point you get a different set of error messages:

| info: Microsoft.Hosting.Lifetime[0]
      Content root path: C:\app
| warn: DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster[75001]
      Cluster member https://raft3dockerclusterexample_7097/ is unavailable
      System.TimeoutException: A connection could not be established within the configured ConnectTimeout.
| warn: DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster[75001]
      Cluster member https://raft3dockerclusterexample_7097/ is unavailable
      System.TimeoutException: A connection could not be established within the configured ConnectTimeout.
| fail: Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware[1]
      An unhandled exception has occurred while executing the request.
      System.OperationCanceledException: The operation was canceled.
         at System.Threading.CancellationToken.ThrowOperationCanceledException()
         at System.Threading.CancellationToken.ThrowIfCancellationRequested()
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.HttpProtocol.Microsoft.AspNetCore.Http.Features.IHttpResponseBodyFeature.StartAsync(CancellationToken cancellationToken)
         at Microsoft.AspNetCore.Http.HttpResponseWritingExtensions.WriteAsync(HttpResponse response, String text, Encoding encoding, CancellationToken cancellationToken)
         at DotNext.Net.Cluster.Consensus.Raft.Http.HttpMessage.SaveResponseAsync[T](HttpResponse response, T result, CancellationToken token) in //src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/HttpMessage.cs:line 92
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpMessage.SaveResponseAsync[T](HttpResponse response, Result`1& result, CancellationToken token) in //src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/RaftHttpMessage.cs:line 66
         at DotNext.Net.Cluster.Consensus.Raft.Http.PreVoteMessage.SaveResponseAsync(HttpResponse response, Result`1 result, CancellationToken token) in //src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/PreVoteMessage.cs:line 46
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.PreVoteAsync(PreVoteMessage request, HttpResponse response, CancellationToken token) in //src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/RaftHttpCluster.Messaging.cs:line 283
         at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddlewareImpl.g__Awaited|8_0(ExceptionHandlerMiddlewareImpl middleware, HttpContext context, Task task)
| warn: DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster[75001]
      Cluster member https://raft3dockerclusterexample_7098/ is unavailable
      System.Threading.Tasks.TaskCanceledException: The operation was canceled.
       ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
       ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
         --- End of inner exception stack trace ---
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
         at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
         at System.Net.Security.SslStream.EnsureFullTlsFrameAsync[TIOAdapter](CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
         at System.Net.Security.SslStream.ReadAsyncInternal[TIOAdapter](Memory`1 buffer, CancellationToken cancellationToken)
raft3dockerclusterexample_7097 |          at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
         at System.Net.Http.HttpConnection.InitialFillAsync(Boolean async)
         at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, Boolean async, CancellationToken cancellationToken)
         --- End of inner exception stack trace ---
         at System.Net.Http.HttpClient.HandleFailure(Exception e, Boolean telemetryStarted, HttpResponseMessage response, CancellationTokenSource cts, CancellationToken cancellationToken, CancellationTokenSource pendingRequestsCts)
         at System.Net.Http.HttpClient.<SendAsync>g__Core|83_0(HttpRequestMessage request, HttpCompletionOption completionOption, CancellationTokenSource cts, Boolean disposeCts, CancellationTokenSource pendingRequestsCts, CancellationToken originalCancellationToken)
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftClusterMember.SendAsync[TResponse,TMessage](TMessage message, CancellationToken token) in //src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/RaftClusterMember.cs:line 89
| info: DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster[74002]
      Transition to Candidate state has started with term 0
| warn: DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster[75001]
      Cluster member https://raft3dockerclusterexample_7097/ is unavailable
      System.TimeoutException: A connection could not be established within the configured ConnectTimeout.
| info: DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster[74002]
      Transition to Candidate state has started with term 0

And now the last 5 lines are repeated VERY VERY rapidly forever.

IMHO, a graceful shutdown should notify the leader that the node is actually shutting down and remove it from the cluster, and it should not spam the logs. Restarting the node should cleanly rejoin the cluster. An unexpected loss of a non-leader node should also not cause a massive spam of log entries; the cluster should apply some exponential backoff when rechecking availability. And when the node is available again, it should figure out whether it was a loss of network between nodes or a loss of state (crash), and cleanly rejoin the cluster after that.
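
The exponential backoff suggested above could look roughly like the following. This is a minimal, self-contained sketch and not dotNext code: the `Backoff` class, the `NextDelay` method, and the base and cap values in the usage note are hypothetical names and numbers chosen for illustration.

```csharp
using System;

// Hypothetical helper, not part of dotNext: computes how long to wait
// before re-checking a cluster member that was found unavailable.
public static class Backoff
{
    // The delay doubles with every failed attempt and is capped at maxDelay.
    public static TimeSpan NextDelay(int failedAttempts, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (failedAttempts < 0)
            throw new ArgumentOutOfRangeException(nameof(failedAttempts));

        // Clamp the exponent so the left shift cannot overflow.
        int exponent = Math.Min(failedAttempts, 30);
        double millis = baseDelay.TotalMilliseconds * (1L << exponent);

        return TimeSpan.FromMilliseconds(Math.Min(millis, maxDelay.TotalMilliseconds));
    }
}
```

With a 300 ms base and a 30 s cap, the first recheck would wait 300 ms, the fourth 2.4 s, and every recheck from the eighth onward would wait the full 30 s, so a member that stays down produces a few log entries per minute rather than a continuous stream.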

sakno commented 1 year ago

It happens forever because node 7097 is not a leader anymore (due to loss of consensus). It downgrades itself to the Follower state, then waits for the election timeout, then moves to the Candidate state. As a Candidate, it requests votes from the other nodes. Due to connectivity issues (log indicates that ConnectTimeout is too small), node 7098 does not have enough time to answer, and the requester just aborts the connection as timed out. When the next timeout occurs, node 7097 tries to request votes again.

davhdavh commented 1 year ago

It happens forever because node 7097 is not a leader anymore

7097 was never leader?!? And there is still 7096 in the cluster so 7096 and 7098 should be enough to have consensus?
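
The arithmetic behind this objection: Raft requires votes from a strict majority of the configured members. A minimal sketch of that rule follows; the `Quorum` class and its method names are hypothetical and are not dotNext API.

```csharp
// Hypothetical helper, not part of dotNext: majority arithmetic for Raft.
public static class Quorum
{
    // Smallest number of votes that forms a strict majority of the cluster.
    public static int Majority(int clusterSize) => clusterSize / 2 + 1;

    // True when the reachable members (counting the candidate itself)
    // are enough to elect a leader.
    public static bool CanElect(int clusterSize, int reachableMembers)
        => reachableMembers >= Majority(clusterSize);
}
```

For a three-node configuration, `Quorum.Majority(3)` is 2, so two reachable nodes are enough to elect a leader. In a two-node configuration the majority is also 2, so a single remaining node cannot win an election on its own.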

If I kill the leader it works just fine electing a new leader, but it still ends up in this scenario where the leader then complains about the machine that was lost.

log indicates that ConnectTimeout is too small

It's the default value, so 300 msec... Waaaay more than enough time to talk to each other. It might get the timeout on the very first request due to the new instance of 7097 starting up, but after that what is the problem? Also, setting HttpClusterMemberConfiguration.LowerElectionTimeout to 5 sec and upper to 10 sec, RequestTimeout to 5 sec, and SocketsHttpHandler.ConnectTimeout to 1 sec gives the exact same behaviour; it just spams slower.

dotnet / dotNext

Joining then leaving a cluster triggers infinite candidate loop #110