dotnet / dotNext

Next generation API for .NET
https://dotnet.github.io/dotNext/
MIT License

BadHttpRequestException when multiple members try to join a cluster at once #102

Closed. Arkensor closed this issue 2 years ago.

Arkensor commented 2 years ago

Hello,

I came across the exception below. It is thrown when multiple members try to join at once. My guess is that the leader is busy processing one member, or sends a response the follower did not expect, but I am not sure what causes it. I have my own IPersistentState, but I do not think it is being called here. The stack trace comes from the cluster member configuration changes.

This does not result in a real failure: the client just retries and is able to join afterwards. Still, I would like to get rid of this error in my consoles. If it is my fault, it would be lovely to get some pointers on what I need to fix or what I am using incorrectly. I currently allow AddMemberAsync to be called multiple times in parallel (each from a different HTTP request scope). Maybe that is the problem, and I need to lock this down so that only one request at a time can enter that code section?


```
fail: Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware[1]
      An unhandled exception has occurred while executing the request.
      Microsoft.AspNetCore.Server.Kestrel.Core.BadHttpRequestException: Unexpected end of request content.
         at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Http.Http1ContentLengthMessageBody.ReadAsyncInternal(CancellationToken cancellationToken)
         at System.Runtime.CompilerServices.PoolingAsyncValueTaskMethodBuilder`1.StateMachineBox`1.System.Threading.Tasks.Sources.IValueTaskSource<TResult>.GetResult(Int16 token)
         at DotNext.IO.Pipelines.PipeExtensions.ReadAsync[TResult,TParser](PipeReader reader, TParser parser, CancellationToken token) in /_/src/DotNext.IO/IO/Pipelines/PipeExtensions.Readers.cs:line 65
         at DotNext.IO.Pipelines.PipeExtensions.<ReadBlockAsync>g__ReadBlockSlowAsync|26_0(PipeReader reader, Memory`1 output, CancellationToken token) in /_/src/DotNext.IO/IO/Pipelines/PipeExtensions.Readers.cs:line 556
         at DotNext.Net.Cluster.Consensus.Raft.Http.RaftHttpCluster.AppendEntriesAsync(HttpRequest request, HttpResponse response, CancellationToken token) in /_/src/cluster/DotNext.AspNetCore.Cluster/Net/Cluster/Consensus/Raft/Http/RaftHttpCluster.Messaging.cs:line 260
         at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware.<Invoke>g__Awaited|6_0(ExceptionHandlerMiddleware middleware, HttpContext context, Task task)
```

Thanks!

sakno commented 2 years ago

You can add only one member at a time. This is by design. Concurrent invocations of AddMemberAsync can lead to unpredictable results. You can protect your endpoint with some kind of exclusive lock to avoid this kind of issue.
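
A minimal sketch of such an exclusive lock, assuming the membership change is triggered from an ASP.NET Core endpoint in a single process. `MembershipChangeGate` and `RunExclusiveAsync` are hypothetical names for illustration and are not part of DotNext; the actual call to AddMemberAsync would go inside the delegate, with whatever arguments your DotNext version expects.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not a DotNext type): lets only one membership
// change run at a time; other callers wait asynchronously for their turn.
public sealed class MembershipChangeGate : IDisposable
{
    // A semaphore with a single permit acts as an async-friendly mutex.
    private readonly SemaphoreSlim gate = new(1, 1);

    public async Task<T> RunExclusiveAsync<T>(
        Func<CancellationToken, Task<T>> change,
        CancellationToken token = default)
    {
        // Wait for the permit; throws OperationCanceledException
        // if the token is canceled while waiting.
        await gate.WaitAsync(token).ConfigureAwait(false);
        try
        {
            // The caller's membership change runs here, e.g. a call
            // to AddMemberAsync wrapped in the delegate.
            return await change(token).ConfigureAwait(false);
        }
        finally
        {
            // Always return the permit, even if the change throws.
            gate.Release();
        }
    }

    public void Dispose() => gate.Dispose();
}
```

The gate would need to be shared by all requests (for example, registered as a singleton) so that parallel HTTP request scopes queue behind each other. It only serializes calls within one process, which is presumably enough when all join requests are handled by the same node.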

Arkensor commented 2 years ago

Ah, so it is implemented as proposed by the original paper, with one change at a time, and not with an "optimized" process like this one? https://eileen-code4fun.medium.com/raft-cluster-membership-change-protocol-f57cc17d1c03 Maybe that would be worth looking at in the future. I was going to follow that idea if I were to write my own implementation, but then I came across this library, so I did not pursue it further.

sakno commented 2 years ago

At the moment I have no plans to implement multiple membership changes at a time. Feel free to contribute this change.

sakno commented 2 years ago

Related: #108. This bug is caused by concurrency issues with the persistent configuration.

sakno commented 2 years ago

The bug that causes BadHttpRequestException is fixed in 4.5.0, which is already available on NuGet. An attempt to add or remove members concurrently now explicitly raises an exception.