asynkron / protoactor-dotnet

Proto Actor - Ultra fast distributed actors for Go, C# and Java/Kotlin
http://proto.actor
Apache License 2.0
1.68k stars 284 forks source link

SeedNodeClusterProvider: Consensus not reached, Initiating rebalance [1.5.0] #2088

Closed AqlaSolutions closed 4 months ago

AqlaSolutions commented 4 months ago

Two nodes.

Seed on localhost:7757:

[18:47:58.685 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "Environment": "Settings": "{\"cpuCount\":12,\"dotnetVersion\":\"7.0.15\",\"platform\":2,\"platformVersion\":\"Microsoft Windows NT 10.0.19045.0\",\"minWorke
rThreads\":12,\"minCompletionPortThreads\":1,\"availableWorkerThreads\":32767,\"availableCompletionPortThreads\":1000}"
[18:47:58.747 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Config": "{\"DeadLetterThrottleInterval\":\"00:00:00\",\"DeadLetterThrottleCount\":0,\"DeadLetterRequestLogging\":true,\"DeveloperSupervision
Logging\":true,\"MetricsEnabled\":true,\"SharedFutures\":false,\"SharedFutureSize\":5000,\"ThreadPoolStatsTimeout\":\"00:00:01\",\"DeveloperThreadPoolStatsLogging\":true,\"ActorRequestTimeout\":\"00:10:00\",\"DeadLetterRespon
seLogging\":false,\"DiagnosticsLogLevel\":2}"
[18:47:58.759 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Id": "\"43d5305c78da47e7bc83e6a26f85a833\""
[18:47:58.767 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Extension enabled IRemote"      
[18:47:58.767 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Extension enabled Serialization"
[18:47:58.769 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Extension enabled Cluster"      
[18:47:58.826 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Extension enabled PubSubExtension"
[18:47:58.829 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Extension enabled DIExtension"  
...
[18:48:01.075 INF] Proto.Cluster.ProtoActorLifecycleHost [ProtoActorLifecycleHost]"43d5305c78da47e7bc83e6a26f85a833" Starting Proto.Actor cluster member
[18:48:01.227 INF] Proto.Remote.GrpcNet.GrpcNetRemote Starting Proto.Actor server on "localhost":7757 ("localhost:7757")
[18:48:01.227 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "Remote": "Started GrpcNet Successfully"
[18:48:01.262 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "Remote": "Config": "{\"UseHttps\":false,\"Host\":\"localhost\",\"Port\":7757,\"AdvertisedHost\":null,\"AdvertisedPort\":null,\"DeserializationErrorLogLevel\
":4,\"WaitAfterEndpointTerminationTimeSpan\":\"00:00:03\",\"RemoteDiagnostics\":true}"
...
[18:48:01.337 INF] Proto.Cluster.Gossip.Gossiper Started Cluster Gossip
[18:48:01.341 INF] Proto.Cluster.Gossip.Gossiper Gossip is waiting for cluster to join
[18:48:01.379 INF] Proto.Cluster.Seed.SeedNodeActor Started SeedNodeActor
[18:48:01.430 INF] Proto.Cluster.MemberList [MemberList] Cluster members joined ["{ \"host\": \"localhost\", \"port\": 7757, \"id\": \"43d5305c78da47e7bc83e6a26f85a833\", \"kinds\": ...
[18:48:01.455 INF] Proto.Cluster.Gossip.Gossiper Starting gossip loop
[18:48:01.459 INF] Proto.Cluster.Seed.SeedNodeActor Starting via SeedNode, found seed nodes []
[18:48:01.482 INF] Proto.Cluster.Seed.SeedNodeClusterProvider Connected to seed node "localhost:7757"
[18:48:01.483 INF] Proto.Cluster.Cluster Started as cluster member
[18:48:01.483 INF] Proto.Cluster.Cluster I see myself
[18:48:01.484 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "Cluster": "Started Member Successfully"
[18:48:04.924 INF] Proto.Remote.EndpointReader [EndpointReader]["localhost:7757"] Accepted connection request from "ipv4:127.0.0.1:65315" to "localhost:7757"
[18:48:04.962 INF] Proto.Remote.ServerConnector [ServerConnector]["localhost:7757"] Connecting to "localhost:7752"
[18:48:05.011 INF] Proto.Cluster.Seed.SeedNodeActor Got JoinRequest from Member "{ \"host\": \"localhost\", \"port\": 7752, \"id\": \"e0e0b13e0d8344c097dc49584715e4ab\", \"kinds\": [ \"prototopic\" ] }"
[18:48:05.013 INF] Proto.Cluster.MemberList [MemberList] Cluster members joined ["{ \"host\": \"localhost\", \"port\": 7752, \"id\": \"e0e0b13e0d8344c097dc49584715e4ab\", \"kinds\": [ \"prototopic\" ] }"]
[18:48:07.231 INF] Proto.Remote.ServerConnector [ServerConnector]["localhost:7757"] Connected to "localhost:7752"
[18:48:15.034 WRN] Proto.Cluster.Partition.PartitionIdentityActor [PartitionIdentity] "43d5305c78da47e7bc83e6a26f85a833" Consensus not reached, Initiating rebalance:, 302903068 0 after 00:00:10.0159737
[and nothing more]

Member without cluster kinds on localhost:7752:

[18:48:02.007 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "Environment": "Settings": "{\"cpuCount\":12,\"dotnetVersion\":\"7.0.15\",\"platform\":2,\"platformVersion\":\"Microsoft Windows NT 10.0.19045.0\",\"minWorke
rThreads\":12,\"minCompletionPortThreads\":1,\"availableWorkerThreads\":32767,\"availableCompletionPortThreads\":1000}"
[18:48:02.059 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Config": "{\"DeadLetterThrottleInterval\":\"00:00:00\",\"DeadLetterThrottleCount\":0,\"DeadLetterRequestLogging\":true,\"DeveloperSupervision
Logging\":true,\"MetricsEnabled\":true,\"SharedFutures\":false,\"SharedFutureSize\":5000,\"ThreadPoolStatsTimeout\":\"00:00:01\",\"DeveloperThreadPoolStatsLogging\":true,\"ActorRequestTimeout\":\"00:10:00\",\"DeadLetterRespon
seLogging\":false,\"DiagnosticsLogLevel\":2}"
[18:48:02.085 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Id": "\"e0e0b13e0d8344c097dc49584715e4ab\""
[18:48:02.095 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Extension enabled IRemote"                 
[18:48:02.096 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Extension enabled Serialization"           
[18:48:02.097 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Extension enabled Cluster"                 
[18:48:02.161 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Extension enabled PubSubExtension"                                  
[18:48:02.167 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "ActorSystem": "Extension enabled DIExtension"                                      
[18:48:02.173 INF] Proto.Cluster.ProtoActorLifecycleHost [ProtoActorLifecycleHost]"e0e0b13e0d8344c097dc49584715e4ab" Starting Proto.Actor cluster member
[18:48:02.315 INF] Proto.Remote.GrpcNet.GrpcNetRemote Starting Proto.Actor server on "localhost":7752 ("localhost:7752")
[18:48:02.315 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "Remote": "Started GrpcNet Successfully"            
[18:48:02.346 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "Remote": "Config": "{\"UseHttps\":false,\"Host\":\"localhost\",\"Port\":7752,\"AdvertisedHost\":null,\"AdvertisedPort\":null,\"DeserializationErrorLogLevel\
":4,\"WaitAfterEndpointTerminationTimeSpan\":\"00:00:03\",\"RemoteDiagnostics\":true}"
...
[18:48:02.354 INF] Proto.Cluster.Cluster Starting
[18:48:02.399 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "Cluster": "Config": "{\"ClusterName\":\"Name\",\"GossipDebugLogging\":false,\"ClusterKinds\":[],\"GossipInterval\":\"00:00:00.3000000\",\"GossipRequestTimeou
t\":\"00:05:00\",\"HeartbeatExpiration\":\"01:00:00\",\"ActorRequestTimeout\":\"00:05:00\",\"ActorSpawnVerificationTimeout\":\"00:05:00\",\"ActorActivationTimeout\":\"00:05:00\",\"RequestLogThrottlePeriod\":\"00:00:02\",\"Max
NumberOfEventsInRequestLogThrottlePeriod\":3,\"GossipFanout\":3,\"GossipMaxSend\":50,\"ClusterRequestDeDuplicationWindow\":\"00:01:00\",\"RemotePidCacheTimeToLive\":\"00:15:00\",\"RemotePidCacheClearInterval\":\"00:00:15\",\"
PubSubConfig\":{\"SubscriberTimeout\":\"00:00:05\"},\"LegacyRequestTimeoutBehavior\":false,\"ExitOnShutdown\":false}"
[18:48:02.399 INF] Proto.Cluster.Gossip.Gossiper Started Cluster Gossip
[18:48:02.403 INF] Proto.Cluster.Gossip.Gossiper Gossip is waiting for cluster to join
[18:48:02.436 INF] Proto.Cluster.Seed.SeedNodeActor Started SeedNodeActor
[18:48:02.480 INF] Proto.Cluster.MemberList [MemberList] Cluster members joined ["{ \"host\": \"localhost\", \"port\": 7752, \"id\": \"e0e0b13e0d8344c097dc49584715e4ab\", \"kinds\": [ \"prototopic\" ] }"]
[18:48:02.503 INF] Proto.Cluster.Gossip.Gossiper Starting gossip loop
[18:48:02.510 INF] Proto.Cluster.Seed.SeedNodeActor Starting via SeedNode, found seed nodes [["localhost", 7757]]
[18:48:02.529 INF] Proto.Cluster.Seed.SeedNodeActor Creating pid...
[18:48:02.537 INF] Proto.Cluster.Seed.SeedNodeActor Trying to join seed node "localhost:7757/$server_seed"
[18:48:02.550 INF] Proto.Remote.ServerConnector [ServerConnector]["localhost:7752"] Connecting to "localhost:7757"
[18:48:02.660 WRN] Proto.SystemActors Actor "localhost:7752/$server_seed" deadline 00:00:00.1000000, exceeded on message "Connect { }"
[18:48:04.976 INF] Proto.Remote.ServerConnector [ServerConnector]["localhost:7752"] Connected to "localhost:7757"
[18:48:07.188 INF] Proto.Remote.EndpointReader [EndpointReader]["localhost:7752"] Accepted connection request from "ipv4:127.0.0.1:65320" to "localhost:7752"
[18:48:07.248 INF] Proto.Cluster.Seed.SeedNodeClusterProvider Connected to seed node "localhost:7757"
[18:48:07.248 INF] Proto.Cluster.Cluster Started as cluster member
[18:48:07.248 INF] Proto.Cluster.Cluster I see myself
[18:48:07.249 INF] Proto.Diagnostics.DiagnosticsStore [Diagnostics] "Cluster": "Started Member Successfully"
[18:48:07.252 INF] Proto.Cluster.Gossip.GossipActor Ignoring gossip request from "43d5305c78da47e7bc83e6a26f85a833" as it is not a member
[18:48:07.252 INF] Proto.Cluster.Gossip.GossipActor Ignoring gossip request from "43d5305c78da47e7bc83e6a26f85a833" as it is not a member
[18:48:07.252 INF] Proto.Cluster.Gossip.GossipActor Ignoring gossip request from "43d5305c78da47e7bc83e6a26f85a833" as it is not a member
[18:48:07.252 INF] Proto.Cluster.Gossip.GossipActor Ignoring gossip request from "43d5305c78da47e7bc83e6a26f85a833" as it is not a member
[18:48:07.252 INF] Proto.Cluster.Gossip.GossipActor Ignoring gossip request from "43d5305c78da47e7bc83e6a26f85a833" as it is not a member
[18:48:07.253 INF] Proto.Cluster.Gossip.GossipActor Ignoring gossip request from "43d5305c78da47e7bc83e6a26f85a833" as it is not a member
[18:48:07.253 INF] Proto.Cluster.Gossip.GossipActor Ignoring gossip request from "43d5305c78da47e7bc83e6a26f85a833" as it is not a member
[18:48:07.437 INF] Proto.Cluster.Gossip.GossipActor Ignoring gossip request from "43d5305c78da47e7bc83e6a26f85a833" as it is not a member
[and repeating]

Any help please?

Can it be caused by different cluster kinds list? But I need it to be different because some grains should only be started on specific nodes. And it worked previously with Consul.

I also tried setting RunAsClient on the second node but then it prints IndexOutOfBoundsException on that node:

[19:16:25.816 INF] Proto.Remote.EndpointReader [EndpointReader]["localhost:7752"
] Accepted connection request from "ipv4:127.0.0.1:63469" to "localhost:7752"   
[19:16:25.873 ERR] Grpc.AspNetCore.Server.ServerCallHandler Error when executing
 service method '"Receive"'.
System.IndexOutOfRangeException: Index was outside the bounds of the array.     
   at Proto.Remote.PidExtensions.TryTranslateToLocalClientPID(PID pid, PID& clie
ntPid)
   at Proto.Remote.RemoteMessageHandler.HandleRemoteMessage(RemoteMessage curren
tMessage, String remoteAddress)
   at Proto.Remote.EndpointReader.RunReader(IAsyncStreamReader`1 requestStream, 
String address, CancellationTokenSource cancellationTokenSource, String systemId
)
ntPid)
   at Proto.Remote.RemoteMessageHandler.HandleRemoteMessage(RemoteMessage currentMessage, String remoteAddress)
   at Proto.Remote.EndpointReader.RunReader(IAsyncStreamReader`1 requestStream, String address, CancellationTokenSource cancellationTokenSource, String systemId)
   at Proto.Remote.EndpointReader.Receive(IAsyncStreamReader`1 requestStream, IServerStreamWriter`1 responseStream, ServerCallContext context)
   at Proto.Remote.EndpointReader.Receive(IAsyncStreamReader`1 requestStream, IServerStreamWriter`1 responseStream, ServerCallContext context)
   at Grpc.Shared.Server.DuplexStreamingServerMethodInvoker`3.Invoke(HttpContext httpContext, ServerCallContext serverCallContext, IAsyncStreamReader`1 requestStream, IServerStreamWriter`1 responseStream)
   at Grpc.Shared.Server.DuplexStreamingServerMethodInvoker`3.Invoke(HttpContext httpContext, ServerCallContext serverCallContext, IAsyncStreamReader`1 requestStream, IServerStreamWriter`1 responseStream)
   at Grpc.AspNetCore.Server.Internal.CallHandlers.DuplexStreamingServerCallHandler`3.HandleCallAsyncCore(HttpContext httpContext, HttpContextServerCallContext serverCallContext)
   at Grpc.AspNetCore.Server.Internal.CallHandlers.ServerCallHandlerBase`3.<HandleCallAsync>g__AwaitHandleCall|8_0(HttpContextServerCallContext serverCallContext, Method`2 method, Task handleCall)
[19:16:28.478 INF] Proto.Remote.EndpointReader [EndpointReader]["localhost:7752"] Accepted connection request from "ipv4:127.0.0.1:63473" to "localhost:7752"
rogeralsing commented 4 months ago

https://github.com/asynkron/protoactor-dotnet/issues/2060

rogeralsing commented 4 months ago

closing in favor for #2060