Open christallire opened 1 year ago
@ReubenBond could you please have a look at this?
LocalSiloHealthMonitor is not supposed to kill the silo. It's used to warn you about issues (connectivity, thread pool, etc) and prevent unhealthy silos from voting healthy silos out of the cluster.
Silos are only kicked out of the cluster by other silos, after a number of consecutively failed probes.
In your case, are you trying a rolling upgrade from 3.x to 7.0? How many silos are in the cluster?
I have approx. 40 silos.
"Silos are only kicked out of the cluster by other silos, after a number of consecutively failed probes."
Oh, really? I thought the silo crashed itself in 3.0: I never had this issue before because the pod just silently restarted.
So, according to your message, the silo probably stopped receiving pings because it stalled for a while for whatever reason (GC, high utilization, bugs) and got stuck there. Hmm.
The logs will provide more insight into what's happening. We can help you to diagnose the issue using logs.
Generally, silos learn they have been evicted by reading the membership table. Upon seeing that they have been evicted, the silo process will crash itself via Environment.FailFast.
In this case, your silo has been marked dead (Status = 6 and you see the silo which evicted it listed there) but possibly has not yet refreshed its membership to learn of that fact. Perhaps the entire process has locked up for some reason. Logs and possibly a memory dump would help to identify what's actually happening. If the host process has completely frozen (which may not be the case here), then no code running in the process will be able to terminate it. In that case, the Kubernetes hosting package can allow other silos to delete the silo's pod from Kubernetes once it's been evicted from the cluster and/or you can have a local Kubernetes liveness probe return a simple 200 OK to ensure the process is actually alive.
Logs are the first avenue to investigate.
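As a minimal sketch of the liveness-probe idea above: the endpoint below answers every request with 200 OK using only System.Net.HttpListener. The class name LivenessEndpoint, the port, and the /healthz/ path are hypothetical and not part of Orleans or the Kubernetes hosting package. A check like this only shows that the process can still service requests; it says nothing about the silo's membership status.

```csharp
using System;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of Orleans: replies 200 OK to every request
// so that a Kubernetes liveness probe can tell whether the process still responds.
public sealed class LivenessEndpoint : IDisposable
{
    private readonly HttpListener _listener = new HttpListener();
    private readonly CancellationTokenSource _cts = new CancellationTokenSource();

    public LivenessEndpoint(string prefix)
    {
        // HttpListener prefixes must end with a slash,
        // e.g. "http://localhost:8081/healthz/".
        _listener.Prefixes.Add(prefix);
    }

    // Starts listening immediately and serves requests on a background task.
    public Task RunAsync()
    {
        _listener.Start();
        return Task.Run(async () =>
        {
            while (!_cts.IsCancellationRequested)
            {
                HttpListenerContext context;
                try
                {
                    context = await _listener.GetContextAsync();
                }
                catch (Exception) when (_cts.IsCancellationRequested)
                {
                    break; // The listener was closed by Dispose.
                }

                context.Response.StatusCode = (int)HttpStatusCode.OK;
                context.Response.Close();
            }
        });
    }

    public void Dispose()
    {
        _cts.Cancel();
        _listener.Close();
    }
}
```

To try it, create the endpoint at startup (new LivenessEndpoint("http://localhost:8081/healthz/")), call RunAsync() without awaiting it, and dispose it on shutdown. Inside a pod you would likely need a prefix that binds to all interfaces rather than localhost, and a livenessProbe with an httpGet check on the same port and path.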
Okay, I've managed to narrow it down and this is very interesting.
It seems Environment.FailFast doesn't work.
Here's what I did.
1) Found that Environment.FailFast in FatalErrorHandler.cs doesn't seem to work: the "FATAL EXCEPTION from ..." log is printed on the console, but other threads keep running (in particular, LocalSiloHealthMonitor keeps spamming the log after Environment.FailFast).
2) Implemented my own FatalErrorHandler to see exactly what is wrong, like below:
// Allow some time for loggers to flush.
Console.Error.WriteLine("FATAL EXCEPTION: BEFORE SLEEP");
Thread.Sleep(2000);
Console.Error.WriteLine("FATAL EXCEPTION: AFTER SLEEP");
if (Debugger.IsAttached) Debugger.Break();
Console.Error.WriteLine("FATAL EXCEPTION: BEFORE FAIL FAST");
Environment.FailFast(msg, exception);
Console.Error.WriteLine("FATAL EXCEPTION: AM I STILL ALIVE?");
The result was the same: I got "FATAL EXCEPTION: BEFORE FAIL FAST" but not "FATAL EXCEPTION: AM I STILL ALIVE?".
$ kubectl logs -f account-service-5b8fcc48c9-cqzvk app | grep FATAL
FATAL ERROR HANDLER INITIATED.
FATAL EXCEPTION from Orleans.Runtime.MembershipService.MembershipTableManager. Context: I have been told I am dead, so this silo will stop! Reason: I should be Dead according to membership table (in CleanupTableEntries): entry = [SiloAddress=S10.0.15.87:11111:32263975 SiloName=account-service-5b8fcc48c9-cqzvk Status=Dead HostName=account-service-5b8fcc48c9-cqzvk ProxyPort=30000 RoleName= UpdateZone=0 FaultZone=0 StartTime=2023-01-09 10:12:57.437 GMT IAmAliveTime=2023-01-09 10:13:07.723 GMT Suspecters=[S10.0.9.100:11111:32264074] SuspectTimes=[2023-01-09 10:14:36.461 GMT]].. Exception: null.\nCurrent stack: at System.Environment.get_StackTrace()
FATAL EXCEPTION: BEFORE SLEEP
FATAL EXCEPTION: AFTER SLEEP
FATAL EXCEPTION: BEFORE FAIL FAST
Process terminated. FATAL EXCEPTION from Orleans.Runtime.MembershipService.MembershipTableManager. Context: I have been told I am dead, so this silo will stop! Reason: I should be Dead according to membership table (in CleanupTableEntries): entry = [SiloAddress=S10.0.15.87:11111:32263975 SiloName=account-service-5b8fcc48c9-cqzvk Status=Dead HostName=account-service-5b8fcc48c9-cqzvk ProxyPort=30000 RoleName= UpdateZone=0 FaultZone=0 StartTime=2023-01-09 10:12:57.437 GMT IAmAliveTime=2023-01-09 10:13:07.723 GMT Suspecters=[S10.0.9.100:11111:32264074] SuspectTimes=[2023-01-09 10:14:36.461 GMT]].. Exception: null.\nCurrent stack: at System.Environment.get_StackTrace()
(I get the CleanupTableEntries message and also two other kinds of "dead" messages.)
But the process still spams the log:
[10:17:27 ERR] Could not deliver reminder tick for [optimizationReminder, productoptionoptimization/157, 00:30:00, 2023-01-09 06:17:27.783 GMT, 1780, 3695, Ticking], next 01/09/2023 10:47:27.
Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to S10.0.9.100:11111:32264074, will retry after 198.1379ms
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 108
at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 231
at Orleans.Serialization.Invocation.ResponseCompletionSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 90
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.ActivityPropagationGrainCallFilter.Process(IGrainCallContext context, Activity activity) in /_/src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs:line 75
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodWithFiltersAsync[TResult](GrainReference reference, IInvokable request, InvokeMethodOptions options) in /_/src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 76
at Orleans.Runtime.GrainDirectory.LocalGrainDirectory.LookupAsync(GrainId grainId, Int32 hopCount) in /_/src/Orleans.Runtime/GrainDirectory/LocalGrainDirectory.cs:line 739
at Orleans.Runtime.GrainDirectory.DhtGrainLocator.Lookup(GrainId grainId) in /_/src/Orleans.Runtime/GrainDirectory/DhtGrainLocator.cs:line 30
at Orleans.Runtime.Placement.PlacementService.PlacementWorker.GetOrPlaceActivationAsync(Message firstMessage) in /_/src/Orleans.Runtime/Placement/PlacementService.cs:line 357
at Orleans.Runtime.Messaging.MessageCenter.<AddressAndSendMessage>g__SendMessageAsync|40_0(Task addressMessageTask, Message m) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 448
at Orleans.Serialization.Invocation.ResponseCompletionSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 90
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.ActivityPropagationGrainCallFilter.Process(IGrainCallContext context, Activity activity) in /_/src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs:line 75
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodWithFiltersAsync(GrainReference reference, IInvokable request, InvokeMethodOptions options) in /_/src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 83
at Orleans.Runtime.ReminderService.LocalReminderService.LocalReminderData.OnTimerTick() in /_/src/Orleans.Reminders/ReminderService/LocalReminderService.cs:line 714
or
[10:18:58 WRN] This silo is not active (Status: Dead) and is therefore not healthy.
[10:18:58 WRN] Self-monitoring determined that local health is degraded. Degradation score is 8/8 (lower is better). Complaints: This silo is not active (Status: Dead and is therefore not healthy.
[10:19:01 INF] Establishing connection to endpoint S10.0.9.100:11111:32264074
[10:19:01 INF] Establishing connection to endpoint S10.0.9.48:11111:32264076
or
[10:40:17 WRN] Error retrieving silo manifest for silo S10.0.9.48:11111:32264076
Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.0.9.48:11111:32264076. See InnerException
---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.48:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
--- End of inner exception stack trace ---
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 108
at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 231
at Orleans.Serialization.Invocation.ResponseCompletionSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 90
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.ActivityPropagationGrainCallFilter.Process(IGrainCallContext context, Activity activity) in /_/src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs:line 75
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodWithFiltersAsync[TResult](GrainReference reference, IInvokable request, InvokeMethodOptions options) in /_/src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 76
at Orleans.Runtime.Metadata.ClusterManifestProvider.<>c__DisplayClass18_0.<<UpdateManifest>g__GetManifest|0>d.MoveNext() in /_/src/Orleans.Runtime/Manifest/ClusterManifestProvider.cs:line 163
[10:40:19 WRN] This silo is not active (Status: Dead) and is therefore not healthy.
[10:40:19 WRN] Self-monitoring determined that local health is degraded. Degradation score is 8/8 (lower is better). Complaints: This silo is not active (Status: Dead and is therefore not healthy.
[10:40:21 INF] Application is shutting down...
[10:40:21 INF] Stopping Orleans Silo
[10:40:21 INF] Stopping Orleans.Runtime.ReminderService.LocalReminderService grain service
[10:40:22 INF] Establishing connection to endpoint S10.0.9.100:11111:32264074
[10:40:22 INF] Establishing connection to endpoint S10.0.9.48:11111:32264076
[10:40:26 WRN] Connection attempt to endpoint S10.0.9.48:11111:32264076 failed
Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.48:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
[10:40:26 WRN] Connection attempt to endpoint S10.0.9.100:11111:32264074 failed
Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.100:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
[10:40:26 WRN] Error retrieving silo manifest for silo S10.0.9.100:11111:32264074
Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.0.9.100:11111:32264074. See InnerException
---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.100:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
--- End of inner exception stack trace ---
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 108
at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 231
at Orleans.Serialization.Invocation.ResponseCompletionSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 90
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.ActivityPropagationGrainCallFilter.Process(IGrainCallContext context, Activity activity) in /_/src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs:line 75
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodWithFiltersAsync[TResult](GrainReference reference, IInvokable request, InvokeMethodOptions options) in /_/src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 76
at Orleans.Runtime.Metadata.ClusterManifestProvider.<>c__DisplayClass18_0.<<UpdateManifest>g__GetManifest|0>d.MoveNext() in /_/src/Orleans.Runtime/Manifest/ClusterManifestProvider.cs:line 163
[10:40:26 WRN] Error retrieving silo manifest for silo S10.0.9.48:11111:32264076
Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.0.9.48:11111:32264076. See InnerException
---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.48:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
--- End of inner exception stack trace ---
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 108
at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 231
at Orleans.Serialization.Invocation.ResponseCompletionSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 90
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.ActivityPropagationGrainCallFilter.Process(IGrainCallContext context, Activity activity) in /_/src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs:line 75
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodWithFiltersAsync[TResult](GrainReference reference, IInvokable request, InvokeMethodOptions options) in /_/src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 76
at Orleans.Runtime.Metadata.ClusterManifestProvider.<>c__DisplayClass18_0.<<UpdateManifest>g__GetManifest|0>d.MoveNext() in /_/src/Orleans.Runtime/Manifest/ClusterManifestProvider.cs:line 163
[10:40:26 WRN] I should be Dead according to membership table (in TryUpdateMyStatusGlobalOnce): Entry = [SiloAddress=S10.0.15.87:11111:32263975 SiloName=account-service-5b8fcc48c9-cqzvk Status=Dead HostName=account-service-5b8fcc48c9-cqzvk ProxyPort=30000 RoleName= UpdateZone=0 FaultZone=0 StartTime=2023-01-09 10:12:57.437 GMT IAmAliveTime=2023-01-09 10:13:07.723 GMT Suspecters=[S10.0.9.100:11111:32264074] SuspectTimes=[2023-01-09 10:14:36.461 GMT]].
[10:40:26 ERR] I have been told I am dead, so this silo will stop! Reason: I should be Dead according to membership table (in TryUpdateMyStatusGlobalOnce): Entry = [SiloAddress=S10.0.15.87:11111:32263975 SiloName=account-service-5b8fcc48c9-cqzvk Status=Dead HostName=account-service-5b8fcc48c9-cqzvk ProxyPort=30000 RoleName= UpdateZone=0 FaultZone=0 StartTime=2023-01-09 10:12:57.437 GMT IAmAliveTime=2023-01-09 10:13:07.723 GMT Suspecters=[S10.0.9.100:11111:32264074] SuspectTimes=[2023-01-09 10:14:36.461 GMT]].
[10:40:26 ERR] Fatal error from Orleans.Runtime.MembershipService.MembershipTableManager. Context: I have been told I am dead, so this silo will stop! Reason: I should be Dead according to membership table (in TryUpdateMyStatusGlobalOnce): Entry = [SiloAddress=S10.0.15.87:11111:32263975 SiloName=account-service-5b8fcc48c9-cqzvk Status=Dead HostName=account-service-5b8fcc48c9-cqzvk ProxyPort=30000 RoleName= UpdateZone=0 FaultZone=0 StartTime=2023-01-09 10:12:57.437 GMT IAmAliveTime=2023-01-09 10:13:07.723 GMT Suspecters=[S10.0.9.100:11111:32264074] SuspectTimes=[2023-01-09 10:14:36.461 GMT]].
FATAL EXCEPTION from Orleans.Runtime.MembershipService.MembershipTableManager. Context: I have been told I am dead, so this silo will stop! Reason: I should be Dead according to membership table (in TryUpdateMyStatusGlobalOnce): Entry = [SiloAddress=S10.0.15.87:11111:32263975 SiloName=account-service-5b8fcc48c9-cqzvk Status=Dead HostName=account-service-5b8fcc48c9-cqzvk ProxyPort=30000 RoleName= UpdateZone=0 FaultZone=0 StartTime=2023-01-09 10:12:57.437 GMT IAmAliveTime=2023-01-09 10:13:07.723 GMT Suspecters=[S10.0.9.100:11111:32264074] SuspectTimes=[2023-01-09 10:14:36.461 GMT]].. Exception: null.\nCurrent stack: at System.Environment.get_StackTrace()
at Grey.MicroserviceFramework.ErrorHandler.FatalErrorHandler.OnFatalException(Object sender, String context, Exception exception) in /src/Grey.MicroserviceFramework/ErrorHandler/FatalErrorHandler.cs:line 45
at Orleans.Runtime.MembershipService.MembershipTableManager.KillMyselfLocally(String reason) in /_/src/Orleans.Runtime/MembershipService/MembershipTableManager.cs:line 618
at Orleans.Runtime.MembershipService.MembershipTableManager.TryUpdateMyStatusGlobalOnce(SiloStatus newStatus) in /_/src/Orleans.Runtime/MembershipService/MembershipTableManager.cs:line 420
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Threading.Tasks.TaskSchedulerAwaitTaskContinuation.<>c.<Run>b__2_0(Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
at System.Threading.Tasks.Task.ExecuteEntry()
at Orleans.Runtime.Scheduler.ActivationTaskScheduler.TryExecuteTaskInline(Task task, Boolean taskWasPreviouslyQueued) in /_/src/Orleans.Runtime/Scheduler/ActivationTaskScheduler.cs:line 117
at System.Threading.Tasks.TaskScheduler.TryRunInline(Task task, Boolean taskWasPreviouslyQueued)
at System.Threading.Tasks.TaskContinuation.InlineIfPossibleOrElseQueue(Task task, Boolean needsProtection)
at System.Threading.Tasks.TaskSchedulerAwaitTaskContinuation.Run(Task ignored, Boolean canInlineContinuationTask)
at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(Task`1 task, TResult result)
at Orleans.Runtime.MembershipService.AdoNetClusteringTable.ReadAll() in /_/src/AdoNet/Orleans.Clustering.AdoNet/Messaging/AdoNetClusteringTable.cs:line 83
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Threading.Tasks.TaskSchedulerAwaitTaskContinuation.<>c.<Run>b__2_0(Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
at System.Threading.Tasks.Task.ExecuteEntry()
at Orleans.Runtime.Scheduler.ActivationTaskScheduler.TryExecuteTaskInline(Task task, Boolean taskWasPreviouslyQueued) in /_/src/Orleans.Runtime/Scheduler/ActivationTaskScheduler.cs:line 117
at System.Threading.Tasks.TaskScheduler.TryRunInline(Task task, Boolean taskWasPreviouslyQueued)
at System.Threading.Tasks.TaskContinuation.InlineIfPossibleOrElseQueue(Task task, Boolean needsProtection)
at System.Threading.Tasks.TaskSchedulerAwaitTaskContinuation.Run(Task ignored, Boolean canInlineContinuationTask)
at System.Threading.Tasks.Task.RunContinuations(Object continuationObject)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.SetExistingTaskResult(Task`1 task, TResult result)
at Orleans.Clustering.AdoNet.Storage.RelationalOrleansQueries.ReadAsync[TResult,TAggregate](String query, Func`2 selector, Func`2 parameterProvider, Func`2 aggregator) in /_/src/AdoNet/Shared/Storage/RelationalOrleansQueries.cs:line 86
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.ExecutionContextCallback(Object s)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext(Thread threadPoolThread)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1.AsyncStateMachineBox`1.MoveNext()
at System.Threading.Tasks.TaskSchedulerAwaitTaskContinuation.<>c.<Run>b__2_0(Object state)
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
at System.Threading.Tasks.Task.ExecuteEntry()
at Orleans.Runtime.Scheduler.ActivationTaskScheduler.RunTask(Task task) in /_/src/Orleans.Runtime/Scheduler/ActivationTaskScheduler.cs:line 42
at Orleans.Runtime.Scheduler.WorkItemGroup.Execute() in /_/src/Orleans.Runtime/Scheduler/WorkItemGroup.cs:line 207
at System.Threading.ThreadPoolWorkQueue.Dispatch()
at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
FATAL EXCEPTION: BEFORE SLEEP
FATAL EXCEPTION: AFTER SLEEP
FATAL EXCEPTION: BEFORE FAIL FAST
[10:40:31 INF] Establishing connection to endpoint S10.0.9.100:11111:32264074
[10:40:31 INF] Establishing connection to endpoint S10.0.9.48:11111:32264076
[10:40:32 WRN] Connection attempt to endpoint S10.0.9.48:11111:32264076 failed
Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.48:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
[10:40:32 WRN] Connection attempt to endpoint S10.0.9.100:11111:32264074 failed
Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.100:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
[10:40:32 WRN] Error retrieving silo manifest for silo S10.0.9.100:11111:32264074
Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.0.9.100:11111:32264074. See InnerException
---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.100:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
--- End of inner exception stack trace ---
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 108
at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 231
at Orleans.Serialization.Invocation.ResponseCompletionSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 90
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.ActivityPropagationGrainCallFilter.Process(IGrainCallContext context, Activity activity) in /_/src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs:line 75
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodWithFiltersAsync[TResult](GrainReference reference, IInvokable request, InvokeMethodOptions options) in /_/src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 76
at Orleans.Runtime.Metadata.ClusterManifestProvider.<>c__DisplayClass18_0.<<UpdateManifest>g__GetManifest|0>d.MoveNext() in /_/src/Orleans.Runtime/Manifest/ClusterManifestProvider.cs:line 163
[10:40:32 WRN] Error retrieving silo manifest for silo S10.0.9.48:11111:32264076
Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.0.9.48:11111:32264076. See InnerException
---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.48:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
--- End of inner exception stack trace ---
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 108
at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 231
at Orleans.Serialization.Invocation.ResponseCompletionSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 90
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.ActivityPropagationGrainCallFilter.Process(IGrainCallContext context, Activity activity) in /_/src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs:line 75
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodWithFiltersAsync[TResult](GrainReference reference, IInvokable request, InvokeMethodOptions options) in /_/src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 76
at Orleans.Runtime.Metadata.ClusterManifestProvider.<>c__DisplayClass18_0.<<UpdateManifest>g__GetManifest|0>d.MoveNext() in /_/src/Orleans.Runtime/Manifest/ClusterManifestProvider.cs:line 163
[10:40:37 INF] Establishing connection to endpoint S10.0.9.100:11111:32264074
[10:40:37 INF] Establishing connection to endpoint S10.0.9.48:11111:32264076
[10:40:37 WRN] Connection attempt to endpoint S10.0.9.48:11111:32264076 failed
Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.48:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
[10:40:37 WRN] Connection attempt to endpoint S10.0.9.100:11111:32264074 failed
Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.100:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
[10:40:37 WRN] Error retrieving silo manifest for silo S10.0.9.100:11111:32264074
Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.0.9.100:11111:32264074. See InnerException
---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.100:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
--- End of inner exception stack trace ---
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 108
at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 231
at Orleans.Serialization.Invocation.ResponseCompletionSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 90
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.ActivityPropagationGrainCallFilter.Process(IGrainCallContext context, Activity activity) in /_/src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs:line 75
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodWithFiltersAsync[TResult](GrainReference reference, IInvokable request, InvokeMethodOptions options) in /_/src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 76
at Orleans.Runtime.Metadata.ClusterManifestProvider.<>c__DisplayClass18_0.<<UpdateManifest>g__GetManifest|0>d.MoveNext() in /_/src/Orleans.Runtime/Manifest/ClusterManifestProvider.cs:line 163
[10:40:37 WRN] Error retrieving silo manifest for silo S10.0.9.48:11111:32264076
Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.0.9.48:11111:32264076. See InnerException
---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.48:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
--- End of inner exception stack trace ---
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 108
at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 231
at Orleans.Serialization.Invocation.ResponseCompletionSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 90
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.ActivityPropagationGrainCallFilter.Process(IGrainCallContext context, Activity activity) in /_/src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs:line 75
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodWithFiltersAsync[TResult](GrainReference reference, IInvokable request, InvokeMethodOptions options) in /_/src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 76
at Orleans.Runtime.Metadata.ClusterManifestProvider.<>c__DisplayClass18_0.<<UpdateManifest>g__GetManifest|0>d.MoveNext() in /_/src/Orleans.Runtime/Manifest/ClusterManifestProvider.cs:line 163
[10:40:42 INF] Establishing connection to endpoint S10.0.9.100:11111:32264074
[10:40:42 INF] Establishing connection to endpoint S10.0.9.48:11111:32264076
[10:40:45 WRN] Connection attempt to endpoint S10.0.9.48:11111:32264076 failed
Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.48:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
[10:40:45 WRN] Connection attempt to endpoint S10.0.9.100:11111:32264074 failed
Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.100:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
[10:40:45 WRN] Error retrieving silo manifest for silo S10.0.9.100:11111:32264074
Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.0.9.100:11111:32264074. See InnerException
---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.100:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
--- End of inner exception stack trace ---
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 108
at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 231
at Orleans.Serialization.Invocation.ResponseCompletionSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 90
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.ActivityPropagationGrainCallFilter.Process(IGrainCallContext context, Activity activity) in /_/src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs:line 75
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodWithFiltersAsync[TResult](GrainReference reference, IInvokable request, InvokeMethodOptions options) in /_/src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 76
at Orleans.Runtime.Metadata.ClusterManifestProvider.<>c__DisplayClass18_0.<<UpdateManifest>g__GetManifest|0>d.MoveNext() in /_/src/Orleans.Runtime/Manifest/ClusterManifestProvider.cs:line 163
[10:40:45 WRN] Error retrieving silo manifest for silo S10.0.9.48:11111:32264076
Orleans.Runtime.OrleansMessageRejectionException: Exception while sending message: Orleans.Runtime.Messaging.ConnectionFailedException: Unable to connect to endpoint S10.0.9.48:11111:32264076. See InnerException
---> Orleans.Networking.Shared.SocketConnectionException: Unable to connect to 10.0.9.48:11111. Error: HostUnreachable
at Orleans.Networking.Shared.SocketConnectionFactory.ConnectAsync(EndPoint endpoint, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/Shared/SocketConnectionFactory.cs:line 61
at Orleans.Runtime.Messaging.ConnectionFactory.ConnectAsync(SiloAddress address, CancellationToken cancellationToken) in /_/src/Orleans.Core/Networking/ConnectionFactory.cs:line 64
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
--- End of inner exception stack trace ---
at Orleans.Runtime.Messaging.ConnectionManager.ConnectAsync(SiloAddress address, ConnectionEntry entry) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 228
at Orleans.Runtime.Messaging.ConnectionManager.GetConnectionAsync(SiloAddress endpoint) in /_/src/Orleans.Core/Networking/ConnectionManager.cs:line 108
at Orleans.Runtime.Messaging.MessageCenter.<SendMessage>g__SendAsync|30_0(MessageCenter messageCenter, ValueTask`1 connectionTask, Message msg) in /_/src/Orleans.Runtime/Messaging/MessageCenter.cs:line 231
at Orleans.Serialization.Invocation.ResponseCompletionSource.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 90
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.ActivityPropagationGrainCallFilter.Process(IGrainCallContext context, Activity activity) in /_/src/Orleans.Core/Diagnostics/ActivityPropagationGrainCallFilter.cs:line 75
at Orleans.Runtime.OutgoingCallInvoker`1.Invoke() in /_/src/Orleans.Core/Runtime/OutgoingCallInvoker.cs:line 129
at Orleans.Runtime.GrainReferenceRuntime.InvokeMethodWithFiltersAsync[TResult](GrainReference reference, IInvokable request, InvokeMethodOptions options) in /_/src/Orleans.Core/Runtime/GrainReferenceRuntime.cs:line 76
at Orleans.Runtime.Metadata.ClusterManifestProvider.<>c__DisplayClass18_0.<<UpdateManifest>g__GetManifest|0>d.MoveNext() in /_/src/Orleans.Runtime/Manifest/ClusterManifestProvider.cs:line 163
[10:40:50 INF] Establishing connection to endpoint S10.0.9.100:11111:32264074
[10:40:50 INF] Establishing connection to endpoint S10.0.9.48:11111:32264076
root@account-service-5b8fcc48c9-cqzvk:/tmp# dotnet --info
Host:
  Version: 7.0.1
  Architecture: arm64
  Commit: 97203d38ba
.NET SDKs installed:
  No SDKs were found.
.NET runtimes installed:
  Microsoft.AspNetCore.App 7.0.1 [/usr/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 7.0.1 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
Other architectures found: None
Environment variables: Not set
global.json file: Not found
Learn more: https://aka.ms/dotnet/info
Download .NET: https://aka.ms/dotnet/download
root@account-service-5b8fcc48c9-cqzvk:/tmp# dotnet tool install --global dotnet-dump
The command could not be loaded, possibly because:
Download a .NET SDK: https://aka.ms/dotnet/download
Learn about SDK resolution: https://aka.ms/dotnet/sdk-not-found
Unfortunately, it seems that installing dotnet-dump this way requires the SDK.
4. Before rebuilding the image with the SDK, I wrote a simple program that calls `Environment.FailFast` while one foreground thread is running, to see if something is wrong with the runtime image:
root@account-service-5b8fcc48c9-cqzvk:/tmp# ls -al
total 328
drwxrwxrwt 1 root root 161 Jan 9 10:23 .
drwxr-xr-x 1 root root 39 Jan 9 10:12 ..
-rwxrwxrwx 1 root root 151064 Jan 9 10:23 ConsoleApp3
-rwxrwxrwx 1 root root 403 Jan 9 10:23 ConsoleApp3.deps.json
-rwxrwxrwx 1 root root 5120 Jan 9 10:23 ConsoleApp3.dll
-rwxrwxrwx 1 root root 153600 Jan 9 10:23 ConsoleApp3.exe
-rwxrwxrwx 1 root root 10552 Jan 9 10:23 ConsoleApp3.pdb
-rwxrwxrwx 1 root root 139 Jan 9 10:23 ConsoleApp3.runtimeconfig.json
root@account-service-5b8fcc48c9-cqzvk:/tmp# ./ConsoleApp3.exe
bash: ./ConsoleApp3.exe: cannot execute binary file: Exec format error
root@account-service-5b8fcc48c9-cqzvk:/tmp# dotnet ConsoleApp3.dll
Hello, World!
Process terminated. hello?
at System.Environment.FailFast(System.String)
at Program.
It crashed.
5. Unfortunately, dotnet-dump did not work even with the SDK installed, so I stopped investigating here.
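For readers who want to repeat the check from step 4, a test program along these lines should reproduce it. This is a minimal sketch, not the exact program used above: the class and method names are made up for illustration, and only the "Hello, World!" line and the `hello?` message are taken from the output shown.

```csharp
using System;
using System.Threading;

public static class FailFastRepro
{
    // Creates (but does not start) a foreground thread that sleeps forever.
    // A live foreground thread keeps a process running after Main returns,
    // so it is a reasonable stand-in for a silo that refuses to exit.
    public static Thread CreateForegroundThread()
    {
        return new Thread(() => Thread.Sleep(Timeout.Infinite))
        {
            IsBackground = false, // foreground: blocks normal process exit
            Name = "keep-alive",  // hypothetical name, for illustration only
        };
    }

    // Starts the foreground thread and then calls Environment.FailFast.
    // FailFast is expected to terminate the process immediately,
    // regardless of the foreground thread that is still running.
    public static void Run()
    {
        Console.WriteLine("Hello, World!");
        CreateForegroundThread().Start();
        Environment.FailFast("hello?");
    }
}
```

Calling `FailFastRepro.Run()` from `Main` and running the resulting DLL with `dotnet` in the container should end with a "Process terminated" message if `Environment.FailFast` works in that runtime image.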
Interesting. Thanks for investigating. I wonder if injecting IHostApplicationLifetime into your own IFatalExceptionHandler and terminating the application that way works.
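A rough sketch of that idea, in case it helps. Everything here is an assumption rather than a confirmed fix: the class name, the grace period and the `Environment.Exit` fallback are hypothetical, and the two `IFatalExceptionHandler` members are written from my reading of the Orleans 7 interface, so verify them against the version you run. It needs the Orleans runtime and Microsoft.Extensions.Hosting packages.

```csharp
using System;
using System.Threading;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Orleans.Runtime;

// Hypothetical handler: asks the generic host to stop instead of relying on FailFast.
public sealed class HostStoppingFatalExceptionHandler : IFatalExceptionHandler
{
    // Assumed grace period before forcing the process to exit.
    private static readonly TimeSpan GracePeriod = TimeSpan.FromSeconds(30);

    private readonly IHostApplicationLifetime _lifetime;
    private readonly ILogger<HostStoppingFatalExceptionHandler> _logger;
    private Timer? _exitTimer; // kept in a field so the timer is not garbage collected

    public HostStoppingFatalExceptionHandler(
        IHostApplicationLifetime lifetime,
        ILogger<HostStoppingFatalExceptionHandler> logger)
    {
        _lifetime = lifetime;
        _logger = logger;
    }

    // Treat everything except thread aborts as unexpected.
    public bool IsUnexpected(Exception exception) => exception is not ThreadAbortException;

    public void OnFatalException(object sender, string context, Exception exception)
    {
        _logger.LogCritical(exception, "Fatal error reported by {Sender}: {Context}", sender, context);

        // Request a graceful shutdown of the host.
        _lifetime.StopApplication();

        // Fallback (an assumption, not Orleans behavior): force an exit
        // if the host has not stopped within the grace period.
        _exitTimer = new Timer(_ => Environment.Exit(1), null, GracePeriod, Timeout.InfiniteTimeSpan);
    }
}
```

The handler would then need to be registered in the silo's service collection as the `IFatalExceptionHandler` implementation. Note that if the whole process is frozen, neither the graceful stop nor the timer will run, so an external liveness check is still needed for that case.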
What base image/distro are you using? Is your process running under a debugger?
In the past, when we've needed to diagnose issues with processes running in containers using diagnostics tools, we've installed the SDK into the container on the fly, in the base image, or configured a dotnet-monitor sidecar container.
What base image/distro are you using?
mcr.microsoft.com/dotnet/aspnet:7.0-jammy-arm64v8 (ubuntu) and Amazon Linux
/app# uname -a
Linux account-service-569c9ccb97-9qkgh 5.4.226-129.415.amzn2.aarch64 #1 SMP Fri Dec 9 12:54:10 UTC 2022 aarch64 aarch64 aarch64 GNU/Linux
Is your process running under a debugger?
nope
In the past, when we've needed to diagnose issues with processes running in containers using diagnostics tools, we've installed the SDK into the container on the fly, in the base image, or configured a dotnet-monitor sidecar container.
Thanks for the advice. I've tried that too, but dotnet-dump ps doesn't detect any dotnet processes in the environment, even with root privileges. Weird.
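One thing that might be worth checking here: the diagnostics tools find processes through the IPC socket each .NET runtime creates in its temp directory, named `dotnet-diagnostic-{pid}-{key}-socket` under `TMPDIR` or `/tmp`. If the silo uses a different temp directory than the shell, or diagnostics are disabled with `DOTNET_EnableDiagnostics=0`, nothing is listed. Whether either applies in this pod is a guess, although the `ls -al` listing of `/tmp` above does not show such a socket. A minimal sketch that lists the PIDs with a socket in a given directory (the class and method names are hypothetical):

```csharp
using System;
using System.IO;
using System.Linq;

public static class DiagnosticSocketCheck
{
    // Returns the process IDs that have a diagnostic IPC socket in tempDir.
    // Socket names look like "dotnet-diagnostic-1234-5678-socket",
    // where the third dash-separated part is the process ID.
    public static int[] FindDiagnosablePids(string tempDir)
    {
        return Directory
            .EnumerateFileSystemEntries(tempDir, "dotnet-diagnostic-*-socket")
            .Select(path => Path.GetFileName(path).Split('-'))
            .Where(parts => parts.Length >= 5 && int.TryParse(parts[2], out _))
            .Select(parts => int.Parse(parts[2]))
            .Distinct()
            .OrderBy(pid => pid)
            .ToArray();
    }
}
```

Running `DiagnosticSocketCheck.FindDiagnosablePids(Path.GetTempPath())` inside the container and comparing the result with the silo's PID would show whether the runtime published a diagnostic endpoint where the tools look for it.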
Hello, I've upgraded to Orleans 7 and am experiencing some odd behavior. One issue is that LocalSiloHealthMonitor does not kill the silo.
I was narrowing down why the node fails to respond to probes after the upgrade, but this is worse: the node is not restarted automatically, and the service just stops until I manually restart the node. :|
Note the times: the log was printed at 2022-11-24T04:59:59.6208442 and the last probe was at 2022-11-23T00:06:06.6389545Z.
The node hasn't been killed for almost 28 hours and just keeps logging the same thing over and over.
This is the OrleansMembershipTable from SQL Server:
From the doc:
https://learn.microsoft.com/en-us/dotnet/orleans/deployment/kubernetes
Is there something I'm missing to get the node terminated and restarted in Orleans 7?