Closed terryhuangbo closed 1 year ago
I'm also seeing this. Any solution?
Hi
I can't reproduce the issue! Looking at the call stack, this is an error in the retry logic of the Service Bus connection, i.e. the check is working and reports an issue after 5 seconds of retries!
We don't see any of these errors under normal use, only from the health check endpoints. It's happening so often now that we're going to disable this specific health check.
We're seeing this issue as well, since we run regular liveness and health checks on the Service Bus using AddAzureServiceBusQueue() to register the actual check. If you need more information, please let me know.
When I run the following code from a local console app, no exception is raised. The code more or less corresponds to the one in the AzureServiceBusQueueHealthCheck:

```csharp
static async Task Main(string[] args)
{
    while (true)
    {
        // Same management call the health check performs internally.
        var managementClient = new ServiceBusAdministrationClient(MYENDPOINT);
        var properties = await managementClient.GetQueueRuntimePropertiesAsync(MYQUEUE, CancellationToken.None);
        await Task.Delay(5000);
    }
}
```
I took this code as a custom implementation, added it to my .NET Core application, and deployed it to k8s with a liveness probe period of 30 s. It raised the same exceptions @terryhuangbo described.
The infrastructure being checked was the same, i.e. the connection string pointed to the same Service Bus queue in the cloud.
@unaizorrilla: Could there be a CancellationToken passed in on the live system / .NET Core when performing the check that somehow causes the issue? We're stuck... Disabling the liveness probe is not an option for us, and the pods get into a recreation loop while the probe keeps failing.
It turned out that increasing the liveness check's timeout (1 s -> 20 s) makes it better, but the error still shows up from time to time...
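For anyone wanting to try the same workaround: the relevant knobs sit on the probe definition in the pod spec. This is only a sketch; the path and port below are placeholders for your own deployment:

```yaml
# Sketch of a liveness probe with a raised timeout (placeholder path/port).
livenessProbe:
  httpGet:
    path: /health/live   # your liveness endpoint
    port: 8080
  periodSeconds: 30      # how often the probe runs
  timeoutSeconds: 20     # raised from the 1 s default, per the workaround above
  failureThreshold: 3    # consecutive failures before the pod is restarted
```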
```csharp
while (true)
{
    var managementClient = new ServiceBusAdministrationClient(MYCON);
    var stopwatch = Stopwatch.StartNew();
    var x = await managementClient.GetQueueRuntimePropertiesAsync(MYQUEUE, CancellationToken.None);
    stopwatch.Stop();
    Console.WriteLine("Elapsed Time is {0} ms", stopwatch.ElapsedMilliseconds);
    Thread.Sleep(1000);
}
```

Executed locally against the cloud infrastructure. The results are quite surprising:

```
Elapsed Time is 90 ms
Elapsed Time is 54 ms
Elapsed Time is 39 ms
Elapsed Time is 147 ms
Elapsed Time is 289 ms
Elapsed Time is 46 ms
Elapsed Time is 541 ms
Elapsed Time is 42 ms
Elapsed Time is 234 ms
Elapsed Time is 1059 ms
Elapsed Time is 47 ms
Elapsed Time is 215 ms
Elapsed Time is 1514 ms
Elapsed Time is 5314 ms
Elapsed Time is 1442 ms
```
We've had to remove this health check. Will watch this thread for updates. Even increasing the timeout didn't help.
Is this issue still a thing, @thisispaulsmith @Meandron?
Yes, it is still there. We worked around it by creating our own HealthCheck class that does pretty much the same thing but only logs warnings; otherwise our logs get spammed with "false" error logs.
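For anyone who wants the same workaround: a minimal sketch of such a wrapper, assuming the standard `IHealthCheck` abstraction from Microsoft.Extensions.Diagnostics.HealthChecks. The `WarnOnTimeoutHealthCheck` name and the wrapping approach are our own invention, not part of this library:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Diagnostics.HealthChecks;

// Wraps an inner health check and downgrades cancellations/timeouts
// to Degraded, so transient slowness is logged as a warning rather
// than an error.
public sealed class WarnOnTimeoutHealthCheck : IHealthCheck
{
    private readonly IHealthCheck _inner;

    public WarnOnTimeoutHealthCheck(IHealthCheck inner) => _inner = inner;

    public async Task<HealthCheckResult> CheckHealthAsync(
        HealthCheckContext context, CancellationToken cancellationToken = default)
    {
        try
        {
            return await _inner.CheckHealthAsync(context, cancellationToken);
        }
        catch (OperationCanceledException ex)
        {
            // TaskCanceledException derives from OperationCanceledException,
            // so both the Azure SDK and Npgsql timeout cases land here.
            return HealthCheckResult.Degraded("Check timed out", ex);
        }
    }
}
```

Register it with `AddCheck` pointing at an instance that wraps the original check; a Degraded result keeps the endpoint reporting overall health without emitting error-level log entries.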
We receive this kind of error log for other health check types as well, like BlobStorage:
(Notice the completion time)
I'm experiencing the same thing... It almost seems like we are getting throttled via the management client when hooking into Azure with multiple replicas against the same Service Bus.
By default you have 1000 credits, and every management call (which is what this health check performs) costs 10 credits. With a bunch of replicas, that can add up to throttling. See: https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-throttling#what-are-the-credit-limits
When using the premium Service Bus tier, there are no issues...
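To put rough numbers on that (the replica count, queue count, and probe period below are made-up examples, not from this thread): on the standard tier the credit pool refills at 1000 credits per second, and each management call costs 10 credits, so sustained throttling only kicks in above roughly 100 management calls per second:

```csharp
using System;

class CreditEstimate
{
    static void Main()
    {
        // Hypothetical deployment; plug in your own numbers.
        int replicas = 20;              // pods probing the same namespace
        int queuesPerProbe = 3;         // one management call per queue check
        double probePeriodSeconds = 5;  // probe interval per pod
        int creditsPerManagementCall = 10;

        double callsPerSecond = replicas * queuesPerProbe / probePeriodSeconds;
        double creditsPerSecond = callsPerSecond * creditsPerManagementCall;

        // 20 * 3 / 5 = 12 calls/s -> 120 credits/s: well under the
        // 1000/s replenishment rate, which is why small deployments
        // should not hit throttling from health checks alone.
        Console.WriteLine($"~{creditsPerSecond} credits/s vs. 1000 replenished/s");
    }
}
```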
I've the same issue with the NpgSqlHealthCheck:
Health check Database with status Unhealthy completed after 18.4543ms with message '(null)'
System.Threading.Tasks.TaskCanceledException:
at Npgsql.TaskExtensions+<WithCancellation>d__1`1.MoveNext (Npgsql, Version=6.0.6.0, Culture=neutral, PublicKeyToken=5d8b90d52f46fda7)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Npgsql.TaskExtensions+<WithTimeout>d__0`1.MoveNext (Npgsql, Version=6.0.6.0, Culture=neutral, PublicKeyToken=5d8b90d52f46fda7)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Npgsql.Internal.NpgsqlConnector+<ConnectAsync>d__198.MoveNext (Npgsql, Version=6.0.6.0, Culture=neutral, PublicKeyToken=5d8b90d52f46fda7)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Npgsql.Internal.NpgsqlConnector+<RawOpen>d__196.MoveNext (Npgsql, Version=6.0.6.0, Culture=neutral, PublicKeyToken=5d8b90d52f46fda7)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Npgsql.Internal.NpgsqlConnector+<<Open>g__OpenCore|191_1>d.MoveNext (Npgsql, Version=6.0.6.0, Culture=neutral, PublicKeyToken=5d8b90d52f46fda7)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Npgsql.Internal.NpgsqlConnector+<Open>d__191.MoveNext (Npgsql, Version=6.0.6.0, Culture=neutral, PublicKeyToken=5d8b90d52f46fda7)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Npgsql.UnpooledConnectorSource+<Get>d__6.MoveNext (Npgsql, Version=6.0.6.0, Culture=neutral, PublicKeyToken=5d8b90d52f46fda7)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at Npgsql.NpgsqlConnection+<<Open>g__OpenAsync|45_0>d.MoveNext (Npgsql, Version=6.0.6.0, Culture=neutral, PublicKeyToken=5d8b90d52f46fda7)
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification (System.Private.CoreLib, Version=6.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e)
at HealthChecks.NpgSql.NpgSqlHealthCheck+<CheckHealthAsync>d__4.MoveNext (HealthChecks.NpgSql, Version=6.0.2.0, Culture=neutral, PublicKeyToken=null: /_/src/HealthChecks.NpgSql/NpgSqlHealthCheck.cs:42)
As @ferryferry said, the core issue is (likely) rate limiting on the server side, so there is nothing to fix on the caller side; you can only choose appropriate check intervals or use the premium Service Bus tier.
@ddegasperi Let's track npgsql in another issue.
@sungam3r ok, I've created a new issue #1611
@sungam3r I have the same issue with the health check, and I checked the logs for throttled requests: there are none. Also, the limit is 1000 requests per second (where management requests count as 10 and regular reads and writes as 1), so I highly doubt this issue is related to throttling.
Anyone was ever able to find a solution for this?
I'm having this issue with Npgsql as well. .NET 8.0.2.
Same issue with version 8.0.1 for Azure Service Bus, Npgsql as well as Redis (timeout)
Please fill in the following sections to help us fix the issue
What happened: When I use AspNetCore.HealthChecks.AzureServiceBus-5.1.1, AspNetCore.HealthChecks.AzureStorage-5.0.1, and AspNetCore.HealthChecks.AzureKeyVault-5.0.1 for health checking, requests to the health check endpoints "/health/live" and "/health/ready" return healthy results, but too-frequent requests can irregularly cause unhealthy error logs, like the following:
```
Health check AzureServiceBus-High completed after 5001.3016ms with status Unhealthy and description '(null)'
System.Threading.Tasks.TaskCanceledException: The operation was canceled.
   at Azure.Core.CancellationHelper.ThrowOperationCanceledException(Exception innerException, CancellationToken cancellationToken)
   at Azure.Core.CancellationHelper.ThrowIfCancellationRequested(CancellationToken cancellationToken)
   at Azure.Core.Pipeline.ResponseBodyPolicy.ThrowIfCancellationRequestedOrTimeout(CancellationToken originalToken, CancellationToken timeoutToken, Exception inner, TimeSpan timeout)
   at Azure.Core.Pipeline.ResponseBodyPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RedirectPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.RetryPolicy.ProcessAsync(HttpMessage message, ReadOnlyMemory`1 pipeline, Boolean async)
   at Azure.Core.Pipeline.HttpPipeline.SendRequestAsync(Request request, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Administration.HttpRequestAndResponse.SendHttpRequestAsync(Request request, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Administration.HttpRequestAndResponse.GetEntityAsync(String entityPath, String query, Boolean enrich, CancellationToken cancellationToken)
   at Azure.Messaging.ServiceBus.Administration.ServiceBusAdministrationClient.GetQueueRuntimePropertiesAsync(String name, CancellationToken cancellationToken)
   at HealthChecks.AzureServiceBus.AzureServiceBusQueueHealthCheck.CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken)
```
What you expected to happen: No error logs happen.
How to reproduce it (as minimally and precisely as possible): As described above. Source code sample:
```csharp
services.AddHealthChecks()
    .AddAzureBlobStorage(
        configuration["EmailTemplateBlobConnectionStr"],
        name: "AzureStorage",
        failureStatus: HealthStatus.Unhealthy,
        tags: new[] { HealthCheckConstants.ReadinessTag })
    .AddAzureServiceBusQueue(
        name: "AzureServiceBus-High",
        connectionString: configuration["EmailDeliveryServiceBusConnection"],
        queueName: "emaildelivery-high",
        failureStatus: HealthStatus.Unhealthy,
        tags: new[] { HealthCheckConstants.ReadinessTag })
    .AddAzureServiceBusQueue(
        name: "AzureServiceBus-Normal",
        connectionString: configuration["EmailDeliveryServiceBusConnection"],
        queueName: "emaildelivery-normal",
        failureStatus: HealthStatus.Unhealthy,
        tags: new[] { HealthCheckConstants.ReadinessTag })
    .AddAzureServiceBusQueue(
        name: "AzureServiceBus-Low",
        connectionString: configuration["EmailDeliveryServiceBusConnection"],
        queueName: "emaildelivery-low",
        failureStatus: HealthStatus.Unhealthy,
        tags: new[] { HealthCheckConstants.ReadinessTag })
    .AddAzureKeyVault(
        name: "AzureKeyVault",
        keyVaultServiceUri: new Uri($"https://{configuration["KeyVaultName"]}.vault.azure.net/"),
        credential: new DefaultAzureCredential(),
        setup: options => { options.AddSecret(configuration["KeyVaultSecrectName"]); },
        failureStatus: HealthStatus.Unhealthy,
        tags: new[] { HealthCheckConstants.ReadinessTag });
```
Anything else we need to know?: Nothing

Environment: