Frequent timeout errors with session state provider in Azure

JLRishe commented 5 months ago

(I'm not sure whether here or https://github.com/Azure/aspnet-redis-providers is the right place to post these inquiries. I posted an issue there yesterday but haven't received a response so far, so I'm cross-posting here).

I am using the Redis session state provider with an ASP.NET Azure App Service. I'm using Azure Cache for Redis for the cache.

Last week, I had a look at my app's logs, and noticed that I was getting several dozen Redis timeout errors per day. I have had issues in the past where Redis timeouts brought my app to a complete standstill, but that doesn't seem to be happening now. The timeouts occur sporadically but don't cause a critical disruption.

Service levels:

App Service: Standard S1
Redis: P1. I was using C2 until Friday, but the upgrade doesn't seem to have made a difference.

Today, I tried increasing the operation timeout value from 1000ms to 2000ms on Monday and then to 3000ms today, but I am continuing to see about a dozen timeouts per hour.

Below is the error message and stack trace from one of these timeouts. What can I do to figure why these are happening and how to remedy them?

StackExchange.Redis.RedisTimeoutException: Timeout performing EVAL (3000ms), next: EVAL, inst: 1, qu: 0, qs: 0, aw: False, bw: SpinningDown, rs: ReadAsync, ws: Idle, in: 0, last-in: 0, cur-in: 0, sync-ops: 10498, async-ops: 1, serverEndpoint: ************:6380, conn-sec: 2360.93, mc: 1/1/0, mgr: 10 of 10 available, clientName: ***********(SE.Redis-v2.6.90.64945), IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=4,Free=8187,Min=1,Max=8191), v: 2.6.90.64945 (Please take a look at this article for some common client-side issues that can cause timeouts: https://stackexchange.github.io/StackExchange.Redis/Timeouts)
    Stack Trace:    at StackExchange.Redis.ConnectionMultiplexer.ExecuteSyncImpl[T](Message message, ResultProcessor`1 processor, ServerEndPoint server, T defaultValue) in /_/src/StackExchange.Redis/ConnectionMultiplexer.cs:line 1980
   at StackExchange.Redis.RedisBase.ExecuteSync[T](Message message, ResultProcessor`1 processor, ServerEndPoint server, T defaultValue) in /_/src/StackExchange.Redis/RedisBase.cs:line 62
   at StackExchange.Redis.RedisDatabase.ScriptEvaluate(String script, RedisKey[] keys, RedisValue[] values, CommandFlags flags) in /_/src/StackExchange.Redis/RedisDatabase.cs:line 1520
   at Microsoft.Web.Redis.StackExchangeClientConnection.<>c__DisplayClass7_0.<Eval>b__0() in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 68
   at Microsoft.Web.Redis.StackExchangeClientConnection.OperationExecutor(Func`1 redisOperation) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 95
   at Microsoft.Web.Redis.StackExchangeClientConnection.RetryLogic(Func`1 redisOperation) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 118
   at Microsoft.Web.Redis.StackExchangeClientConnection.Eval(String script, String[] keyArgs, Object[] valueArgs) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\Shared\StackExchangeClientConnection.cs:line 68
   at Microsoft.Web.Redis.RedisConnectionWrapper.TryTakeWriteLockAndGetData(DateTime lockTime, Int32 lockTimeout, Object& lockId, ISessionStateItemCollection& data, Int32& sessionTimeout) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisConnectionWrapper.cs:line 184
   at Microsoft.Web.Redis.RedisSessionStateProvider.GetItemFromSessionStore(Boolean isWriteLockRequired, HttpContextBase context, String id, CancellationToken cancellationToken, Boolean& locked, TimeSpan& lockAge, Object& lockId, SessionStateActions& actions) in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisSessionStateProvider.cs:line 291
   at Microsoft.Web.Redis.RedisSessionStateProvider.<GetItemExclusiveAsync>d__20.MoveNext() in C:\TeamCity\buildAgent\work\59b31e8e4035fd30\src\RedisSessionStateProvider\RedisSessionStateProvider.cs:line 206
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.AspNet.SessionState.SessionStateModuleAsync.<GetSessionStateItemAsync>d__74.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.AspNet.SessionState.SessionStateModuleAsync.<AcquireStateAsync>d__65.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at Microsoft.AspNet.SessionState.TaskAsyncHelper.EndTask(IAsyncResult ar)
   at Microsoft.AspNet.SessionState.SessionStateModuleAsync.EndAcquireState(IAsyncResult result)
   at System.Web.HttpApplication.AsyncEventExecutionStep.InvokeEndHandler(IAsyncResult ar)
   at System.Web.HttpApplication.AsyncEventExecutionStep.OnAsyncEventCompletion(IAsyncResult ar)

NickCraver commented 4 months ago

What is the server endpoint here, and where is it in relation to the server? I'm trying to get a feel for payload sizes and latency involved here.

For what it's worth, the default max timeout against Azure instances is 5000ms, but that's based on people often connecting cross-region and such unaware of the consequences, so let's see what's happening in your specific scenario.

JLRishe commented 4 months ago

@NickCraver Thank you very much for your reply.

I'm not sure what you mean by "what is the server endpoint", but both the website and the Redis cache are in the same Azure region (West US). To reiterate, the website is an Azure App Service and the Redis cache is an Azure Cache for Redis instance, currently P1 service level.

mgravell commented 4 months ago

This looks like asp.net (meaning: not "current .net") code, and it is a known reality that "old" asp.net is particularly glitchy with "sync over async" scenarios (in the context of pool thread exhaustion). This looks to be synchronous code in the provider. My recommendations would be:

log an issue with the provider, https://github.com/Azure/aspnet-redis-providers, for true async
consider migration to "current" .net

I understand that "2" is complex in some scenarios, but in terms of impact: there's a huge difference between "current" .net and "old" asp.net; it may be worth consideration.

StackExchange / StackExchange.Redis

Frequent timeout errors with session state provider in Azure #2640