Open JLRishe opened 5 months ago
What is the server endpoint here, and where is it in relation to the server? I'm trying to get a feel for payload sizes and latency involved here.
For what it's worth, the default max timeout against Azure instances is 5000ms, but that's based on people often connecting cross-region and the like without being aware of the consequences, so let's see what's happening in your specific scenario.
@NickCraver Thank you very much for your reply.
I'm not sure what you mean by "what is the server endpoint", but both the website and the Redis cache are in the same Azure region (West US). To reiterate, the website is an Azure App Service and the Redis cache is an Azure Cache for Redis instance, currently P1 service level.
This looks like asp.net (meaning: not "current .net") code, and it is a known reality that "old" asp.net is particularly glitchy with "sync over async" scenarios (in the context of pool thread exhaustion). This looks to be synchronous code in the provider. My recommendations would be:
I understand that "2" is complex in some scenarios, but in terms of impact: there's a huge difference between "current" .net and "old" asp.net; it may be worth consideration.
(I'm not sure whether here or https://github.com/Azure/aspnet-redis-providers is the right place to post these inquiries. I posted an issue there yesterday but haven't received a response so far, so I'm cross-posting here.)
I am using the Redis session state provider with an ASP.NET Azure App Service. I'm using Azure Cache for Redis for the cache.
Last week, I had a look at my app's logs, and noticed that I was getting several dozen Redis timeout errors per day. I have had issues in the past where Redis timeouts brought my app to a complete standstill, but that doesn't seem to be happening now. The timeouts occur sporadically but don't cause a critical disruption.
Service levels:
I increased the operation timeout value from 1000ms to 2000ms on Monday and then to 3000ms today, but I am continuing to see about a dozen timeouts per hour.
Below is the error message and stack trace from one of these timeouts. What can I do to figure out why these are happening and how to remedy them?