StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/

ConnectTimeout errors, threads blocked waiting for connection, app unresponsive #2534

Closed: razzemans closed this issue 1 year ago

razzemans commented 1 year ago

Using StackExchange.Redis 2.6.96.

We have an issue where one of our applications seems unable to handle any requests after an App Service restart in Azure, due to Redis connection issues. To be precise, we host the application in both West Europe and North Europe, and usually one of the two hits the issue when there is consistent load on the servers. (This load is far, far below any peak load we experience, and we have other applications with similar setups that do not see this issue.) I managed to create a memory dump while the app was failing to start. This is what I observe:

[screenshot from the memory dump; not reproduced]

The threads appear to be blocked waiting on another thread. Looking at that blocking thread, it is blocked on this line:

[screenshot of the source line the blocking thread is stuck on; not reproduced]

There are also some exceptions on the stack; one of them is a Redis ConnectTimeout.

So I think I understand what is happening, but I'm not sure how to fix it. As I understand it, the call to .Result blocks, which normally should not be a problem because the connection is usually established within a couple of seconds. In this case the connection fails, so it blocks for about 5000 ms (the default connect timeout, since we have not changed it). I'm not sure exactly how the retry mechanism works after that, for example whether it simply retries 3 times, which I understand is the default.

Is this assumption correct? Any reason why this would happen? Any direction on how to debug and fix this is appreciated.

NickCraver commented 1 year ago

It looks like you're hitting thread exhaustion on that Lazy lock, which likely means we don't have any threads left to handle the completion of the connect, contributing to a bit of a spiral of death here.
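To make that failure mode concrete, here is a minimal Python sketch of it. It is an analogy under stated assumptions, not .NET and not StackExchange.Redis code: a bounded `ThreadPoolExecutor` stands in for the thread pool, a lock-guarded lazy initializer that waits on `.result()` stands in for `Lazy<T>` plus `.Result`, and the connect's completion must run on a worker from the same pool. The function `simulate` and its parameters are hypothetical. When every worker is occupied by a waiting caller, the completion is never scheduled and the first caller times out; one spare worker (loosely analogous to having more threads available up front) lets it through.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def simulate(workers, callers, connect_timeout=0.5):
    """Run `callers` request handlers on a pool of `workers` threads (needs callers <= workers).

    Each handler needs a lazily created connection. Creation is guarded by a lock
    and waits synchronously (a rough analogue of Lazy<T> plus .Result), and the
    connect's completion must itself run on a worker from the same pool.
    Returns the sorted per-caller outcomes: "connected" or "timeout".
    """
    pool = ThreadPoolExecutor(max_workers=workers)
    lock = threading.Lock()
    state = {"conn": None}
    # Handlers rendezvous first so they all occupy workers at once,
    # like a burst of requests arriving right after a restart.
    arrived = threading.Barrier(callers)

    def finish_connect():
        # Stands in for the connect's completion callback.
        return "connected"

    def handle_request():
        arrived.wait(timeout=5)
        with lock:  # later callers block here, each still holding a worker
            if state["conn"] is None:
                completion = pool.submit(finish_connect)
                try:
                    # Holds this worker until the completion runs or the timeout fires.
                    state["conn"] = completion.result(timeout=connect_timeout)
                except FutureTimeout:
                    return "timeout"
            return state["conn"]

    try:
        futures = [pool.submit(handle_request) for _ in range(callers)]
        return sorted(f.result() for f in futures)
    finally:
        pool.shutdown(wait=True)


print(simulate(workers=2, callers=2))  # ['connected', 'timeout']
print(simulate(workers=3, callers=2))  # ['connected', 'connected']
```

In the first run both workers are stuck (one waiting on the completion, one waiting on the lock), so `finish_connect` only runs after the first caller gives up. In the second run the third worker is free to run it, so the first caller connects and the second reuses the connection.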

My recommendation would be one of the following: spin up this connection earlier, before you're under load (e.g. during startup, before requests are allowed in to pile up on it); make the usage totally async; or increase your MinThreads during startup to cope.
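The first two options (connect before load arrives, and keep the usage async) can be sketched with Python's `asyncio` as an analogy; this is not the StackExchange.Redis API. In the .NET client, the corresponding step would be awaiting `ConnectionMultiplexer.ConnectAsync` rather than calling `.Result`. `LazyAsyncConnection` and `fake_connect` are hypothetical names, and `fake_connect` only sleeps to simulate handshake latency. Callers that arrive while the connect is in flight await one shared task instead of blocking a thread, and a warm-up call at startup means the burst of requests finds the connection already made.

```python
import asyncio


class LazyAsyncConnection:
    """Hypothetical helper: connect once, shared by all callers, without blocking threads."""

    def __init__(self, connect):
        self._connect = connect  # async factory returning a connection
        self._task = None        # shared in-flight (or finished) connect attempt

    async def get(self):
        if self._task is None:
            # The first caller starts the connect; later callers reuse the same task.
            self._task = asyncio.create_task(self._connect())
        task = self._task
        try:
            # Awaiting suspends this coroutine; no thread is held while waiting.
            # shield() keeps one cancelled caller from cancelling the shared connect.
            return await asyncio.shield(task)
        except Exception:
            # Drop a failed attempt so the next caller can retry,
            # unless another caller has already replaced it.
            if self._task is task:
                self._task = None
            raise


async def main():
    connect_calls = 0

    async def fake_connect():
        nonlocal connect_calls
        connect_calls += 1
        await asyncio.sleep(0.05)  # simulated handshake latency
        return "connected"

    redis = LazyAsyncConnection(fake_connect)
    await redis.get()  # warm up at startup, before requests are let in
    # A burst of 100 concurrent "requests" all reuse the one connection.
    results = await asyncio.gather(*(redis.get() for _ in range(100)))
    return connect_calls, results


connect_calls, results = asyncio.run(main())
print(connect_calls, len(results), set(results))  # 1 100 {'connected'}
```

The warm-up call is optional for correctness here (concurrent first callers would also share one task), but it moves the connect latency out of the request path, which is the point of connecting during startup.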

razzemans commented 1 year ago

Thanks for the quick reply. All good suggestions. I will increase MinThreads; as it happens, the application with the issue is the only one where it is not set to a custom (higher) value. I will also look into the other suggestions.