MichaCo / CacheManager

CacheManager is an open source caching abstraction layer for .NET written in C#. It supports various cache providers and implements many advanced features.
http://cachemanager.michaco.net
Apache License 2.0
2.33k stars 458 forks source link

RetryHelper causing deadlock #300

Open pergardebrink opened 4 years ago

pergardebrink commented 4 years ago

Hi,

We had issues with a web site of ours that freezes from time to time. Looking at the process dumps, there's thousands of threads (initiated by web requests) waiting for a lock in the Eval method in RedisCacheHandle.cs, which is on this line as far as I can see: https://github.com/MichaCo/CacheManager/blob/4385c06a409e9736a7531c5e11003032b8a4c151/src/CacheManager.StackExchange.Redis/RedisCacheHandle.cs#L977

It seems as the reason for that is that one single thread is stuck in RetryHelper.cs, waiting forever for a Task.Delay on this line, that I guess was called because there was some temporary Redis server issue: https://github.com/MichaCo/CacheManager/blob/3f3ad356dce3f7a6a15e7caa0df68ea087beb59b/src/CacheManager.StackExchange.Redis/RetryHelper.cs#L43

Doing .Wait() on a awaitable could AFAIK cause deadlocks when in ASP.NET request context because the synchronization context cannot be restored when the continuation is run: https://blog.stephencleary.com/2012/07/dont-block-on-async-code.html

I guess a workaround would be to add ConfigureAwait to avoid deadlock even though it's sort of a hack: Task.Delay(timeOut).ConfigureAwait(false).GetAwaiter().GetResult();

Or just use Thread.Sleep(timeOut) in this case, or am I missing something?

MichaCo commented 4 years ago

or am I missing something?

No I think you are right, that .Wait could cause issues here. I haven't been running into any issues under ASP.NET Core though, do you use classic ASP.NET?

To work around for now, could you try to set the retry trimeout to 0. This should always cause the Task.Delay to run synchronous.

Or, set the retry count to zero, but that could cause a lot of unexpected exceptions because the client tends to throw timeouts randomly in some situations.

pergardebrink commented 4 years ago

Yes, we're currently on classic ASP.NET (in hopefully a not to distant future on ASP.NET Core though) but I guess it could happen in ASP.NET Core as well, but maybe it's not as likely somehow?

I thought about those workarounds you suggested also, for avoiding the Task.Delay Wait to deadlock. I guess that having a few more retries than we have currently (to account for retries being more frequent) and lowering the timeout to 0 could at least do something for the better.

Is it a tedious process to release a 1.2.1 hotfix or similar with just a fix for this? Can I help out with something in that case?