Closed — Magnus12 closed this issue 4 years ago
I'm curious about this too. It seems like this would seriously impact the performance under heavy loads.
There are some interesting approaches to locking on keys here: https://github.com/aspnet/Extensions/issues/708
You will definitely want to lock per key using buckets - with more and more cores available on today's CPUs, and especially with heavy requests that take hundreds of milliseconds or more, a busy web server will spend a lot of time dealing with singleton locks where tons of queries stack up on the lock, causing lots of kernel transitions.
C# lock is pretty optimized - it will spin lock first for a few milliseconds to avoid the kernel transition, but the whole point of caching is because the queries are expensive, and in this case the lock is almost always going to do a kernel transition which still ends up being cheaper than a spin lock for a lock held longer than a few dozen milliseconds.
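The bucketed approach described above can be sketched with plain Monitor locks as well (a minimal illustration; the StripedLocks class and LockFor helper are hypothetical names, not part of LazyCache - the actual example below uses Interlocked spin flags instead):

```csharp
using System;

public static class StripedLocks
{
    // One lock object per bucket; a key's hash picks its bucket,
    // so unrelated keys rarely contend on the same lock.
    private static readonly object[] buckets = new object[1024];

    static StripedLocks()
    {
        for (int i = 0; i < buckets.Length; i++)
            buckets[i] = new object();
    }

    public static object LockFor(string key)
    {
        // Casting the hash to uint keeps the index non-negative.
        uint index = (uint)key.GetHashCode() % (uint)buckets.Length;
        return buckets[index];
    }
}

// Usage: lock (StripedLocks.LockFor(key)) { /* build + insert cache item */ }
```

The memory cost is fixed (one object per bucket) regardless of how many distinct keys pass through the cache, which is the same trade-off the int-array version below makes.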
This solution in my tests has no noticeable performance impact for single core execution, and the performance gains will increase as more cores and requests per second are added.
public class CachingService
{
    // One lock flag per bucket; keys hash into buckets, so only
    // keys sharing a bucket contend with each other.
    private readonly int[] keyLocks = new int[8192];

    public virtual async Task<T> GetOrAdd<T>(string key, Func<Task<T>> addItemFactory)
    {
        uint hash = (uint)key.GetHashCode() % (uint)keyLocks.Length;

        // Spin until this bucket's flag is acquired (0 -> 1).
        while (Interlocked.CompareExchange(ref keyLocks[hash], 1, 0) != 0) { Thread.Yield(); }
        try
        {
            return await addItemFactory();
        }
        finally
        {
            // Release the bucket; Volatile.Write gives the store
            // release semantics on all platforms.
            Volatile.Write(ref keyLocks[hash], 0);
        }
    }
}
The current code does lock on insertion/retrieval from the cache - however, because we are using Lazy, that is not the same as locking for the entire time the cache item factory is running and building your object.
So we only lock for the time it takes to get/add an in-memory object to an in-memory data structure. The Lazy then locks on the actual factory, and this is per item, not global. Although not ideal, in my experience this has always been good enough - if I am bothering to cache something, it usually takes tens or hundreds of milliseconds to actually generate the cache item, so the small price of getting the Lazy out of the dictionary is relatively minor.
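The Lazy pattern described above can be sketched roughly like this (a simplified illustration, not LazyCache's actual implementation - the ConcurrentDictionary stands in for the real MemoryCache and its lock):

```csharp
using System;
using System.Collections.Concurrent;

public class LazyPerItemCache
{
    // Stand-in for the real cache store: getting or inserting the
    // Lazy wrapper is cheap, so the store is held up only briefly.
    private readonly ConcurrentDictionary<string, Lazy<object>> items = new();

    public T GetOrAdd<T>(string key, Func<T> factory)
    {
        // No factory code runs here - we only insert the wrapper.
        var lazy = items.GetOrAdd(key, _ => new Lazy<object>(() => factory()));

        // Lazy<T>.Value locks per item: the factory runs at most once,
        // and only callers asking for *this* key wait on it.
        return (T)lazy.Value;
    }
}
```

This is why the global lock in the current code is short-lived: the expensive work happens inside Lazy.Value, which synchronizes per entry rather than globally.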
Locking by key, or by key range as in the example, looks interesting, but I think we should prove it is better (faster and without a substantial memory increase) using BenchmarkDotNet before taking on the additional complexity.
Updated my example with Interlocked.CompareExchange. No noticeable difference in performance tests versus not locking at all.
In the two places where it's locking on the SemaphoreSlim, is it doing work that would ever be executed any way other than synchronously? If not, could it just use a Monitor on the key (or possibly the key + a private GUID string)?
I was facing a serious contention problem under moderate load. I've removed the original locking code and implemented the lock by key as suggested by @jjxtra (slightly modified). Since then I've noticed a nice performance improvement and no more contention.
int hash = GetLockIndex(key);
while (Interlocked.CompareExchange(ref keyLocks[hash], 1, 0) != 0) { Thread.Yield(); }

private int GetLockIndex(string key)
{
    // GetHashCode can return any Int32, including Int32.MinValue.
    // Reinterpreting the hash as uint keeps the index non-negative.
    // (Sliding by adding Int32.MaxValue has an edge case: Int32.MinValue
    // + Int32.MaxValue is -1, which would produce a negative index.)
    return (int)((uint)key.GetHashCode() % (uint)keyLocks.Length);
}
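As a quick sanity check (a standalone sketch, assuming an 8192-entry lock array), reinterpreting the hash code as uint keeps every possible Int32 value, including Int32.MinValue, inside the array bounds:

```csharp
using System;

public static class LockIndexCheck
{
    private const int BucketCount = 8192;

    public static int IndexFor(int hashCode)
    {
        // The unsigned reinterpretation avoids negative remainders
        // for any possible hash code, with no overflow edge cases.
        return (int)((uint)hashCode % (uint)BucketCount);
    }

    public static void Main()
    {
        foreach (int h in new[] { int.MinValue, -1, 0, 1, int.MaxValue })
        {
            int i = IndexFor(h);
            if (i < 0 || i >= BucketCount)
                throw new Exception($"out of range: {h} -> {i}");
        }
        Console.WriteLine("all in range"); // prints "all in range"
    }
}
```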
The more cores and more requests per second the bigger the win you will get by locking per key bucket, especially if you have heavy queries that take more than a few hundred milliseconds.
Is there any reason for not replacing the current lock implementation? Any downside?
I’ve already proved that the lock per key is much faster under heavy load and higher core counts when I assisted with the Polly duplicate request collapser policy. This is a no-brainer and only costs about 32 KB of RAM (8192 four-byte ints).
Perhaps someone would be willing to submit a PR for this?
PR up here: https://github.com/alastairtree/LazyCache/pull/126
For more robust usage, Polly provides additional functionality like cache, retry, circuit breaker, duplicate request collapser, etc.
https://github.com/App-vNext/Polly https://github.com/Polly-Contrib/Polly.Contrib.DuplicateRequestCollapser
Yeah Polly is great, especially when you need more complicated behaviours.
Thanks! Will be released in LazyCache 2.1.1
I see that in GetOrAddAsync you call await this.locker.WaitAsync().ConfigureAwait(false); with a global SemaphoreSlim. Won't this block all calls to the cache when it should only block for the specified key? If, for example, I need to fill the cache with some heavy DB call for two different entries and these are called concurrently, won't this block the second one until the first one is completed? Shouldn't the semaphore be in a concurrent dictionary where the key is the cache key (one semaphore per entry)?
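One way to get the per-entry blocking suggested above is a ConcurrentDictionary of semaphores keyed by cache key (a hedged sketch, not LazyCache's code; the PerKeyAsyncLock name is made up, and note that semaphores are never evicted here, so a production version would need a removal strategy for dead keys):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public class PerKeyAsyncLock
{
    private readonly ConcurrentDictionary<string, SemaphoreSlim> locks = new();

    public async Task<T> RunExclusiveAsync<T>(string key, Func<Task<T>> work)
    {
        // One semaphore per key: callers for different keys never
        // block each other, unlike a single global SemaphoreSlim.
        var gate = locks.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync().ConfigureAwait(false);
        try
        {
            return await work().ConfigureAwait(false);
        }
        finally
        {
            gate.Release();
        }
    }
}
```

With this shape, two concurrent heavy DB calls for two different entries proceed in parallel, while concurrent calls for the same entry still serialize.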