alastairtree / LazyCache

An easy-to-use, thread-safe in-memory caching service with a simple, developer-friendly API for C#
https://nuget.org/packages/LazyCache
MIT License

SemaphoreSlim blocks all calls #104

Closed: Magnus12 closed 4 years ago

Magnus12 commented 4 years ago

I see that in GetOrAddAsync you call await this.locker.WaitAsync().ConfigureAwait(false); on a global SemaphoreSlim. Won't this block all calls to the cache, when it should only block calls for the specified key? For example, if I need to fill the cache with some heavy DB call for two different entries and these are called concurrently, won't this block the second one until the first one is completed? Shouldn't the semaphore live in a concurrent dictionary where the key is the cache key (one semaphore per entry)?
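
To make the suggestion concrete, here is a minimal sketch of the per-key semaphore idea; the class and method names are illustrative, not LazyCache's API:

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Illustrative sketch only: one semaphore per cache key, stored in a
// ConcurrentDictionary so that different keys never block each other.
public class PerKeyLocker
{
    private readonly ConcurrentDictionary<string, SemaphoreSlim> semaphores =
        new ConcurrentDictionary<string, SemaphoreSlim>();

    public async Task<T> WithKeyLockAsync<T>(string key, Func<Task<T>> factory)
    {
        // A losing racer may create a semaphore that is immediately discarded;
        // that is harmless, since an unused SemaphoreSlim allocates no kernel handle.
        SemaphoreSlim gate = semaphores.GetOrAdd(key, _ => new SemaphoreSlim(1, 1));
        await gate.WaitAsync().ConfigureAwait(false);
        try
        {
            return await factory().ConfigureAwait(false);
        }
        finally
        {
            gate.Release();
        }
    }
}

One downside of this exact shape: the dictionary grows by one semaphore per distinct key unless entries are evicted somehow.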

jgimness commented 4 years ago

I'm curious about this too. It seems like this would seriously impact performance under heavy load.

There are some interesting approaches to locking on keys here: https://github.com/aspnet/Extensions/issues/708

jjxtra commented 4 years ago

You will definitely want to lock per key using buckets. With more and more cores available on today's CPUs, and especially with heavy requests that take hundreds of milliseconds or more, a busy web server will spend quite a lot of time dealing with a singleton lock: tons of queries will stack up on it, causing lots of kernel transitions.

C#'s lock is pretty optimized - it will spin first for a few milliseconds to avoid the kernel transition - but the whole point of caching is that the queries are expensive, so in this case the lock is almost always going to make a kernel transition anyway, which still ends up being cheaper than spinning for a lock held longer than a few dozen milliseconds.

The solution below has, in my tests, no noticeable performance impact for single-core execution, and the performance gains increase as more cores and requests per second are added.

using System;
using System.Threading;
using System.Threading.Tasks;

public class CachingService
{
    // One lock slot per bucket; keys that hash to the same bucket share a lock.
    private readonly int[] keyLocks = new int[8192];

    public virtual async Task<T> GetOrAdd<T>(string key, Func<Task<T>> addItemFactory)
    {
        // Map the key onto one of the 8,192 lock buckets.
        uint hash = (uint)key.GetHashCode() % (uint)keyLocks.Length;

        // Spin until we atomically flip this bucket from 0 (free) to 1 (held).
        while (Interlocked.CompareExchange(ref keyLocks[hash], 1, 0) == 1) { Thread.Yield(); }
        try
        {
            return await addItemFactory();
        }
        finally
        {
            // Release with a full fence so the write is visible even if the
            // continuation resumed on a different thread after the await.
            Interlocked.Exchange(ref keyLocks[hash], 0);
        }
    }
}
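
For example (User and LoadUserAsync are hypothetical stand-ins, just to show the effect), two concurrent calls for different keys no longer queue behind a single global lock:

// Hypothetical usage; User and LoadUserAsync are illustrative, not a real API.
var cache = new CachingService();
Task<User> first = cache.GetOrAdd("user:1", () => LoadUserAsync(1));
Task<User> second = cache.GetOrAdd("user:2", () => LoadUserAsync(2));
await Task.WhenAll(first, second); // both factories can run concurrently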
alastairtree commented 4 years ago

The current code does lock on the insertion/retrieval of the item from the cache - however, because we are using Lazy, that is not the same as locking for the entire time the cache item factory is running and building your object.

So we only lock for the time it takes to get/add an in-memory object to an in-memory data structure. The Lazy then locks on the actual factory, and that is per item, not global. Although not ideal, in my experience this has always been good enough - if I am bothering to cache something, it usually takes tens or hundreds of milliseconds to actually generate the cache item, so the price of getting the Lazy out of the dictionary is comparatively small.
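
To illustrate the pattern being described (a sketch, not LazyCache's actual source): the global lock covers only the cheap dictionary get/add, while the Lazy serializes the expensive factory per item.

using System;
using System.Collections.Generic;

// Sketch of the pattern described above; not LazyCache's real code.
public class LazyPerItemCache
{
    private readonly object globalGate = new object();
    private readonly Dictionary<string, Lazy<object>> items =
        new Dictionary<string, Lazy<object>>();

    public T GetOrAdd<T>(string key, Func<T> factory)
    {
        Lazy<object> lazy;
        lock (globalGate) // global lock, held only for the dictionary lookup/insert
        {
            if (!items.TryGetValue(key, out lazy))
            {
                lazy = new Lazy<object>(() => factory());
                items[key] = lazy;
            }
        }
        // The factory runs here, outside the global lock; Lazy guarantees it
        // runs once and blocks only other callers asking for this same key.
        return (T)lazy.Value;
    }
}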

Locking by key, or by key range as in the example, looks interesting, but I think we should prove it is better (faster, and without a substantial memory increase) using BenchmarkDotNet before taking on the additional complexity.
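
One possible shape for such a comparison (a hypothetical sketch, not from the repo; the 25 ms Task.Delay stands in for an expensive factory, and 16 distinct keys simulate concurrent requests):

using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Hypothetical benchmark skeleton: 16 concurrent requests for 16 distinct keys.
[MemoryDiagnoser]
public class LockStrategyBenchmarks
{
    private readonly SemaphoreSlim globalLock = new SemaphoreSlim(1, 1);
    private readonly int[] keyLocks = new int[8192];

    [Benchmark(Baseline = true)]
    public Task GlobalSemaphore() =>
        Task.WhenAll(Enumerable.Range(0, 16).Select(async key =>
        {
            await globalLock.WaitAsync();
            try { await Task.Delay(25); } // all 16 requests serialize here
            finally { globalLock.Release(); }
        }));

    [Benchmark]
    public Task PerKeyBucket() =>
        Task.WhenAll(Enumerable.Range(0, 16).Select(async key =>
        {
            uint hash = (uint)key % (uint)keyLocks.Length;
            while (Interlocked.CompareExchange(ref keyLocks[hash], 1, 0) == 1) { Thread.Yield(); }
            try { await Task.Delay(25); } // distinct keys proceed in parallel
            finally { Interlocked.Exchange(ref keyLocks[hash], 0); }
        }));

    public static void Main() => BenchmarkRunner.Run<LockStrategyBenchmarks>();
}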

jjxtra commented 4 years ago

Updated my example with Interlocked.CompareExchange. No noticeable difference in performance tests versus not locking at all.

SQLSourcerer commented 4 years ago

In the two places where it's locking on the SemaphoreSlim, is it doing work that would ever be executed any way except synchronously? If not, could it just use a Monitor on the key (or possibly the key + a private guid string)?
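
A sketch of that idea (with hypothetical names; note it only works for synchronous factories, because a Monitor cannot be held across an await):

using System;

public class MonitorKeyedLock
{
    // Private GUID suffix so we never lock on a string that some other
    // code might also have interned and locked on.
    private readonly string lockSuffix = Guid.NewGuid().ToString("N");

    public T GetOrAddSync<T>(string key, Func<T> factory)
    {
        // string.Intern returns the one canonical instance for this value,
        // so every caller with the same key locks on the same object.
        lock (string.Intern(key + lockSuffix))
        {
            return factory();
        }
    }
}

The drawback is that interned strings live for the lifetime of the process, so a large key space leaks memory, and this cannot cover the async path.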

agrinei commented 4 years ago

I was facing a serious contention problem under moderate load. I've removed the original locking code and implemented the lock by key as suggested by @jjxtra (slightly modified). Since then I've noticed a nice performance improvement and no more contention.

// Fragment (slightly modified from the example above); keyLocks is the same int[8192] array.
int hash = GetLockIndex(key);
while (Interlocked.CompareExchange(ref keyLocks[hash], 1, 0) == 1) { Thread.Yield(); }

private int GetLockIndex(string key)
{
    // Since GetHashCode can return a negative number we shift it into the
    // positive range. Adding Int32.MaxValue + 1 (as a long) maps Int32.MinValue
    // to zero, so the result of the modulus is never negative.
    return (int)(((long)key.GetHashCode() + int.MaxValue + 1L) % keyLocks.Length);
}
jjxtra commented 4 years ago

The more cores and the more requests per second, the bigger the win you will get from locking per key bucket, especially if you have heavy queries that take more than a few hundred milliseconds.

agrinei commented 4 years ago

Is there any reason for not replacing the current lock implementation? Any downside?

jjxtra commented 4 years ago

I've already proved that per-key locking is much faster under heavy load and higher core counts when I assisted with the Polly duplicate request collapser policy. This is a no-brainer and only costs an 8,192-entry int array (about 32 KB of RAM).

alastairtree commented 4 years ago

Perhaps someone would be willing to submit a PR for this?

jjxtra commented 4 years ago

PR up here: https://github.com/alastairtree/LazyCache/pull/126

For more robust usage, Polly provides additional functionality such as caching, retries, circuit breakers, duplicate request collapsing, etc.

https://github.com/App-vNext/Polly https://github.com/Polly-Contrib/Polly.Contrib.DuplicateRequestCollapser

alastairtree commented 4 years ago

Yeah Polly is great, especially when you need more complicated behaviours.

alastairtree commented 4 years ago

Thanks! Will be released in LazyCache 2.1.1