bitfaster / BitFaster.Caching

High performance, thread-safe in-memory caching primitives for .NET
MIT License

Cache bound by weighted count #602

Open dave-yotta opened 4 months ago

dave-yotta commented 4 months ago

i.e. if we know, or can approximate, the memory usage of each entry in MB and want to bound the cache to X MB. I see there's a fixed capacity internally on the concurrent dictionary, so this is probably not straightforward.

If we picked a suitable N for the capacity bound and did this pseudocode:

onUpdated += x => { long total = Interlocked.Add(ref totalSize, x.Size); if (total > limit) lfu.Trim((int)(currentCount * 0.3)); };
onRemoved += x => Interlocked.Add(ref totalSize, -x.Size);

any thoughts on the performance/stability?

bitfaster commented 4 months ago

I plan to replicate Caffeine's size-based algorithm for ConcurrentLfu, and have this as a built-in option. This would be similar to time-based eviction, but instead of an expiry calculator you would pass in an item gauge to determine size.
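
For illustration, a weigher-style item gauge in C# might look like the following. This is a hypothetical sketch only: `IItemGauge` is not part of the current BitFaster.Caching API, and the built-in option described above does not exist yet; it simply mirrors the concept of Caffeine's `Weigher`, where each entry is charged its gauged size against a total weight limit instead of counting one per entry.

```csharp
// Hypothetical sketch only: not part of the current BitFaster.Caching API.
// Analogous to Caffeine's Weigher: the cache would charge each entry's
// gauged size against a total weight limit rather than an item count.
public interface IItemGauge<K, V>
{
    long GetSize(K key, V value); // approximate cost of one entry, e.g. in bytes
}

// Example gauge for byte[] values: charge each entry its payload length.
public class ByteArrayGauge : IItemGauge<string, byte[]>
{
    public long GetSize(string key, byte[] value) => value.Length;
}
```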

Implementing this from the outside presents a few problems:

  1. There is no OnAdded event to hook into
  2. Events are not yet implemented for ConcurrentLfu, so you could only try this with ConcurrentLru. I will add LFU events in the next couple of months.
  3. Trimming 30% by item count doesn't account for weights, so you may trim too many or few items. To make this more accurate, you might instead loop while (total > limit) lfu.Policy.Eviction.Trim(1);, such that you keep deleting the next candidate for removal according to the cache's eviction policy. This would remove the minimum number of items for the cache to fit within the limit.
  4. The interlocked operations on the total size counter don't guard against all races; for example, interleaving two concurrent updates would result in two threads incrementing size, each observing total > limit == true. Two threads will now call Trim, removing 30% of the items successively (if you had 100 items cached, this would leave 100 × 0.7 × 0.7 = 49 items in the cache, and so on). If there are many concurrent updates this would converge towards an empty cache. The easy way to solve this is with a global lock for all mutate operations, but that will kill throughput and the lock will be contended as the number of concurrent threads increases.
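
Points 3 and 4 can be sketched together: trim one eviction candidate at a time until the weighted total fits, and guard the trim loop with an Interlocked flag so concurrent threads can't trim successively. This is a minimal sketch under stated assumptions — the `trimOne` delegate stands in for something like lfu.Policy.Eviction.Trim(1), and evicting an item is assumed to synchronously fire the removal callback that decrements the total:

```csharp
using System;
using System.Threading;

// Sketch: weight-aware trimming with a single-trimmer guard. An Interlocked
// flag (rather than a global lock) ensures only one thread runs the trim
// loop, so concurrent over-limit observers don't each shrink the cache.
public class WeightedTrimmer
{
    private long _totalSize;
    private int _trimming; // 0 = idle, 1 = a thread is trimming
    private readonly long _limit;
    private readonly Action _trimOne; // stands in for lfu.Policy.Eviction.Trim(1)

    public WeightedTrimmer(long limit, Action trimOne)
    {
        _limit = limit;
        _trimOne = trimOne;
    }

    public long TotalSize => Interlocked.Read(ref _totalSize);

    public void OnAdded(long itemSize)
    {
        if (Interlocked.Add(ref _totalSize, itemSize) > _limit)
            TryTrim();
    }

    public void OnRemoved(long itemSize) => Interlocked.Add(ref _totalSize, -itemSize);

    private void TryTrim()
    {
        // Only one thread trims at a time; concurrent callers simply return.
        if (Interlocked.CompareExchange(ref _trimming, 1, 0) != 0)
            return;
        try
        {
            // Each _trimOne() evicts the next candidate per the cache's
            // eviction policy, firing the removal callback that decrements
            // _totalSize, so this removes the minimum number of items.
            while (Interlocked.Read(ref _totalSize) > _limit)
                _trimOne();
        }
        finally
        {
            Volatile.Write(ref _trimming, 0);
        }
    }
}
```

Note the remaining benign race: an add that lands just as the trimmer exits may leave the total slightly over the limit until the next add triggers another trim.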

In summary I would expect it to be quite fast but unstable in the sense that the cache would be trimmed more than needed, reducing hit rate. In some scenarios adding the global lock to restore stability might not matter, but I wouldn't choose that option without measuring/testing in the context of the real application.

dave-yotta commented 4 months ago

Thanks for the advice. We're having a tough time finding an implementation in .NET!

dave-yotta commented 4 months ago

Since there are locks inside Trim, we're thinking something like this will be more appropriate as a workaround: https://gist.github.com/dave-yotta/a3163bb7c81aa5b0d4e2ad4b482ac2aa

bitfaster commented 4 months ago

I left a comment on your gist - in practice I think it will work, but it will be subject to an incorrect total size due to races between GetOrAdd, Set, TryRemove and time-based expiry. If you don't use any of those things, there are no races. The races will be benign - under some conditions Trim may be off by one or two, removing more items than needed, which will slightly reduce the efficacy of the cache.

The semaphore inside Set might reduce concurrent throughput or use a lot of CPU due to spinning if called at very high frequency - again not likely to be your primary use case but is a potential failure mode that I wouldn't ship as default behavior in a library.

dave-yotta commented 4 months ago

@bitfaster how would you feel about a PR to make bool TryAdd(K key, V value) public? :D

bitfaster commented 4 months ago

I left another comment in your gist with a workaround using GetOrAdd - assuming inconsistent size between add/update is the issue you are hitting.
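
As an illustration of why atomic add matters for consistent size accounting (a sketch only, not the gist code or the library's implementation): wrapping values in Lazy<T> over a plain ConcurrentDictionary stands in for the cache's atomic GetOrAdd, so the size is charged exactly once, by the thread whose value actually wins the insert.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch: charge an entry's size exactly once, on the add path, so a
// GetOrAdd that finds an existing entry never double-counts. Lazy<T>
// (ExecutionAndPublication by default) ensures the factory, and therefore
// the size increment, runs at most once per key even under racing adds.
public class SizeTrackingCache
{
    private readonly ConcurrentDictionary<string, Lazy<byte[]>> _cache = new();
    private long _totalSize;

    public long TotalSize => Interlocked.Read(ref _totalSize);

    public byte[] GetOrAdd(string key, Func<string, byte[]> factory)
    {
        var lazy = _cache.GetOrAdd(key, k => new Lazy<byte[]>(() =>
        {
            var value = factory(k);
            Interlocked.Add(ref _totalSize, value.Length); // charged once per key
            return value;
        }));
        // Losing threads receive the winner's Lazy; their own factory never runs.
        return lazy.Value;
    }

    public bool TryRemove(string key)
    {
        if (_cache.TryRemove(key, out var lazy))
        {
            if (lazy.IsValueCreated)
                Interlocked.Add(ref _totalSize, -lazy.Value.Length);
            return true;
        }
        return false;
    }
}
```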

Class methods can be made public; it is more complicated to change interfaces. I didn't do a good job of aligning all the ICache APIs with ConcurrentDictionary. My intent was to fix all of this as part of v3, because making everything consistent across the whole API surface will be a breaking change.