Open · dave-yotta opened this issue 4 months ago
I plan to replicate Caffeine's size-based algorithm for `ConcurrentLfu`, and have this as a built-in option. This would be similar to time-based eviction, but instead of an expiry calculator you would pass in an item gauge to determine size.
Implementing this from the outside presents a few problems:

- You would need an `OnAdded` event to hook into, so that you could run something like `while (total > limit) lfu.Policy.Eviction.Trim(1);`, such that you keep deleting the next candidate for removal according to the cache's eviction policy. This would remove the minimum number of items needed for the cache to fit within the limit.
- Trimming is racy: two threads may both observe `total > limit == true`. Both will then call `Trim`, removing 30% of the items successively (if you had 100 items cached, this would leave 100 * 0.7 * 0.7 = 49 items in the cache, and so on). If there are many concurrent updates this would converge towards an empty cache. The easy way to solve this is with a global lock around all mutate operations, but that will kill throughput, and the lock will be contended as the number of concurrent threads increases.

In summary, I would expect it to be quite fast but unstable, in the sense that the cache would be trimmed more than needed, reducing the hit rate. In some scenarios adding the global lock to restore stability might not matter, but I wouldn't choose that option without measuring/testing in the context of the real application.
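The over-trimming race is easy to see with plain arithmetic. This is an illustrative C# snippet (not BitFaster.Caching code), assuming each racing caller's trim removes 30% of the cached items; integer math is used so the result is exact:

```csharp
using System;

// Illustrative only: two threads both observe total > limit, and each
// performs a trim that removes 30% of the items currently in the cache.
int items = 100;
items = items * 7 / 10; // first thread's trim leaves 70 items
items = items * 7 / 10; // second thread's trim leaves 49 items
Console.WriteLine(items); // 49, far below the intended bound of 100
```

With more concurrent writers the same pattern repeats, which is why the cache converges towards empty without some form of coordination.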
Thanks for the advice. We're having a tough time finding an implementation in .NET!
Since there are locks inside `Trim`, we're thinking something like this will be more appropriate as a workaround: https://gist.github.com/dave-yotta/a3163bb7c81aa5b0d4e2ad4b482ac2aa
I left a comment on your gist - in practice I think it will work, but it will be subject to an incorrect total size due to races between `GetOrAdd`, `Set`, `TryRemove` and time-based expiry. If you don't use any of those things, there are no races. The races will be benign - under some conditions `Trim` may be off by one or two, removing more items than needed, which will slightly reduce the efficacy of the cache.
The semaphore inside `Set` might reduce concurrent throughput, or use a lot of CPU due to spinning if called at very high frequency - again, not likely to be your primary use case, but it is a potential failure mode that I wouldn't ship as default behavior in a library.
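If the semaphore does become a bottleneck, one alternative (my sketch, not library or gist code) is to keep the running total with `Interlocked` operations and accept that it is approximate under the races described above:

```csharp
using System.Threading;

// Sketch of lock-free size accounting (hypothetical helper, not part of
// BitFaster.Caching). The total is approximate: races between add, update,
// remove and expiry can leave it transiently off by a small amount.
public class ApproximateSizeTracker
{
    private long total;

    public long Total => Interlocked.Read(ref total);

    public void OnAdded(long itemSize) => Interlocked.Add(ref total, itemSize);

    public void OnRemoved(long itemSize) => Interlocked.Add(ref total, -itemSize);

    // On update, adjust by the delta between the old and new item sizes.
    public void OnUpdated(long oldSize, long newSize) =>
        Interlocked.Add(ref total, newSize - oldSize);
}
```

This trades the exactness a semaphore would give for contention-free updates, which matches the "benign races, slightly reduced efficacy" trade-off discussed above.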
@bitfaster how would you feel about a PR to make `bool TryAdd(K key, V value)` public? :D
I left another comment in your gist with a workaround using `GetOrAdd` - assuming inconsistent size between add/update is the issue you are hitting.
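For reference, a `GetOrAdd`-based accounting workaround might have roughly this shape. This is my illustrative sketch (not the gist), using `ConcurrentDictionary` as a stand-in so it is self-contained; `gauge` and `total` are made-up names:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Only count the size when our own factory produced a value. Note the
// benign race: under contention two factories can both run and both
// count, while only one value is kept, so the total can drift slightly.
long total = 0;
var cache = new ConcurrentDictionary<string, byte[]>();
Func<byte[], long> gauge = v => v.Length; // illustrative "item gauge"

bool added = false;
byte[] value = cache.GetOrAdd("key", k => { added = true; return new byte[16]; });
if (added)
    Interlocked.Add(ref total, gauge(value));

Console.WriteLine(total); // 16 after the first add
```

The same shape applies to a cache's `GetOrAdd`; the drift in `total` is the "inconsistent size between add/update" being discussed.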
Class methods can be made public; it is more complicated to change interfaces. I didn't do a good job of aligning all the `ICache` APIs with `ConcurrentDictionary`. My intent was to fix all of this as part of v3, because it will be a breaking change to make everything consistent across the whole API surface.
i.e. if we know, or can approximate, the memory usage of each entry in MB and want to bound the cache to X MB. I see there's a fixed capacity internally on the concurrent dictionary, so it's probably not straightforward. If we picked a suitable N for the capacity bound and did this pseudocode: any thoughts on the performance/stability?
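The pseudocode block itself did not survive in this extract. Based on the `Trim(1)` loop quoted in the thread, the approach under discussion is roughly the following sketch (illustrative names throughout; `lfu` is a `ConcurrentLfu` bounded to a suitable capacity N, and `gauge`/`limitBytes` are assumed):

```
// Pseudocode sketch, not library code: bound by total size from outside,
// on top of a capacity-bounded ConcurrentLfu.
on add(key, value):
    if lfu.TryAdd(key, value):
        total += gauge(value)              // atomic add
    while total > limitBytes:
        lfu.Policy.Eviction.Trim(1)        // evict next candidate per policy
        // an eviction/removal callback must subtract the evicted item's
        // size from total, otherwise this loop cannot terminate
```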