MichaCo / CacheManager

CacheManager is an open source caching abstraction layer for .NET written in C#. It supports various cache providers and implements many advanced features.
http://cachemanager.michaco.net
Apache License 2.0
2.34k stars 456 forks source link

RegionKey lacking TTL #198

Closed keithl8041 closed 6 years ago

keithl8041 commented 6 years ago

We've been running CacheManager (using Redis as data store and backplane) in a large-scale production environment happily for a few months, last week the Redis server ran out of memory, bringing down the application.

We make heavy use of regions and on further inspection it appears that the regionKey settings are never expiring within Redis, as they are all set with a TTL of -1. Over time, these regionKey entries max out the available memory on the server.

I've taken a look in the code and my rudimentary understanding is that the regionKey code never has an TTL set. I would expect that the expiry is set to match the lifetime of the object stored in cache.

Reference: lines 874, 906 and 947 of https://github.com/MichaCo/CacheManager/blob/9837d4071c2490b868dcb4b6dcae8b674f0c6392/src/CacheManager.StackExchange.Redis/RedisCacheHandle.cs

MichaCo commented 6 years ago

Interesting that you run into memory issues with that. How many keys do you store in that region

Problem is that I cannot just use the TTL of keys within the region, because the TTL for each element within the region could be totally different and expiring the region would invalidate the whole thing eventually (which is not what you want).

Are your keys very random? Or why are there so many? Also, do you never "clear" the region or remove keys via the API so that the region lookup get cleaned up?

Maybe the region feature is not really the best thing to use for your use case. If you never use ClearRegion, then there is actually no point in using that feature in the first place.

If you have a cache key which consists off of multiple strings, just concat that and use the key parameter only

keithl8041 commented 6 years ago

Thanks for the quick reply. I understand what you're saying, just not sure of a workaround. It's not the number of keys within a region that's the issue, it's the number of regions themselves.. I currently have >100k regions each with between 1 and 25 keys within them.

My solution has thousands of short-lived entities which need to be cached and frequently updated over a short period of time, before their cache expires and then they are never accessed again. I create a region for each entity based on the entity ID and call ClearRegion whenever there's an update to an entity, passing in the ID. This works really well and has helped us out massively with increased performance but leaves me with millions of orphaned regionKey entries which clog up the server over time.

In an ideal world the regionKey would be removed automatically once all its linked items had expired. I can imagine that implementing this would create a fair amount of overhead though.

For my use-case, being able to set a default TTL on the regionKey entry (say, +1 day) would be perfectly acceptable. If an item was cleared when the TTL was removed then it can be rebuilt next time it's requested. Or perhaps there's a way to constantly reset the TTL of the regionKey entry every time a new linked entry is added (so the regionKey TTL always matches the highest TTL of linked items).

MichaCo commented 6 years ago

ClearRegion actually removes the region key

Removing on element from the region does not check if the region lookup still has other keys or if that was the last one. That's for performance and concurrency reasons as it could cause issues. But also because all that logic is client side and I do not get any information whenever a key within the region just expires...

You could call database.KeyDelete(region); manually, using the StackExchange client, whenever you have to cleanup regions which are not used any longer. Not sure if that's possible to determine for you though.

Another way could be to listen on key events (CacheManager supports that, too, if configured) and whenever a key within a region gets removed from Redis, remove the field from the region lookup. And, if no other field is stored in the lookup, remove the region key. Again, that's also complex logic with some overhead in terms of performance. I could add it and make it a configuration option to turn it on/off. But that's quite a lot of effort ~~ You could implement that yourself though.

Regarding the TTL for the region lookup. This is unfortunately a lot more complex as some keys within the region might have sliding or absolute expiration. Determining the TTL for the region lookup would also mean that I not only have to refresh sliding windows for individual keys, but also for the region.

Not really sure if it is worth the effort to implement that with the risk to do something wrong in some scenarios ~~

gautelo commented 6 years ago

How come you need a region for each entity? Could you append your current region to the cache key, and use only one region?

In my opinion regions should not be randomized but have specific domain purpose. So I feel this is likely a design issue more than a technical issue. Interresting scenario, though.

keithl8041 commented 6 years ago

We've re-examined our design choices and they are still valid for our domain. If I had more time to spend on this issue I would attempt to implement your suggested solution;

Another way could be to listen on key events (CacheManager supports that, too, if configured) and whenever a key within a region gets removed from Redis, remove the field from the region lookup. And, if no other field is stored in the lookup, remove the region key.

However on closer inspection it turns out that our Redis server maxmemory setting was set to 'volatile-lru' which I believe would mean that none of the regionKey are ever expired as, although they have an expiry set, the expiry is set to -1. I have changed this to 'allkeys-lru' to ensure that the eviction pre-requisites are met and the regionKeys are eligible for eviction. If this sounds correct, then perhaps it's worth adding a line or two in the documentation to highlight the fact that regionKey entries are not deleted unless ClearRegion is called explicitly.

Thanks for your help and insight - CacheManager is a great tool.

MichaCo commented 6 years ago

Hey @keithl8041, did you configuration change in Redis fix that issue with the region keys?

keithl8041 commented 6 years ago

@MichaCo the issue hasn't had time to re-surface yet. I can see the Redis memory usage slowly creeping up to the 2.5gb max (full of long-expired regionKeys!) but haven't yet got to the point where the memory expiry config would start evicting the old keys. Will keep you posted.

keithl8041 commented 6 years ago

@MichaCo I can confirm that setting the maxmemory eviction strategy to allkeys-lru fixes the issue, as old regionKeys are now being evicted by the Redis server.