MichaCo / CacheManager

CacheManager is an open source caching abstraction layer for .NET written in C#. It supports various cache providers and implements many advanced features.
http://cachemanager.michaco.net
Apache License 2.0

Support Refresh-Ahead as a Cache Expiration Option #233

Open BrutalSimplicity opened 6 years ago

BrutalSimplicity commented 6 years ago

This has been raised several times in the comments on the main website, under various questions:

Hi, we rely on CacheManager to cache REST responses for 1 hour.

After 1 hour, is it possible to return the latest cached response if the REST service is not available? After 1 hour, is it possible to automatically call the REST service in the background?

I have read that this is the Netflix strategy: if an external service does not answer, they send back the last known cached information.

Does this support stale cache items? Like items that have passed their expiration but can still be loaded from cache, so that a background thread can be spun up to retrieve fresh results upon detection of the staleness?

Something like if(GetCacheItem("key").IsStale) { RefreshMyData("key"); }

You'd probably need to attach "staleness" handling to particular keys/regions or something if you don't already have it.
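To make the `IsStale` idea above concrete, here is a minimal in-process sketch. It assumes nothing about CacheManager's API: `StaleAwareCache`, `StartRefresh`, the loader delegate, and the injected clock are all hypothetical names invented for illustration. Entries are never evicted on expiry; a stale read returns the old value and kicks off at most one background reload per key. If the loader throws, the previous value keeps being served, which is roughly the "last known response" behavior asked about above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper, not part of CacheManager.
public sealed class StaleAwareCache<T>
{
    private readonly ConcurrentDictionary<string, (T Value, DateTime FreshUntilUtc)> _entries =
        new ConcurrentDictionary<string, (T Value, DateTime FreshUntilUtc)>();
    private readonly ConcurrentDictionary<string, Lazy<Task>> _refreshing =
        new ConcurrentDictionary<string, Lazy<Task>>();
    private readonly Func<string, T> _loader;
    private readonly TimeSpan _freshFor;
    private readonly Func<DateTime> _utcNow;

    public StaleAwareCache(Func<string, T> loader, TimeSpan freshFor, Func<DateTime> utcNow)
    {
        _loader = loader;
        _freshFor = freshFor;
        _utcNow = utcNow;
    }

    public T Get(string key)
    {
        // The very first read has nothing to serve, so it loads synchronously.
        // (Under contention this factory may run more than once.)
        var entry = _entries.GetOrAdd(key, k => (_loader(k), _utcNow() + _freshFor));
        if (_utcNow() >= entry.FreshUntilUtc)
        {
            _ = StartRefresh(key); // fire and forget; the caller is not blocked
        }
        return entry.Value;        // possibly stale, never "no data"
    }

    // Starts a reload unless one is already in flight for this key.
    public Task StartRefresh(string key)
    {
        var lazy = _refreshing.GetOrAdd(key, k => new Lazy<Task>(() => Task.Run(() =>
        {
            try
            {
                _entries[k] = (_loader(k), _utcNow() + _freshFor);
            }
            catch (Exception)
            {
                // Loader failed: keep serving the previous value.
            }
            finally
            {
                _refreshing.TryRemove(k, out _);
            }
        })));
        return lazy.Value;
    }
}
```

This only deduplicates refreshes inside one process; coordinating several servers needs something shared, which is what the marker-key idea further down is about.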

@MichaConrad the use case is, for example, a stock quoting site that guarantees a "20 minute" lag. You would set up the system to actually refresh every n < 20 minutes, say every 15 minutes. When the cache item expires, instead of returning "no data", the cache still delivers the stale data, which is still within the site's policy, and fires off a background task to refresh it. That avoids having to replace the data for the given stock in a panic.

One could implement this as two cache keys: one with the actual data and no expiration, say key="AAPL", and a marker item that indicates expiration, say "[[AAPL". A read would fetch AAPL and display that data, then check [[AAPL to see whether the cache had expired that key and, if so, initiate the task to reload. However, this would mean 2 reads for every cache key, whereas being able to call an alternate endpoint that delivers the data AND the cache metadata would take a single trip.

The core issue here, when using a web farm, is preventing the "thundering herd" problem: say 20 machines read a single cache key, all 20 receive no data because the cache key expired, and then all 20 fire off requests to the data store to get the data.

The process here would be that AAPL still contains the stock data, while [[AAPL coordinates the servers by indicating that the data "will be refreshed", but with a short timeout to prevent a hangup.
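The two-key scheme described above could be sketched roughly as follows. Everything here is an assumption made for illustration: `InMemoryStore` stands in for whatever shared cache the web farm uses, and `MarkerCoordinator`, `Read`, `Publish`, the `"refreshing"`/`"fresh"` marker values, and the lease duration are made-up names, not CacheManager features. The one requirement on the real store is an atomic add-if-absent with a timeout, so that exactly one server wins the right to refresh.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a cache shared by all web servers.
public sealed class InMemoryStore
{
    private readonly Dictionary<string, (string Value, DateTime? ExpiresUtc)> _items =
        new Dictionary<string, (string Value, DateTime? ExpiresUtc)>();
    private readonly object _gate = new object();
    private readonly Func<DateTime> _utcNow;

    public InMemoryStore(Func<DateTime> utcNow) { _utcNow = utcNow; }

    public bool TryGet(string key, out string value)
    {
        lock (_gate)
        {
            if (_items.TryGetValue(key, out var item) &&
                (item.ExpiresUtc == null || item.ExpiresUtc > _utcNow()))
            {
                value = item.Value;
                return true;
            }
            value = null;
            return false;
        }
    }

    // ttl == null means the item never expires.
    public void Set(string key, string value, TimeSpan? ttl)
    {
        lock (_gate)
        {
            _items[key] = (value, ttl == null ? (DateTime?)null : _utcNow() + ttl.Value);
        }
    }

    // Atomic add-if-absent: only one caller can create the key.
    public bool AddIfAbsent(string key, string value, TimeSpan ttl)
    {
        lock (_gate)
        {
            if (TryGet(key, out _)) return false;
            Set(key, value, ttl);
            return true;
        }
    }
}

public sealed class MarkerCoordinator
{
    private readonly InMemoryStore _store;
    private readonly TimeSpan _freshFor;
    private readonly TimeSpan _refreshLease;

    public MarkerCoordinator(InMemoryStore store, TimeSpan freshFor, TimeSpan refreshLease)
    {
        _store = store;
        _freshFor = freshFor;
        _refreshLease = refreshLease;
    }

    private static string MarkerKey(string key) => "[[" + key;

    // Returns the (possibly stale) data plus whether this caller should refresh it.
    public (bool Found, string Value, bool OwnsRefresh) Read(string key)
    {
        bool found = _store.TryGet(key, out var value);
        // The marker is missing once it has expired. Whoever re-adds it first,
        // with a short lease, owns the refresh; everyone else serves stale data.
        bool ownsRefresh = _store.AddIfAbsent(MarkerKey(key), "refreshing", _refreshLease);
        return (found, value, ownsRefresh);
    }

    // Called by the refresh owner once the new data has been loaded.
    public void Publish(string key, string value)
    {
        _store.Set(key, value, null);                   // data itself never expires
        _store.Set(MarkerKey(key), "fresh", _freshFor); // marker carries the expiration
    }
}
```

A caller that gets `OwnsRefresh == true` would load the data in a background task and call `Publish`. If that server dies mid-refresh, the lease on the marker runs out and the next reader takes over. As noted above, this still costs two round trips per read.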

Perhaps a more useful feature would be a solution built into the library for the "thundering herd" problem. (Of note, EHCache mentions a "Thundering Herd" solution, but if loading the data for the cache key is expensive, e.g. more than 5 seconds, then with their implementation your website can't render any webpages for 5 seconds while the data set is reloaded.)

The scenarios described above are just a few use cases where refreshing the cache ahead of expiration is extremely useful. So, is it possible to support this in CacheManager?

I don't know that I'd have time to dig in and implement a feature like this, but as a starting point, here are some ideas on the design.

Guava has a good implementation of this (https://github.com/google/guava/wiki/CachesExplained#refresh).
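As I read the Guava docs, the key idea is two thresholds per entry: `refreshAfterWrite` makes an entry eligible for a background reload the next time it is queried (the old value is still returned meanwhile), and `expireAfterWrite` is the hard limit after which the entry is gone. A rough C# sketch of just that decision could look like the following; `EntryState`, `RefreshAheadPolicy`, and `Classify` are names I made up, not Guava or CacheManager API.

```csharp
using System;

// Hypothetical names, for discussion only.
public enum EntryState { Fresh, RefreshEligible, Expired }

public static class RefreshAheadPolicy
{
    public static EntryState Classify(
        DateTime writtenUtc,
        DateTime nowUtc,
        TimeSpan refreshAfterWrite,
        TimeSpan expireAfterWrite)
    {
        if (refreshAfterWrite > expireAfterWrite)
        {
            throw new ArgumentException("refreshAfterWrite must not exceed expireAfterWrite.");
        }

        var age = nowUtc - writtenUtc;
        if (age >= expireAfterWrite) return EntryState.Expired;          // must reload before serving
        if (age >= refreshAfterWrite) return EntryState.RefreshEligible; // serve old value, reload in background
        return EntryState.Fresh;                                         // serve as-is
    }
}
```

For the stock quote example, that would be a 15 minute refresh threshold and a 20 minute expiration: a read at minute 16 gets the old quote and triggers a reload, and only a key nobody touched for 20 minutes actually expires.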

I'm sure I've left out tons of detail, but I just thought I'd start the discussion, as I've been checking on this feature for nearly a year now. I'm here to help talk out a possible solution, and implement if time allows, so please let me know what you think.

Thanks.