ZiggyCreatures / FusionCache

FusionCache is an easy to use, fast and robust hybrid cache with advanced resiliency features.
MIT License

Sliding Expiration #48

Closed · darcome closed this issue 2 years ago

darcome commented 2 years ago

Hello,

I'd like to know if there is the possibility to set a sliding expiration so that if the cached object is updated or read, the expiration counter is restarted.

Is this possible? If not, are there other ways to achieve it?

Thanks in advance and keep up the good work!

jodydonetti commented 2 years ago

Hello @darcome and thanks for trying out FusionCache!

FusionCache does not support traditional sliding expiration itself, but that is by design because you can get the same results (or even more!) with a different approach.

In my experience a "simple" sliding expiration may lead to situations where you end up with a cache entry that is used a lot and, because of that, keeps "sliding" and never expires, so it never updates itself. That is, unless there's a separate update call somewhere, but it's easy to forget that call in some situations or to miss one.

Usually sliding expiration is used because we want to keep something in the cache for as long as it keeps being used, without having to wait for a (maybe) slow call to re-populate it, but as I said this may lead to missed updates down the line.

Instead you can obtain an even better result in a different way: just enable fail-safe and set some timeouts.

Let's see a practical example.

Say we want to cache a piece of data for 5 min, keep serving it (for up to 10 min) as a fallback while it gets refreshed in the background, and never make a caller wait more than 50 ms for a refresh.

To do that you just need to do this (eg: a product from the database):

var product = cache.GetOrSet<Product>(
    "product:123",
    _ => GetProductFromDb(123),
    options => options
        // CACHE DURATION
        .SetDuration(TimeSpan.FromMinutes(5))
        // ENABLE FAIL-SAFE, MAX 10 MIN
        .SetFailSafe(true, TimeSpan.FromMinutes(10))
        // SET A 50 MS SOFT TIMEOUT
        .SetFactoryTimeouts(TimeSpan.FromMilliseconds(50))
);

In this way what you get is still a "sliding" duration of 5 min, but with automatic updates of the data itself when needed, all without having to wait for a slow db call (which may happen if the db is temporarily overloaded, there's a temporary network congestion or whatever), and all without keeping the entry in the cache for more than 10 min if nobody uses it (saving memory).

It basically boils down to a sliding duration of 5 min, plus another 5 min in which the data has time to auto-update, and every time a fresh piece of data arrives the 5-10 min window restarts, just like with the "simple" sliding expiration itself.

One last thing: to simplify the code even more and save some memory you may want to create an options variable and re-use it, like this:

// SOMEWHERE AT STARTUP
var slidingOptions = new FusionCacheEntryOptions(TimeSpan.FromMinutes(5))
        .SetFailSafe(true, TimeSpan.FromMinutes(10))
        .SetFactoryTimeouts(TimeSpan.FromMilliseconds(50));

// [...]

// LATER ON
var product = cache.GetOrSet<Product>(
    "product:123",
    _ => GetProductFromDb(123),
    slidingOptions
);

I usually put a set of common caching options in a public static class, like a singleton, and always refer to them for better code centralization and ease of use, but that's a personal choice. Something like this:

public static class MyCachingOptions {

  public static FusionCacheEntryOptions Products = new FusionCacheEntryOptions(TimeSpan.FromMinutes(5))
        .SetFailSafe(true, TimeSpan.FromMinutes(10))
        .SetFactoryTimeouts(TimeSpan.FromMilliseconds(50));

  public static FusionCacheEntryOptions Users = new FusionCacheEntryOptions(TimeSpan.FromMinutes(1));

  public static FusionCacheEntryOptions Categories = new FusionCacheEntryOptions(TimeSpan.FromMinutes(10))
        .SetFailSafe(true, TimeSpan.FromHours(1))
        .SetFactoryTimeouts(TimeSpan.FromSeconds(10));

// ETC...

}

Let me know if you need something else, I hope I've been able to help!

NOTE: the sample is using the sync api to keep it as simple as possible, but everything is also available in an async fashion.
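
For reference, here's a minimal sketch of the same call via the async api (GetProductFromDbAsync is just a hypothetical async counterpart of the factory above):

var product = await cache.GetOrSetAsync<Product>(
    "product:123",
    async _ => await GetProductFromDbAsync(123),
    options => options
        // CACHE DURATION
        .SetDuration(TimeSpan.FromMinutes(5))
        // ENABLE FAIL-SAFE, MAX 10 MIN
        .SetFailSafe(true, TimeSpan.FromMinutes(10))
        // SET A 50 MS SOFT TIMEOUT
        .SetFactoryTimeouts(TimeSpan.FromMilliseconds(50))
);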

jasenf commented 2 years ago

Hi guys --

my 2cents: I would prefer to see and use a traditional sliding expiration over the suggested solution. The great thing about sliding expirations is that they work great for data that is very "heavy" to load from the DB but is accessed on unpredictable schedules. The best example of this is a dynamically generated web page. The construction of that page may take a lot of DB heavy lifting, but once you put it together, it may not change at all for a few days. Traffic to a web page can be sporadic, with no traffic at all coming for hours (i.e. overnight?) and then burst traffic for hours during peak times of the day.

We use a sliding expiration for this exact scenario. We set the timeout very low (5 minutes?) but rely on the sliding cache to basically say "as long as people are viewing this page, keep it in memory".

We also use the sliding cache in almost all of our cache-aside strategies. Our system keeps a global "lastUpdate" DateTime record for all system data about each customer. Every time a record changes in a customer's account, we simply update this one master field. We then rely on this record to build all of our keys. So for example:

var myData = cache.GetOrSet(
                     "key_" + lastUpdate.Ticks,
                     () => myDataFromTheDb,
                     /* 5 min sliding expiration */);

Now, as long as no customer data has been updated, this cache will hold for whatever extended period of time it is getting used. If lastUpdate changes a new cache entry is generated and this one elegantly just expires. However, if people keep using this data, this cache entry remains in play.

I'm a big fan of sliding-expirations and hope FusionCache implements it someday.

darcome commented 2 years ago

Thank you for your answer. Then I suppose the only option with FusionCache would be to add a field to the base class of all business objects (if the project has been structured like that) to record when the object was last requested, so that the sliding behavior can be achieved manually.

jodydonetti commented 2 years ago

@darcome may I ask why the solution I outlined wouldn't work for your case? I'm asking because I'd like to understand your sliding expiration usage a little more, to see if I'm missing something, if there are shortcomings in FusionCache's current approach, etc.

darcome commented 2 years ago

First of all I want to be sure I understood correctly how FusionCache works... basically you can extend the life of the resource in the cache, but even if the resource is heavily requested, sooner or later it will be evicted from the cache, and at that point it will be necessary to get it again from the db (or from any other location), which will require a "lot of time".

If the above is correct, then with a sliding expiration, if I know that a resource is heavily used, it will never expire, and therefore, potentially, I will have to spend the time to retrieve it only once for the entire life of the service.

As an example, think about the usual example of blog posts... imagine the tags, or categories... they are ALWAYS used, one way or another, so potentially I could even load them all from the database during startup and forget about the database, updating them in the cache whenever they are updated in the db.

I may have simplified things a bit, but I hope you get the point.

Let me know what you think about it and if I have misunderstood how FusionCache works.

jodydonetti commented 2 years ago

Hi @jasenf , if you'd like, I would say let's explore this space a little bit, just like we did last time with the backplane 😉 and maybe with the help of @darcome , too.

my 2cents: I would prefer to see and use a traditional sliding expiration over the suggested solution. The great thing about sliding expirations is that they work great for data that is very "heavy" to load from the DB but is accessed on unpredictable schedules.

We use a sliding expiration for this exact scenario. We set the timeout very low (5 minutes?) but rely on the sliding cache to basically say "as long as people are viewing this page, keep it in memory".

Wouldn't that be the same with the approach I outlined above?

The difference I see is that if the entry expires (5 min pass without a request), the next request coming in kicks off the load/generation again, and the user who made that request has to wait for it to finish. With fail-safe + timeouts the load/generation kicks in just the same, but the user doesn't have to wait for it to finish: they just get the old value, as if the 5 min hadn't passed, with an automatic update in the background as soon as the load finishes.

Am I missing something?

We also use the sliding cache in almost all of our cache-aside strategies. Our system keeps a global "lastUpdate" DateTime record for all system data about each customer. Every time a record changes in a customer's account, we simply update this one master field. We then rely on this record to build all of our keys. So for example:

var myData = cache.GetOrSet(
                     "key_" + lastUpdate.Ticks,
                     () => myDataFromTheDb,
                     /* 5 min sliding expiration */);

Now, as long as no customer data has been updated, this cache will hold for whatever extended period of time it is getting used. If lastUpdate changes a new cache entry is generated and this one elegantly just expires. However, if people keep using this data, this cache entry remains in play.

Yep, makes sense: I use the same approach (with FusionCache of course) in a couple of situations, with a "last update" value used in some cache keys as a "cache buster". I think it's called something like "key-based cache expiration", and it works very nicely. But again, it works with FusionCache too.
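
For example, a minimal sketch of that key-based pattern with FusionCache could look like this (MyData, GetMyDataFromDb and lastUpdate are illustrative names, not from an actual project):

var myData = cache.GetOrSet<MyData>(
    $"mydata:{lastUpdate.Ticks}",
    _ => GetMyDataFromDb(),
    options => options.SetDuration(TimeSpan.FromMinutes(5))
);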

I'm a big fan of sliding-expirations and hope FusionCache implements it someday.

Ok so let's explore a little bit.

The problem I see with "simple" sliding expiration is that it's usually used when loading (or generating) the value is quite heavy and takes some time: in that case, what happens on every "first" request after the entry has expired? That request is slow too, because there's nothing to use as a fallback while the value is being loaded.

With fail-safe + timeouts though, that doesn't happen, so I see that as a win.

But... 🥁 drumroll...

But now that we're talking more about it I see a potential issue, or at least a way to make everything even better.

Let me explain my mental process.

Since the beginning of the design phase of FusionCache I always thought about not wanting to implement sliding expiration because I saw things like this:

Sliding Expiration

FusionCache with fail-safe + timeouts

Now that I think about it more, I realize I might have failed to see one additional PRO/CON, and that is the number of requests towards the database (or whatever).

If a cache entry is requested very frequently + the data load is heavy + you use a timestamp or similar in the cache key, what happens is that with normal sliding expiration you would have only 1 data load at the very beginning, whereas with FusionCache (with fail-safe + timeouts) you would have 1 data load every 5 min.

So, to update the previous PRO/CON list, the thing should become this:

Sliding Expiration

FusionCache with fail-safe + timeouts

Would you agree with this?

And if that is the case, it would in fact be nice to have sliding expiration on top of the other features, so that the end result would be something like this:

FusionCache with sliding expiration + fail-safe + timeouts

getting us the best of both worlds.

So, what do you think @jasenf and @darcome ? Would you agree?

jodydonetti commented 2 years ago

First of all I want to be sure I understood correctly how FusionCache works... basically you can extend the life of the resource in the cache, but even if the resource is heavily requested, sooner or later it will be evicted from the cache, and at that point it will be necessary to get it again from the db (or from any other location), which will require a "lot of time".

Yes, you are correct, but (and this is very important) when it expires and the time comes to get the data again from the db, if you set a timeout of, say, 50 ms the user would NOT wait a "lot of time" for the data to come from the db: they would be served the old version of the data (just like with sliding expiration) after just 50 ms, AND as soon as the data comes back from the db the cache will be updated in the background.

So from the point of view of how fresh the data is and how long you have to wait, it's better than sliding expiration (because the data still gets updated + response times stay low).

The only downside I can think of, as highlighted above in my last answer to @jasenf , is that in fact you would have some more requests to the database every now and then (but only every 5 min or whatever you choose as your cache duration).

Anyway if you'd like to read my thought process above about the hypothesis of having sliding exp + fail-safe + timeouts, and weigh in with your vision and personal experience, that would be great!

jasenf commented 2 years ago

Hi Jody --

Your pros and cons are correct. With FusionCache there is a FORCED reload, even if the data is used frequently and we already know what's in the cache is fresh.

I also personally don't like this concept of relying on a "fail-safe" to grab theoretically expired data. I get that it's just a design concept, but there's something not sitting well with me about relying on something called a fail-safe to retrieve what I would consider fresh data (if we were using it as a replacement for sliding expiration).

I don't think I would ever use the failsafe in our caching scenarios. We either get the data from what we considered a fresh cache, the database, or fail. Returning any data after those 2 returned nothing would guarantee we are basically using the wrong data.

I appreciate your design thoughts around not implementing a sliding expiration, but I think it's a bit too opinionated (not in a bad way). There are just a ton of useful use cases for it.

I swear I'm going to try this lib eventually :-) once we finish some of these big release cycles. haha

darcome commented 2 years ago

I'd like a simple sliding expiration too, especially if there is a way to update the cache when the resource is updated, because as far as I understood your 5 min + 5 min fail-safe is the same as a 10 minute expiration time.

Regarding the possibility of forgetting to update the resource in the cache: in my case I am using services to operate on the database, so there is one and only one function where a resource can be updated in the db, and therefore it is almost impossible to forget to update the resource in the cache when it is updated in the db.

If you use interfaces and a DbServiceBase class (for example), you can enforce the presence of the code that updates the cache, so it's impossible to forget to write the corresponding code.
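
Just as a purely hypothetical sketch of what that enforcement point could look like (DbServiceBase, SaveToDb and the rest are illustrative names, not actual code):

public abstract class DbServiceBase<T>
{
    private readonly IFusionCache _cache;

    protected DbServiceBase(IFusionCache cache)
    {
        _cache = cache;
    }

    // THE SINGLE PLACE WHERE AN ENTITY CAN BE UPDATED:
    // THE DB WRITE AND THE CACHE REFRESH ALWAYS HAPPEN TOGETHER
    public void Update(string cacheKey, T entity)
    {
        SaveToDb(entity);
        _cache.Set(cacheKey, entity);
    }

    protected abstract void SaveToDb(T entity);
}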

jodydonetti commented 2 years ago

Your pros and cons are correct. With FusionCache there is a FORCED reload, even if the data is used frequently and we already know what's in the cache is fresh.

When using key-based cache expiration then yes, that is the case, since the cache key changes based on the lastUpdate timestamp (or similar), so basically the data in the cache is immutable per-key.

I also personally don't like this concept of relying on a "fail-safe" to grab theoretically expired data. I get that it's just a design concept, but there's something not sitting well with me about relying on something called a fail-safe to retrieve what I would consider fresh data (if we were using it as a replacement for sliding expiration).

Well, not really: fail-safe (+ timeouts) would be used only to make sure the time it takes to refresh the data stays low, it's not a substitute for sliding expiration itself. On top of that, since the assumption is that when the data changes the cache key also changes, you really cannot have stale data (with or without fail-safe). I mean, by definition if the cached data is immutable per-key, it will never be stale, right?

If you don't feel like using a feature called "fail-safe" that's fine of course, but based on the cache usage you described (the timestamp varies the cache key) it will never be stale, it's not physically possible. Would you agree?

I don't think I would ever use the failsafe in our caching scenarios. We either get the data from what we considered a fresh cache, the database, or fail. Returning any data after those 2 returned nothing would guarantee we are basically using the wrong data.

As said above, if changes to the data also change the cache key then no, it's not possible.

Let me put it this way.

With a key-based cache exp and a 5 min sliding exp you would have the data in the cache as long as someone requests it within a 5 min window, and after that the memory will be freed. Any new request for the same data after that point would incur a load delay (the DOWNSIDE). On the other hand, if the data is constantly requested, it will never ever be loaded again from the db (the UPSIDE).

With the approach I outlined the end result is the same (the data is never stale) except that if the data is requested again after 5 min you'll have an extra 5 min (or whatever you choose) in which the data can be refreshed from the db, with the UPSIDE of being able to serve the already cached data very fast (thanks to the timeouts handling) and the DOWNSIDE of having 1 extra data load every 5-10 min.

In the end it's a balance.

I appreciate your design thoughts around not implementing a sliding expiration, but I think it's a bit too opinionated (not in a bad way). There are just a ton of useful use cases for it.

Thanks, I'm open to implementing sliding exp but I'm trying to understand if it is really necessary, what all the pros/cons of such a solution are (even more so in a potentially multi-level cache like FusionCache, which is different) and so on.

I swear I'm going to try this lib eventually :-) once we finish some of these big release cycles. haha

Ahah thanks 😂 , that would be great. I really appreciate your effort in following along the evolution of FusionCache and your participation in the discussion, that already helped better shape the backplane in the past!

I have one more question for you, if I may: since you are currently using sliding exp, I would like to know if in your code you typically access it via a standard Get call and, if that is the case, what do you do when the data is not in the cache anymore because the 5 min window has passed? I suppose you then call your "data load" function and save the result in the cache again with another 5 min sliding exp, something like this:

var data = cache.Get<MyType>("key");
if (data is null) {
  data = MyDataLoad();
  cache.Set<MyType>("key", data, /* 5 min sliding exp */);
}

Am I right?

jodydonetti commented 2 years ago

Regarding the possibility of forgetting to update the resource in the cache: in my case I am using services to operate on the database, so there is one and only one function where a resource can be updated in the db, and therefore it is almost impossible to forget to update the resource in the cache when it is updated in the db.

Understood, and it makes sense. I was thinking more of "extraordinary" situations, like a one off db update to fix something, or a background job that maybe is not passing through the same centralized function that updates everything, or something like that.

In my experience these are things that in theory should not happen - and we can all agree on that - but in practice sooner or later they probably will, and an automatic refresh every 5 min (or whatever) is so sporadic that it's not really a problem for the database load; in the end it's a pragmatic solution that may help alleviate problems in those situations. Of course your mileage may vary, and it's a personal choice whether to use such an approach or not, there's no "wrong" way.

[...] as far as I understood your 5 min + 5 min fail-safe is the same as a 10 minute expiration time.

Not really, because a GetOrSet call in the second 5 min phase will reload the data from the db and the result will be put back in the cache for another 5 min, basically getting the same result as a sliding exp in terms of the data remaining in the cache, but without having to wait for a data load operation that may be slow.

But I have a doubt about how you use your current cache, and I would like to ask you the same question I asked @jasenf , because it's very important that we understand each other.

Since you are currently using sliding exp, I would like to know if in your code you typically access it via a standard Get call and, if that is the case, what do you do when the data is not in the cache anymore because the 5 min window has passed? I suppose you then call your "data load" function and save the result in the cache again with another 5 min sliding exp, something like this:

var data = cache.Get<MyType>("key");
if (data is null) {
  data = MyDataLoad();
  cache.Set<MyType>("key", data, /* 5 min sliding exp */);
}

Am I right?

darcome commented 2 years ago

Since you are currently using sliding exp, I would like to know if in your code you typically access it via a standard Get call and, if that is the case, what do you do when the data is not in the cache anymore because the 5 min window has passed? I suppose you then call your "data load" function and save the result in the cache again with another 5 min sliding exp, something like this:

var data = cache.Get<MyType>("key");
if (data is null) {
  data = MyDataLoad();
  cache.Set<MyType>("key", data, /* 5 min sliding exp */);
}

Am I right?

Yes, that's what I do. It would be great to have a GetOrSet function that takes a delegate and does everything atomically to prevent the stampede problem, but that's something you are already aware of :)

Another question... is it possible to instantiate more than one FusionCache instance? So that it is possible to have "strongly typed" caches and shorter keys?

Thanks in advance!

jodydonetti commented 2 years ago

Yes, that's what I do. It would be great to have a GetOrSet function that takes a delegate and does everything atomically to prevent the stampede problem, but that's something you are already aware of :)

Ahah yes I would say I am 😬

But that is why I asked that question! If you start using GetOrSet calls to, among other things, avoid a cache stampede, you are now in the realm of not having to care about sliding exp anymore (imho), because you will always have a result, and fast (thanks to fail-safe + timeouts). The only real difference is that every 5-10 min you will have an extra data load from the db, but in my experience that is rarely a problem (of course your scenario may be peculiar in that regard, I don't know).
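
To make it concrete, the whole check-then-load-then-set sequence above would collapse into a single stampede-protected call, roughly like this (re-using MyType and MyDataLoad from the snippet above):

var data = cache.GetOrSet<MyType>(
    "key",
    _ => MyDataLoad(),
    options => options
        // CACHE DURATION
        .SetDuration(TimeSpan.FromMinutes(5))
        // ENABLE FAIL-SAFE, MAX 10 MIN
        .SetFailSafe(true, TimeSpan.FromMinutes(10))
        // SET A 50 MS SOFT TIMEOUT
        .SetFactoryTimeouts(TimeSpan.FromMilliseconds(50))
);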

Now, if you change the cache key based on the last update timestamp (so a key-based cache exp) you will effectively never have stale data, because the data in the cache will be basically immutable (as explained above to jasenf), so the end result is no different from a sliding exp.

If instead you don't do that and data in the cache may change over time per-key, then you either:

Unless I am missing something of course 😅

Another question... is it possible to instantiate more than one FusionCache instance? So that it is possible to have "strongly typed" caches and shorter keys?

Yes absolutely!

If you instantiate by hand (eg: via new FusionCache(...)) you can create as many as you want, just remember 2 things:

If you use a DI approach instead it's not currently possible to configure multiple instances directly (eg: like you would do with named http clients), but I'm thinking about a similar approach for a future version. In the meantime what you can do is create your own abstraction, something like a MyCachesService class, declare multiple IFusionCache instances as props, and register/resolve that via DI.

Something like this:

public class MyCachesService {
  public MyCachesService() {
    // CREATE EACH CACHE WITH ITS OWN OPTIONS
    UserCache = new FusionCache(new FusionCacheOptions { /* various options... */ });
    ProductCache = new FusionCache(new FusionCacheOptions { /* various options... */ });
  }

  public IFusionCache UserCache { get; }
  public IFusionCache ProductCache { get; }
}

and in the Startup.cs register it:

services.AddSingleton<MyCachesService>();

And then in your controllers' constructor simply declare a param of type MyCachesService and you'll be able to use it in the actions, something like this:

public class ProductController : Controller
{
    private readonly MyCachesService MyCaches;

    public ProductController(MyCachesService myCaches)
    {
        // REQUEST IN THE CONSTRUCTOR + SAVE IN THE PRIVATE FIELD
        MyCaches = myCaches;
    }

    [HttpGet]
    [Route("/product/{id:int}")]
    public IActionResult Product(int id)
    {
        // USE IT
        var product = MyCaches.ProductCache.GetOrSet<Product>(
            $"product:{id}",
            _ => GetProductFromDb(id),
            opt => opt.SetDuration(TimeSpan.FromMinutes(5))
        );

        return Json(product);
    }
}

Hope this helps!

jodydonetti commented 2 years ago

I'm moving this to the recently opened Discussions board.

If this gets approved to become a feature, a separate issue for the design & development will be created.