[FEATURE] 🦅 Eager Refresh

Scenario

Say we want to "cache something for 10min".

Easy peasy, we can do something like this:

var id = 42;

cache.GetOrSet<Product>(
    $"product:{id}",
    _ => GetProductFromDb(id),
    options => options.SetDuration(TimeSpan.FromMinutes(10))
);

Sometimes, though, we may want something like "cache something for 10min, but start refreshing it some time before expiration, so that at the 10min mark there is no slowdown caused by the refresh operation".

With FusionCache this has always been possible, thanks to fail-safe + soft/hard timeouts: we just have to rephrase the requirement as something like "cache something for 10min, but if the refresh takes more than (say) 10ms, temporarily reuse the stale value so that there is no slowdown caused by the refresh operation".

The code needed would be:

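var id = 42;

cache.GetOrSet<Product>(
    $"product:{id}",
    _ => GetProductFromDb(id),
    options => options
        .SetDuration(TimeSpan.FromMinutes(10))
        // fail-safe: allow the stale value to be temporarily reused
        .SetFailSafe(true)
        // soft timeout: if the refresh takes more than 10ms, fall back to the stale value
        .SetFactoryTimeouts(TimeSpan.FromMilliseconds(10))
);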

The end result is basically the same (no delays when refreshing), but it requires a mental shift in how we think about the problem.

Problem

Now, here's the deal: it would be nice to be able to just specify "eagerly refresh some time before the expiration" or something like that, instead of having to change the mental model.

This approach is also not completely new: in the caching field there are things like the StaleAfter option in the CacheTower library, or the "Cache Prefreshing" option in the Akamai CDN.

Solution

It seems reasonable to provide a way to obtain the same result with a more direct and clearer approach, if only to reduce the mental gymnastics needed and to lower the entry barrier.

Finally, this approach may be used in conjunction with the aforementioned existing features (fail-safe and timeouts), so that we may be able to either use eager refresh without fail-safe (if so desired) or to use all of them together.

Design proposal

A new addition in the FusionCacheEntryOptions class, to be able to specify how eagerly to start the refresh, even if the cache entry is not yet expired.

There are 2 possible ways to specify "how eagerly".

TimeSpan

As a TimeSpan this would be a direct value, like TimeSpan.FromSeconds(10).

🟢 PRO: easy to reason about the exact amount of time
🔴 CON: it needs to be specified directly in each set of entry options where you would like to use it
🔴 CON: it cannot be set once in the DefaultEntryOptions and automatically adapt to each call's specific Duration
🔴 CON: since it's an absolute value and not a relative one, we need to remember to always update it whenever we change a Duration in the future (error prone)

Percentage

As a percentage, in the usual floating point notation: an example may be 0.9, meaning 90% (of the Duration).

🟢 PRO: since it's a relative value, it can automatically adapt itself to each specific Duration used. For example by saying 0.9 you will know that it will be "90% of the Duration", without having to do mental calculations (most probably the mental approach is not tied to a specific TimeSpan value, but more something like "I would like it to happen at 90% of the Duration")
🟢 PRO: if in the future you need to change a Duration in a specific call, you would NOT need to remember to also change the eager duration (as you would with a TimeSpan) to keep the 2 aligned (less error prone)
🟢 PRO: it can be set once in the DefaultEntryOptions and automatically applied to every call, dynamically adapting to each call's Duration
🔴 CON: (kinda) it may be less quick to know at a glance the exact eager duration. In reality though would this actually be needed? Meaning, just knowing "at 90% of the Duration the data will be refreshed" would most probably be more than enough. Also, if we think about debugging/logging, it's really easy to log the eager duration as both a percentage AND as the resulting (calculated) TimeSpan, for ease of use

For the reasons above, the percentage approach seems clearly better, so it will be explored in an implementation to see how it goes.
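To make the percentage idea concrete, here is a minimal sketch of how a relative threshold could be resolved into an absolute TimeSpan (for example for logging). The EntryOptionsSketch type, the EagerRefreshThreshold property and the GetEagerRefreshPoint method are hypothetical names used only for illustration: they are not part of the proposal, and treating out-of-range values as "disabled" is also just an assumption.

```csharp
using System;

// Hypothetical stand-in for the relevant part of the entry options.
// None of these names come from the proposal: they are assumptions for this sketch.
public sealed class EntryOptionsSketch
{
    public TimeSpan Duration { get; set; } = TimeSpan.FromMinutes(10);

    // Fraction of Duration after which a refresh may start early (null = disabled).
    public double? EagerRefreshThreshold { get; set; }

    // Resolves the relative threshold into an absolute amount of time
    // counted from the moment the entry was cached.
    public TimeSpan? GetEagerRefreshPoint()
    {
        double? threshold = EagerRefreshThreshold;

        // Assumption: values outside (0, 1) are treated as "not set".
        if (threshold is null || threshold <= 0.0 || threshold >= 1.0)
            return null;

        return TimeSpan.FromTicks((long)Math.Round(Duration.Ticks * threshold.Value));
    }
}

public static class Program
{
    public static void Main()
    {
        var options = new EntryOptionsSketch { EagerRefreshThreshold = 0.9 };

        // Log both the percentage and the calculated TimeSpan.
        Console.WriteLine($"threshold {options.EagerRefreshThreshold} of {options.Duration} -> {options.GetEagerRefreshPoint()}");

        // Changing only the Duration keeps the two aligned automatically.
        options.Duration = TimeSpan.FromMinutes(20);
        Console.WriteLine($"threshold {options.EagerRefreshThreshold} of {options.Duration} -> {options.GetEagerRefreshPoint()}");
    }
}
```

With a Duration of 10min the resolved point is 9min, and after changing the Duration to 20min it becomes 18min without touching the threshold, which is the "less error prone" property described above.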

Also, although this does not imply anything in particular, it gives some confidence knowing that the Akamai CDN actually uses the percentage approach: this is, at least, a point in favor of such an approach, since it has been used widely and successfully in a battle-tested production environment.

One additional idea may be to support both: this solution, though, would mean worse performance (more memory consumed to store both values). Also, it would probably create some confusion about which approach to use, and about what happens when both values are set (which one should win? should setting one value reset the other? etc). Finally, for the reasons explained above, it may be more error prone: for example, we may specify a Duration of 10min and an eager refresh of 9min, only to later change the Duration to 20min and forget to update the eager refresh to 18min (or whatever the related new value would be).

Alternatives

As described at the beginning, the current approach of fail-safe + timeouts may get you the same result, but it seems to require more mental gymnastics.

Finally, there may be a use-case for using the 3 features together: eager refresh + fail-safe + timeouts, which may be nice.

Technical Details

Of course in a highly concurrent scenario, only one request would start an eager refresh: this is the same Cache Stampede prevention that happens when normally running a factory to refresh the data after expiration, so the same mechanism should also be used here for the same reasons.

Additionally, during an eager refresh the underlying cache entry is not yet expired, so only one call should obtain the mutex and start the background refresh, while all the others should simply skip it: this can be done by trying to acquire the mutex with a timeout of zero. This would allow only the first request arriving after the eager refresh threshold has passed to get the mutex and start the background refresh, while all the other requests would simply see that the mutex is already "taken" and move on, using the current value.
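The zero-timeout idea can be sketched with the standard SemaphoreSlim.Wait(0) call, which returns immediately with true or false instead of blocking. The EagerEntrySketch type below is a hypothetical single-entry holder written only for illustration: it is not how FusionCache is implemented, and it deliberately ignores the actual expiration, fail-safe and timeouts.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical single-entry holder, only to illustrate the locking idea.
public sealed class EagerEntrySketch<T>
{
    // Immutable snapshot, swapped atomically so readers never see a half-updated entry.
    private sealed class Snapshot
    {
        public Snapshot(T value) { Value = value; CreatedAt = DateTimeOffset.UtcNow; }
        public T Value { get; }
        public DateTimeOffset CreatedAt { get; }
    }

    private readonly SemaphoreSlim _mutex = new SemaphoreSlim(1, 1);
    private readonly Func<T> _factory;
    private readonly TimeSpan _eagerPoint;
    private volatile Snapshot _current;
    private volatile Task _lastRefresh = Task.CompletedTask;

    public EagerEntrySketch(Func<T> factory, TimeSpan eagerPoint)
    {
        _factory = factory;
        _eagerPoint = eagerPoint;
        _current = new Snapshot(factory()); // initial (blocking) load
    }

    // Exposed only so that a demo or test can wait for the background refresh.
    public Task LastRefresh => _lastRefresh;

    private bool IsPastEagerPoint() =>
        DateTimeOffset.UtcNow - _current.CreatedAt >= _eagerPoint;

    public T Get()
    {
        // Wait(0) never blocks: it returns true only for the caller that got the mutex.
        if (IsPastEagerPoint() && _mutex.Wait(0))
        {
            if (IsPastEagerPoint())
            {
                // We hold the mutex: refresh in the background and release it when done.
                _lastRefresh = Task.Run(() =>
                {
                    try { _current = new Snapshot(_factory()); }
                    finally { _mutex.Release(); }
                });
            }
            else
            {
                // Another refresh completed between the first check and the acquisition.
                _mutex.Release();
            }
        }

        // Every caller, including the one that started the refresh, uses the current value.
        return _current.Value;
    }
}

public static class Program
{
    public static void Main()
    {
        int factoryCalls = 0;
        var entry = new EagerEntrySketch<int>(
            () => { Thread.Sleep(200); return Interlocked.Increment(ref factoryCalls); },
            eagerPoint: TimeSpan.Zero); // always past the threshold, for demonstration only

        Parallel.For(0, 50, _ => entry.Get());
        entry.LastRefresh.Wait();
        Console.WriteLine($"factory calls: {factoryCalls}");
    }
}
```

In this demo, all concurrent calls that arrive while the slow factory is running return the current value immediately, so the number of factory calls should stay far below the number of requests. The second check after acquiring the mutex avoids starting a redundant refresh when a previous one has just completed.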

Some benchmarks should be made to ensure that the performance does not degrade (or at least degrades only within reasonable limits) between a series of calls with and without eager refresh enabled, in each phase (before the "eager threshold" is hit, and after that).

Finally, it should be safe to hit the actual expiration even while an eager refresh is still running, and we should decide what should happen in such an edge case.

ZiggyCreatures / FusionCache