ZiggyCreatures / FusionCache

FusionCache is an easy to use, fast and robust hybrid cache with advanced resiliency features.
MIT License

[FEATURE] ↩️ Better Auto-Recovery #163

Closed: jodydonetti closed this issue 1 year ago

jodydonetti commented 1 year ago

Problem

The auto-recovery feature of the Backplane is already working pretty well, but sometimes there are situations where we can do better.

These are some examples.

✔ Background queue processing

Processing of the auto-recovery queue should respect the AllowBackgroundBackplaneOperations option too, instead of always being executed in the background.

✔ Remove duplicate items on publish

When a new non-auto-recovery message is about to be sent to the backplane and another one for the same cache key is already on the queue, the one in the queue should be automatically removed since the new one will be fresher.
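To make the de-duplication rule concrete, here is a minimal sketch in Python (not FusionCache's actual C# implementation). It assumes a retry queue keyed by cache key; `BackplaneMessage`, `AutoRecoveryQueue`, `enqueue` and `discard_superseded` are hypothetical names invented for illustration.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class BackplaneMessage:
    # Hypothetical message shape: only the cache key matters for de-duplication.
    cache_key: str
    action: str
    timestamp: float


class AutoRecoveryQueue:
    """Hypothetical retry queue holding at most one pending message per cache key."""

    def __init__(self) -> None:
        # Insertion order is kept so that pending messages can be replayed in order.
        self._pending: "OrderedDict[str, BackplaneMessage]" = OrderedDict()

    def enqueue(self, message: BackplaneMessage) -> None:
        """Store a message whose publish failed, replacing any older one for the same key."""
        self._pending.pop(message.cache_key, None)
        self._pending[message.cache_key] = message

    def discard_superseded(self, cache_key: str) -> bool:
        """Call right before a fresh, non-auto-recovery message for cache_key is published.

        Returns True if a stale queued message was removed.
        """
        return self._pending.pop(cache_key, None) is not None

    def __len__(self) -> int:
        return len(self._pending)
```

Keying the queue by cache key makes the removal a single dictionary operation, and it also guarantees the queue never grows beyond one entry per key.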

✔ Better backpressure handling

When the distributed cache and the backplane use the same underlying server/service, it can be useful to expire the entry in the distributed cache when an auto-recovery message is about to be sent: if publishing the message originally failed, saving to the distributed cache probably failed too, and without this there may be sync issues.

This should be controllable, though, via a new EnableDistributedExpireOnBackplaneAutoRecovery option: the default value should be true to get better results for everybody.
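The intended order of operations can be sketched as follows, in Python rather than FusionCache's own C#. Everything here is an assumption made for illustration: `InMemoryDistributedCache`, its `expire` method, `replay_auto_recovery_message` and the `expire_distributed_entry` flag (standing in for the proposed option) are hypothetical names, not the library's API.

```python
from typing import Callable, Dict


class InMemoryDistributedCache:
    """Hypothetical stand-in for a distributed cache, backed by a plain dict."""

    def __init__(self) -> None:
        self.entries: Dict[str, object] = {}

    def expire(self, cache_key: str) -> None:
        # Dropping the entry forces the next read to go back to the source of truth.
        self.entries.pop(cache_key, None)


def replay_auto_recovery_message(
    cache_key: str,
    distributed_cache: InMemoryDistributedCache,
    publish: Callable[[str], None],
    expire_distributed_entry: bool = True,
) -> None:
    """Replay one queued message, optionally expiring the distributed entry first."""
    if expire_distributed_entry:
        # The original save may have failed together with the original publish,
        # so the entry currently stored may be stale.
        distributed_cache.expire(cache_key)
    publish(cache_key)
```

Expiring before publishing means that nodes reacting to the replayed message cannot read a stale entry back from the distributed cache.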

✔ Delayed processing on reconnect

When the underlying connection to a server/service (e.g. Redis) is back after a disconnect, with #162 we can now react immediately and start sending pending messages from the auto-recovery queue. The problem with doing so immediately is that the other nodes may not be ready yet, because in distributed systems latency is not zero (see the Fallacies of distributed computing, for example).

Because of this we should introduce a slight delay, via a new BackplaneAutoRecoveryReconnectDelay option that controls how long to wait before starting to process the queue.

Even though there's no magic value that can guarantee a bulletproof solution every time, a reasonable default value would be 2s.
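One possible way to model the reconnect delay is a small gate that the queue processor consults before sending anything. This is only a sketch in Python under stated assumptions: `ReconnectGate`, `on_reconnect` and `can_process` are hypothetical names, and the clock value is passed in explicitly so the behavior is easy to check.

```python
class ReconnectGate:
    """Hypothetical gate that blocks queue processing for a while after a reconnect."""

    def __init__(self, reconnect_delay_seconds: float = 2.0) -> None:
        # 2 seconds mirrors the default suggested above.
        self.reconnect_delay_seconds = reconnect_delay_seconds
        self._resume_at = 0.0

    def on_reconnect(self, now: float) -> None:
        """Record a reconnect observed at time `now` (in seconds)."""
        self._resume_at = now + self.reconnect_delay_seconds

    def can_process(self, now: float) -> bool:
        """Return True once the delay since the last reconnect has elapsed."""
        return now >= self._resume_at
```

In a real processor `now` would typically come from a monotonic clock such as `time.monotonic()`, so that wall-clock adjustments cannot shorten or lengthen the wait.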

Solution

Fix the issues above and more, and optimize the shit out of it.

jodydonetti commented 1 year ago

Hi all, v0.23.0 has been released with all of the above included 🎉