jodydonetti opened 1 month ago
Great work!
One thing I was curious about while reading is your decision to implement your own strategy first, with the plan of leveraging .NET 9's new tagging support later. Doesn't this carry the risk that you might later discover FusionCache's tagging approach isn't fully compatible with .NET 9? There could be subtle issues that aren't obvious at the start. Personally, I would have done it the other way around: starting with a version that only supports .NET 9, and then adding a more general approach afterward. This might also prevent users from switching from FusionCache to HybridCache in .NET 9 if they're aiming for the best performance.
Another key point for me, in my own projects, is ensuring seamless support for named caches. In many cases, you may not even need multiple tags if your cache entries are stored in separate named caches. So my main interest is in optimizing the performance of the Clear() method, specifically making sure there's no interference between different named caches. You already mentioned that this will be supported, so I'm glad it's covered. I'm also happy to hear you're considering performance improvements for the Clear() method by special-casing the * tag.
What wasn't entirely clear is whether this tagging feature will be opt-in or enabled by default. Since using the Clear() method already requires the * tag, I'm guessing there's no way to enable this feature without some performance impact, even if tags or the Clear() method aren't used at all?
Another scenario worth considering is making it easy to have a setup where most nodes use the cache normally, but one specific node handles invalidation. For example, in a web server connected to a CMS, you could have a hook in the CMS that triggers an Azure function or similar process to invalidate cached entries when content changes. This node would start with an empty cache but would immediately remove entries by key or call Clear(). While I assume this would work, it may be worth optimizing for this type of scenario.
Lastly, based on my experience implementing cache invalidation in a cluster environment, it's fairly easy to get it working 99.9% of the time. However, reaching 100% reliability can be difficult, and the last 0.1% often leads to nasty bugs like persistent stale caches and the need for manual cache flushes. So, I'd recommend dedicating time to addressing edge cases like node restarts, unexpected shutdowns, and parallel operations on tags from multiple nodes.
Thanks!
Hi @aKzenT
One thing I was curious about while reading is your decision to implement your own strategy first, with the plan of leveraging .NET 9's new tagging support later.
No, not really: as explained here I plan to support the Microsoft HybridCache abstraction, not the implementation. This means that people will be able to use FusionCache "as an implementation of" HybridCache, but FusionCache will keep its own design and implementation separate.
Doesn't this carry the risk that you might later discover FusionCache's tagging approach isn't fully compatible with .NET 9?
I don't think so, since what must be respected are the abstraction and behaviour, meaning the public api surface area + the end result which, in both cases, should be that when users "evict by tag FOO" -> every entry tagged FOO will be, for all intents and purposes, evicted (or look like evicted).
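To make the expected behaviour concrete, here's a hypothetical usage sketch (the API shown, including the tags parameter and RemoveByTag, is illustrative and may differ from the final design; LoadProduct is a made-up helper):

```csharp
// hypothetical usage sketch: the exact API may differ in the final design
cache.GetOrSet("product:123", (ctx, ct) => LoadProduct(123), tags: new[] { "FOO" });
cache.GetOrSet("product:456", (ctx, ct) => LoadProduct(456), tags: new[] { "FOO" });

// from now on, every entry tagged "FOO" must behave as evicted,
// regardless of HOW that result is achieved internally
cache.RemoveByTag("FOO");
```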
Also, I don't know the actual timing for the release of Microsoft HybridCache: it should've been with .NET 9, but it may have been delayed (here the question was about multi-node notifications, which in FusionCache is the backplane, and the answer is "no, we haven't done it yet and no, it won't be there on day zero"), so I really don't know.
There could be subtle issues that aren't obvious at the start.
This is true, but my main point is to give FusionCache users the feature, and later see how to make it work with the new Microsoft abstraction.
Personally, I would have done it the other way around: starting with a version that only supports .NET 9, and then adding a more general approach afterward. This might also prevent users from switching from FusionCache to HybridCache in .NET 9 if they're aiming for the best performance.
I wouldn't assume the best performance is over there (maybe, mind you, but maybe not).
But again, just like with the IDistributedCache abstraction, api surface area + behaviour should be maintained, that's all (I think).
Opinions?
Another key point for me, in my own projects, is ensuring seamless support for named caches.
FYI: HybridCache from Microsoft will not support multiple named caches, nor DI keyed services.
In many cases, you may not even need multiple tags if your cache entries are stored in separate named caches.
Agree, for most cases this is true.
So my main interest is in optimizing the performance of the Clear() method, specifically making sure there's no interference between different named caches. You already mentioned that this will be supported, so I'm glad it's covered. I'm also happy to hear you're considering performance improvements for the Clear() method by special-casing the * tag.
Good 😬
What wasn't entirely clear is whether this tagging feature will be opt-in or enabled by default. Since using the Clear() method already requires the * tag, I'm guessing there's no way to enable this feature without some performance impact, even if tags or the Clear() method aren't used at all?
On the contrary, the idea is always (as much as possible) pay-per-use, so if you will not do anything tagging related, no extra cost will be involved.
Now, to be even more precise, yes technically there will be a fixed "extra cost"... in the form of a null check to see if a cache entry has tags: no tags, no extra cost. But I think we can agree that a single null check is basically free, right?
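As a rough sketch of what that fast path could look like (the entry shape here is made up, not FusionCache's actual internals):

```csharp
// hypothetical sketch (entry shape is made up): the only fixed cost when
// tagging is unused is this null/empty check on the entry's tags
record CacheEntrySketch(object? Value, string[]? Tags);

static bool HasTags(CacheEntrySketch entry)
    => entry.Tags is { Length: > 0 }; // no tags -> fast path, zero extra work
```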
Btw when I'm done with the feature I'll profile it even more and will call out any extra cost associated with it, even when not used at all, so everyone will be informed and can make an informed decision.
Another scenario worth considering is making it easy to have a setup where most nodes use the cache normally, but one specific node handles invalidation. For example, in a web server connected to a CMS, you could have a hook in the CMS that triggers an Azure function or similar process to invalidate cached entries when content changes. This node would start with an empty cache but would immediately remove entries by key or call Clear(). While I assume this would work, it may be worth optimizing for this type of scenario.
In general it already works like this (meaning one "cms node" can be the one triggering evictions, while the other frontend nodes just receive the evictions), but I'll add this to my list of things to check for the Clear() approach, thanks!
Lastly, based on my experience implementing cache invalidation in a cluster environment, it's fairly easy to get it working 99.9% of the time. However, reaching 100% reliability can be difficult, and the last 0.1% often leads to nasty bugs like persistent stale caches and the need for manual cache flushes.
Eh, tell me about it 😅 You are absolutely right, and I've been there too: for example that's why with Auto-Recovery I tried to give a ready-made solution that would automatically cover 90% or more of the recovery scenarios, but I'm always open to new inputs.
But there's always more that can be done: if you have some experience there, some edge case to cover or any info at all, please share them with me, it would be helpful to cover even more.
Oh, also: the public preview I'll release should also be good for that, so anyone can play with it and see how it works.
So, I'd recommend dedicating time to addressing edge cases like node restarts, unexpected shutdowns, and parallel operations on tags from multiple nodes.
On one hand: yes, totally. On the other hand: all the things you described are already handled by the plumbing in FusionCache (like fail-safe, soft timeout, cache stampede protection, auto-recovery, etc) and that is why the idea to build tagging on top of the existing features is so nice, I think.
Thanks!
Hi @aKzenT
One thing I was curious about while reading is your decision to implement your own strategy first, with the plan of leveraging .NET 9's new tagging support later.
No, not really: as explained here I plan to support the Microsoft HybridCache abstraction, not the implementation. This means that people will be able to use FusionCache "as an implementation of" HybridCache, but FusionCache will keep its own design and implementation separate.
You are right, I think what I should have asked is, what about the planned IDistributedCacheInvalidation interface that is proposed here: https://github.com/dotnet/aspnetcore/issues/55308 ? Will this be compatible with the design proposed here so that FusionCache can take advantage of it?
[...]
Another key point for me, in my own projects, is ensuring seamless support for named caches.
FYI: HybridCache from Microsoft will not support multiple named caches, nor DI keyed services.
I know, I think it's a real bummer and something that should be there from the start.
[...] Now, to be even more precise, yes technically there will be a fixed "extra cost"... in the form of a null check to see if a cache entry has tags: no tags, no extra cost. But I think we can agree that a single null check is basically free, right?
But as I understood, the Clear() method requires a "*" tag to be present. Even if that tag does not really exist on the wire, you would still need to check for the expiration of the "*" tag, don't you? Since you can't be sure if any node has called Clear(), you would need to make the Clear() functionality opt-in or pay the price of querying for "*" tag expiration regularly, unless I misunderstood something.
[...] But there's always more that can be done: if you have some experience there, some edge case to cover or any info at all, please share them with me, it would be helpful to cover even more.
I'm not sure that there is something specific that I could share with you. In our project we opted for another approach to cache invalidation. Basically we assumed that reading from the cache is a lot more frequent than writing, so we tried to optimize the reading path. In our approach, each time we write an entry to the cache, we add the key to a Redis hash set with a specific key that represents this cache group (tag). When we want to invalidate the cache, we iterate through the list of keys and delete them one by one. Of course this requires us to use some Redis commands beyond what IDistributedCache offers and it probably would not work together with all the other features of FusionCache, but it works well for us.
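Roughly, a sketch of the shape of that pattern (simplified and illustrative, assuming StackExchange.Redis and a plain Redis set for the group; names and key scheme are made up, not the actual project code):

```csharp
using System.Threading.Tasks;
using StackExchange.Redis;

public class TagIndex
{
    private readonly IDatabase _db;
    public TagIndex(IDatabase db) => _db = db;

    // on every cache write, also index the key under its tag/group
    public Task TrackAsync(string tag, string cacheKey)
        => _db.SetAddAsync($"tag:{tag}", cacheKey);

    // invalidation: enumerate the group's keys and delete them one by one
    public async Task InvalidateAsync(string tag)
    {
        var members = await _db.SetMembersAsync($"tag:{tag}");
        foreach (var member in members)
            await _db.KeyDeleteAsync((string)member);

        // finally drop the index entry itself
        await _db.KeyDeleteAsync($"tag:{tag}");
    }
}
```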
You are right, I think what I should have asked is, what about the planned IDistributedCacheInvalidation interface that is proposed here: dotnet/aspnetcore#55308 ? Will this be compatible with the design proposed here so that FusionCache can take advantage of it?
Ah, I see what you were thinking about, good point.
My idea about that part is to add support for the new abstractions, like IDistributedCacheInvalidation or IBufferDistributedCache, and when possible support them automatically.
In particular, IDistributedCacheInvalidation is basically their version of the Backplane with the part about tags included: again I don't feel like the "single entry with all tag invalidation info" approach is the best one, but apart from that I think I can support that interface too and, if the IDistributedCache passed to FusionCache supports it, it may use that instead of the normal one.
Since IDistributedCacheInvalidation comes from Microsoft, it's more probable that 3rd party IDistributedCache implementers will implement it than FusionCache's own IFusionCacheBackplane, and that is why I'll add support for it: one thing to be sure about is whether it will support the necessary core pieces. For example, by not having support from the get go for multiple named caches, I don't know if it will be able to support notifications for different caches via different named Redis pub/sub channels (in the case of Redis).
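For illustration, the runtime detection could look something like this (note: IDistributedCacheInvalidation is still only a proposal in dotnet/aspnetcore#55308, so the interface shape below is a hypothetical placeholder, not a real API):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;

// hypothetical placeholder, loosely inspired by the dotnet/aspnetcore#55308 proposal
public interface IDistributedCacheInvalidation : IDistributedCache
{
    event Func<string, ValueTask> CacheInvalidated;
}

public static class InvalidationWireUp
{
    // runtime detection: if the configured L2 supports invalidation, opt in
    public static void TryWireUp(IDistributedCache l2, IMemoryCache l1)
    {
        if (l2 is not IDistributedCacheInvalidation invalidation)
            return; // vanilla IDistributedCache: keep using the backplane

        invalidation.CacheInvalidated += key =>
        {
            // evict the local L1 entry, like a backplane notification would
            l1.Remove(key);
            return default;
        };
    }
}
```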
Having said that, a nice thing is that this is not strictly necessary: if there's a benefit to it, good, otherwise users will simply get the feature from FusionCache, like the other features already available, even when going through the new HybridCache abstraction.
In general though there are a lot of moving parts, and we'll have to wait and see, but I think the general approach outlined above is sound.
One thing to remember: binary compatibility between HybridCache and FusionCache is not there, and not needed: when using FusionCache you are using FusionCache, the only thing to respect is the public api surface area. To be more clear: having on one node the Microsoft impl of HybridCache and on another node FusionCache and have them talk to each other is not something that will be supported, or that even makes sense (imho).
I know, I think it's a real bummer and something that should be there from the start.
Having been there, done that, I know it's a lot of work for them too, and everywhere there are time constraints, resource constraints, etc, including at Microsoft, so I feel for them.
But as I understood, the Clear() method requires a "*" tag to be present. Even if that tag does not really exist on the wire, you would still need to check for the expiration of the "*" tag, don't you? Since you can't be sure if any node has called Clear(), you would need to make the Clear() functionality opt-in or pay the price of querying for "*" tag expiration regularly, unless I misunderstood something.
Right, I now see what you meant.
Technically you are correct, but we are talking about a single cache entry, shared with the entire cache, so the cost of handling it I think is negligible. The cost of checking it will be in 99.999% of the cases a single in-memory lookup.
On top of that I'm thinking about some extra optimization for it, so that it is always immediately available even without a lookup.
Finally, as you mentioned, I can add an extra option in FusionCacheOptions to disable support for it, and squeeze some extra perf if you really don't need it.
All needs to be measured, of course, but as it stands: what do you think?
I'm not sure that there is something specific that I could share with you. In our project we opted for another approach for cache invalidation. Basically we assumed that reading from the cache is a lot more frequent than writing
Agree, this is true in 99% of the cases in the real world: in write-heavy systems, where writes are way more frequent than reads, caches are way less useful.
so we tried to optimize the reading path. In our approach, each time you write an entry to the cache, we add the key to a redis hash set with a specific key that represents this cache group (tag). When we want to invalidate the cache, we iterate through the list of keys and delete them one by one.
Makes sense, and that would've been the other approach, the one I called Server-Assisted: as said I will still play with it in the future.
Thanks again, this is a very useful conversation!
I don't have much feedback yet other than that this looks very interesting to help solve my use case of invalidating all cache for a particular user 🤩
The Client-Assisted Tag Invalidation approach should work well for us I think, we don't have any strong performance requirements yet that would require a server-assisted approach although I see the benefits of that too. I like the reasoning of providing both, where client-side works with all IDistributedCache, and maybe plugins can add support for various server-side solutions?
I'm following this closely and am eager to test it 🙂
Hi @aKzenT
But as I understood, the Clear() method requires a "*" tag to be present. Even if that tag does not really exist on the wire, you would still need to check for the expiration of the "*" tag, don't you? Since you can't be sure if any node has called Clear(), you would need to make the Clear() functionality opt-in or pay the price of querying for "*" tag expiration regularly, unless I misunderstood something.
Update on this: currently on my experimental branch the Clear() support has been special cased as I planned, so right now it's just a very fast long > long check 🙂
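In other words, something like this (a minimal sketch with hypothetical names, not the actual internals):

```csharp
// minimal sketch, hypothetical names: the special-cased Clear() check is
// just a comparison between two longs, with no cache lookup involved
class ClearBarrierSketch
{
    private long _lastClearTimestamp; // updated via the "*" entry / backplane

    // an entry created before the last Clear() is, logically, expired
    public bool IsClearedOut(long entryTimestamp)
        => entryTimestamp <= _lastClearTimestamp;
}
```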
Also, I added support for a real Clear() underneath, when FusionCache detects that it's physically possible to do that: basically when there's only L1 (no L2 or backplane) and total ownership of the inner MemoryCache. By special casing that too it now doesn't even need to use the "*" tag in those scenarios, which is a huge win.
Will update more in the next few days, and a preview version is right around the corner 😬
Hi @angularsen
The Client-Assisted Tag Invalidation approach should work well for us I think, we don't have any strong performance requirements yet that would require a server-assisted approach although I see the benefits of that too.
One thing to notice is that the performance impact would likely be there for the server-assisted version, too, just in a different way: in short it would equate to a massive UPDATE/DELETE FROM cache WHERE ..., which not a lot of caches can do, and even when they can - like with Redis + HSET - it would still not be cheap.
The nice advantage of the client-assisted approach is that it's automatically balanced between all nodes/caches, distributed over time, lazy (only when in fact needed) and self-cleaning.
I keep thinking about the details and behavior of such an approach, and it may very well be the nicest, most balanced one all things considered.
Will post more of my considerations soon.
I'm following this closely and am eager to test it 🙂
That's great: a preview version will be out soon, thanks!
Hi @jodydonetti,
This is excellent news as I have been wanting to use Fusion Cache for some time and this was considered a blocker based on how we currently utilise our in-house L1 (MemoryCache) + L2 (Redis) system.
Since we are using Client-Assisted invalidation, would it make sense to consider using expiry tokens in the local cache? Or would that cause too many complications with other FusionCache functionality like Fail-Safe?
In regards to Clear(), the proposed Client-Assisted approach makes the * essentially a filter on the cached data which is still held in L1.
Also, I added support for a real Clear() underneath, when FusionCache detects that it's physically possible to do that: basically when there's only L1 (no L2 or backplane) and total ownership of the inner MemoryCache. By special casing that too it now doesn't even need to use the "*" tag in those scenarios, which is a huge win.
What is the primary driver for the above mentioned special case? Is it that the expiry timestamp values are stored in the same cache as the actual data, and clear would also remove these?
As per my understanding, it would not be safe to remove the L1 instance of these timestamps as they are required to determine if the L2 values are marked as expired.
If that is the case, can the L1 expiry timestamps be stored in a separate location within IFusionCache (like a singleton, or an isolated MemoryCache)? That way Clear may not need so much special handling and it could proactively trigger a .Clear() on all other nodes in the backplane when the * value timestamp is changed.
Regardless of the above, the use of L2 means that the * tag will always be needed, as Clear functionality is not available, nor safe, on a shared L2 cache.
-B
Hi @b-twis
This is excellent news as I have been wanting to use Fusion Cache for some time and this was considered a blocker based on how we currently utilise our in-house L1 (MemoryCache) + L2 (Redis) system.
That is nice to know 😬
Since we are using Client-Assisted invalidation, would it make sense to consider using expiry tokens in the local cache? Or would that cause too many complications with other FusionCache functionality like Fail-Safe?
I think that would be, as you guessed it, problematic. Not really for fail-safe, at least not at first, but mostly related to differences between L1 and L2, different internal update flows and so on. But I will give it a try nonetheless, will think about it.
In regards to Clear(), the proposed Client-Assisted approach makes the * essentially a filter on the cached data which is still held in L1.
Yes, but also not just that: it will act as a filter, yes, but it will also automatically clean up entries as they are discovered to be expired by a tag. This means that the system will automatically clean up as it is being used, which I think is a really nice additional bonus.
What is the primary driver for the above mentioned special case?
To effectively release memory when the scenario allows for that: normally, data would either expire after Duration or be cleaned up one by one as stated above. This would do more, when possible.
Is it that the expiry timestamp values are stored in the same cache as the actual data, and clear would also remove these?
Eheh also that, you spotted it 😬
As per my understanding, it would not be safe to remove the L1 instance of these timestamps as they are required to determine if the L2 values are marked as expired.
Correct, and that's why I stated (look for the bold part):
Also, I added support for a real Clear() underneath, when FusionCache detects that it's physically possible to do that: basically when there's only L1 (no L2 or backplane) and total ownership of the inner MemoryCache. By special casing that too it now doesn't even need to use the "*" tag in those scenarios, which is a huge win.
If that is the case, can the L1 expiry timestamps be stored in a separate location within IFusionCache (like a singleton or an isolated MemoryCache)?
I saw others are exploring the dictionary approach (I think HybridCache is doing this), but that means that the dictionary would grow forever until restarts, which is not good imho. And if you add an expiration to a dictionary you get... a separate cache. But a pure memory cache would also need to be maintained separately, and also store data in a (separate?) L2, and also notify other nodes of evictions, and... that is basically another FusionCache, which is why I picked the route of internal cache entries in the same cache, to basically share the same plumbing and configuration of the already used FusionCache.
Thoughts?
That way Clear may not need so much special handling and it could proactively trigger a .Clear() on all other nodes in the backplane when the * value timestamp is changed.
You should also consider the case of a shared memory cache as the L1: this is also used, and it's why I stated above "and total ownership of the inner MemoryCache", exactly to avoid problems in this case.
To give you an idea, for every feature I basically need to consider these possible scenarios: L1 only or L1+L2, with or without the backplane, with an owned or shared inner MemoryCache, single node or multiple nodes, and so on.
Yeah, I know, the permutations of all possible scenarios are quite daunting 😅
Note that for people using an L2 but not a backplane, the solution is normally to have a low L1 Duration and a higher L2 Duration.
Regardless of the above, the use of L2 means that the * tag will always be needed, as Clear functionality is not available, nor safe, on a shared L2 cache.
I'm not sure I totally understood this last part, so I'll try with 2 different meanings:
- if you meant a real Clear() with an L2, then the answer as said is that I will not do that (limited to L1 only + other restrictions)
- if you meant the long > long check, the idea is that yes I would still need the special cache entry for the "*" tag, but only to be consistent and have an L2 fallback for cold starts (L1 empty) and in case of transient issues, but I am also updating the in-memory "clear timestamp" via backplane notifications immediately, and basically get the best of both worlds
Again, building on the existing plumbing and features would make this really a good way, imho.
Thoughts?
Thanks for sharing, this is really important for me to validate and fix my approach for Tagging!
Hi all, v2.0.0-preview-1 is out 🥳 This includes Tagging and Clear() support!
🙏 Please, if you can try it out and let me know what you think, how it feels to use it or anything else really: your contribution is essential, thanks!
Have been looking forward to this release for a while. I pulled it into our platform today to start experimenting. First, the interfaces all make sense and incorporating it into existing implementations was reasonably seamless. I am currently running into an issue where some of my objects do not seem to be persisting the tags into L2 (so likely are not in L1); this brings me to my recommendation. Is there any chance we can get the logging expanded to include the tags? That would help me try to diagnose my issue.
Again, just wanted to say thank you for putting this together.
Hi @jrlost first of all thank you for trying it out.
Have been looking forward to this release for a while. I pulled it into our platform today to start experimenting. First, the interfaces all make sense and incorporating it into existing implementations was reasonably seamless.
This is really good to know, I tried to make the design seamless with existing code, so it's good to know that.
I am currently running into an issue where some of my objects do not seem to be persisting the tags into L2 (so likely are not in L1); this brings me to my recommendation. Is there any chance we can get the logging expanded to include the tags? That would help me try to diagnose my issue.
I left this note in my code:
I think I have an answer then 😬
So yeah I'll add tags in the next preview.
Out of curiosity, and to help me test things out, which L2 and serializer are you using?
Again, just wanted to say thank you for putting this together.
Thank you again for trying it out!
System.text.json and redis.
System.text.json and redis.
Thanks, are you also using the backplane?
Yep
Hi @jrlost I just enabled tags logging locally and it's working well, will release a new preview version soon.
Meanwhile: are you able to come up with an MRE of it not working as expected?
Thanks!
Awesome. Unfortunately I haven't had time to dig into it any further to understand why it works sometimes and not other times. After you push out the logging stuff, that should help me isolate what's special about it. From a high level, my best guess is that it seems to work with setAsync but not getOrSetAsync.
Awesome. Unfortunately I haven't had time to dig into it any further to understand why it works sometimes and not other times.
Ok this is already an indication, good to know.
After you push out the logging stuff, that should help me isolate what's special about it. From a high level, my best guess is that it seems to work with setAsync but not getOrSetAsync.
Another piece of info, good.
I'll try to look into it to see if I find something.
Thanks!
Hi @jrlost I just released preview-2.
Just set IncludeTagsInLogs to true in the options and do some tests.
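For example (a minimal sketch, using the option name from this preview):

```csharp
using ZiggyCreatures.Caching.Fusion;

var cache = new FusionCache(new FusionCacheOptions
{
    IncludeTagsInLogs = true // opt-in: log messages will include the entry's tags
});
```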
Let me know, thanks!
I will pull it down tomorrow and take a look. Thank you for doing this.
Is it possible to update one or more tags of an existing cached Item?
Is it possible to update one or more tags of an existing cached Item?
Hi @angelofb, partial updates of a cache entry's data are not supported. Think of it like this: every cache entry, which is the combination of value + tags + metadata (like expiration, etc), can only be updated atomically.
To update tags for a cache entry you need to do a SET-like operation (eg: Set/GetOrSet) and overwrite it all, since this will allow FusionCache to do its things like events, backplane notifications, etc...
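For example, something like this (a sketch assuming the preview's Set overload that accepts tags; key, value and tag names are illustrative):

```csharp
// overwrite the whole entry (value + tags + options) atomically:
// this is also how you "change" the tags of an existing entry
cache.Set(
    "product:123",
    product, // the same value, or a new one
    options => options.SetDuration(TimeSpan.FromMinutes(10)),
    tags: new[] { "products", "tenant:42" } // the new, full set of tags
);
```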
Any use case in particular you'd like to share?
thank you, I don't have a use case, I was just wondering. great job, so far no problem with 2.0.0-preview-2.
@jodydonetti, I pulled preview-2 down and tried it out, thanks again BTW. I can confirm that all entries where a tag was added via GetOrSetAsync result in [T=] in the logs, which aligns with what I was seeing in Redis. All entries where a tag was added via SetAsync contain a value in the logs as well as in L2.
Awesome!
I'm now adding specific SkipMemoryCacheRead, SkipMemoryCacheWrite, SkipDistributedCacheRead and SkipDistributedCacheWrite as discussed here, since they have been asked for, and also to be aligned with HybridCache which will be released soon.
Also, talking about HybridCache, I'm working on the compatible version which is coming along very nicely.
Damn, I also need to create specific issues to track those activities 🥲
Anyway will update soon.
Hi all, I just published a dedicated issue for the Clear() feature.
It contains some details about the mechanics behind it, the design of it, performance considerations and more.
The Need
Time and time again, there have been requests from the community to support some form of "grouping together cache entries", primarily for multi-invalidation purposes.
Here are some examples:
On top of this, the upcoming HybridCache from Microsoft, which will tentatively be released at around the .NET 9 timeframe and of which FusionCache wants to be an available implementation (and actually the first one!), will seemingly support tagging.
So, it seems the time has come to finally deal with this monumental beast.
Scenario
As we all know, cache invalidation is in general an uber complex beast to approach, and this is true even with "just" an L1 cache (only the first local level, in memory).
Add to this the fact that when we talk about a hybrid cache like FusionCache, we can have 2 levels (L1 + L2, memory + distributed) and multi-node invalidation (see: horizontal scalability), and it's even worse.
Finally, as a cherry on top, in the case of FusionCache add the fact that it automatically handles transient errors thanks to features like fail-safe, soft timeouts, auto-recovery and more, and you have a seemingly insurmountable task ahead.
Or is it?
Limitations
Aside from the complexity of the problem itself, which as said is already kind of crazy hard, we need to deal first and foremost with a design decision that sits at the foundation of FusionCache itself since the beginning (and the same is also true for the upcoming HybridCache from Microsoft): the available abstractions to work with, for both L1 but even more so for L2, are quite limited. In particular for L2 that means IDistributedCache, with its very limited set of available functionalities.
The design decision of using IDistributedCache paid a lot of dividends along the years, because any implementation of IDistributedCache is automatically usable with FusionCache: since there are a lot of them readily available, covering very different needs, this is a very powerful characteristic to have available.
On the other hand, as said, we basically have at our disposal only 3 methods: Get, Set and Remove (plus their async counterparts), as shown below.
That's it, and it's not a lot to work with.
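For reference, this is the entire abstraction (from Microsoft.Extensions.Caching.Distributed, nullability annotations elided): note there is nothing about tags, enumeration or bulk operations, and Refresh only slides expirations.

```csharp
public interface IDistributedCache
{
    byte[] Get(string key);
    Task<byte[]> GetAsync(string key, CancellationToken token = default);
    void Set(string key, byte[] value, DistributedCacheEntryOptions options);
    Task SetAsync(string key, byte[] value, DistributedCacheEntryOptions options, CancellationToken token = default);
    void Refresh(string key);  // only slides a sliding expiration
    Task RefreshAsync(string key, CancellationToken token = default);
    void Remove(string key);
    Task RemoveAsync(string key, CancellationToken token = default);
}
```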
So, what can we do?
R&D Experiments
Since as we can see there's no native support for tagging, along the years I've experimented multiple times with an approach which I called "Client-Assisted Tag Invalidation", which basically means "do something client-side only, with no server-side support, that for all intents and purposes would get us the same result from the outside".
This, in turn, translates to not actually doing a real "evict by tag" on the underlying caches (eg: on Redis, Memcached, etc) but instead keeping track of "something" only on the client-side, to check before returning data to callers. This would logically work as a sort of "barrier" or "low-pass filter" to "hide" data that is logically expired because of one or more of the associated tags.
There are different ways to try to achieve this, but in general it would have consisted of something like keeping a single special cache entry containing all the tag invalidation data (eg: tag + timestamp pairs), shared by the whole cache.
So, "removing by tag" then means getting the special entry, adding the new bit of information, and saving it back.
But, as I explained in my comment regarding a similar approach for the upcoming HybridCache from Microsoft, this can have severe limitations and practical problems, like:
- size: a single cache entry containing all the tag invalidation data would keep growing over time, virtually without limits
- concurrency: with IDistributedCache, it's not possible to concurrently add 2 different pieces of information to the same cache entry at the same time. This typically results in a last-wins effect, basically cancelling the previous ones done in the same "update time-window". Even accounting for some special data structure like hashsets on Redis (on server-side cache backends that support such a thing), concurrency would be theoretically solved but the first point (size) would still remain
All of this is why, after multiple experimentations along the years, I was basically convinced that the only way to add proper tagging support would've been to go with a "Server-Assisted Tag Invalidation" approach, meaning creating a new abstraction like IFusionCacheLevel or something (either an interface or an abstract class, it's not the point here) to model a generic and more powerful "cache level", with native support for tagging and more.
This would simplify a lot of things but, at the same time, would take away the ability to use any existing IDistributedCache implementation out there. FusionCache though already works with vanilla IDistributedCache and this should not go away, so it means FusionCache must be able to work with both vanilla and extended at the same time: this would not be a problem per se, since I can check at runtime which abstraction the L2 implements and act accordingly, but it also means that for users NOT using an extended L2 implementation, extra features like tagging would NOT be available.
And I would really really like to give tagging to all FusionCache users, all of them.
Epiphany
Recently I went to South Korea for my (very late) summer vacations.
In Seoul there's a good jazz scene, with multiple places that deserve a visit like Libro for some live performances which is really beautiful or the nice and cozy Coltrane for some vinyl listening, both highly recommended.
One evening, while drinking a glass of Ardbeg at Coltrane, Land of Make Believe by Chuck Mangione started playing.
And I suddenly had an epiphany.
Why not look at it from a different angle, get to a delicate balance between absolute performance and features available, think about how it would actually be used in the real world from a statistical perspective, and "simply" use the pieces already there to find an overall equilibrium?
By not using a single cache entry to store all tag invalidation infos we would be able to guarantee scalability with whatever number of tags, virtually without limits.
Solution
I'm proposing a solution I call "Client-Assisted Tag Invalidation", meaning it does not require extra assistance from the server-side.
On one hand it's true that by looking at an entire system in production we'll probably have a lot of tag invalidations along time, and this is a given.
On the other hand it's also true that, by their own nature, a lot of tags will be shared between cache entries: this is the whole point of it anyway.
On top of this, we can set some reasonable limits: for example when working with metrics in OTEL systems, it is a known best practice to not have a huge amount of tags and to not have tags with a huge amount of different values (known as "high cardinality"). So we can say the same here.
By accepting this small fact, by understanding the probabilistic nature of tags usage and sharing and by most importantly relying on all the existing plumbing that FusionCache already provides (like L1+L2 support, fail-safe, non-blocking background distributed operations, auto-recovery, etc) we can "simply" say that, for each tag, we'll automatically handle a cache entry with the data needed, basically meaning the timestamp of when the expiration has been requested the last time for that tag.
Regarding the probabilistic nature: basically a lot of tags will be shared between multiple cache entries, think of the Birthday Paradox.
So, a RemoveByTag("tag123") would simply set internally an entry with a key like "__fc:t:tag123" or something like that, containing the current timestamp. Also note that the concrete cache key will also consider any potential cache-key prefix, so multiple named caches on shared cache backends would automatically be supported, too.
Then, when getting a cache entry, after getting it from L1/L2 but before returning it to the outside world, FusionCache would see if it has tags attached to it and, in that case and only in that case (so no extra costs when not used), it would get the expiration timestamp for each tag to see if it's expired and when.
For each related tag, if an expiration timestamp is present and it is greater than the timestamp at which the cache entry has been created, the entry should be considered expired.
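Putting the pieces together, a minimal sketch of the mechanics (hypothetical names and key scheme, not the actual implementation):

```csharp
using System;
using System.Threading.Tasks;
using ZiggyCreatures.Caching.Fusion;

public class TagInvalidationSketch
{
    private readonly IFusionCache _cache;
    public TagInvalidationSketch(IFusionCache cache) => _cache = cache;

    // "removing by tag" just writes the current timestamp in a special entry:
    // being a normal cache entry, it flows through L1/L2/backplane like any other
    public ValueTask ExpireTagAsync(string tag)
        => _cache.SetAsync($"__fc:t:{tag}", DateTimeOffset.UtcNow.UtcTicks);

    // called after reading an entry from L1/L2, before returning it to the caller
    public async ValueTask<bool> IsExpiredByTagsAsync(string[]? tags, long createdAt)
    {
        if (tags is null || tags.Length == 0)
            return false; // pay-per-use: untagged entries skip all of this

        foreach (var tag in tags)
        {
            // loaded lazily, cached like any other entry, shared between entries
            var expiredAt = await _cache.GetOrDefaultAsync<long>($"__fc:t:{tag}");
            if (expiredAt > createdAt)
                return true; // the tag expired AFTER the entry was created
        }

        return false;
    }
}
```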
Regarding the Duration of such special entries with tag expiration data, a value would be configurable via options, but a sensible default (like 24h) would be provided that would cover most cases.
This can be considered a "passive" approach (waiting for each read to see if it's expired) instead of an "active" one (actually go and massively expire data immediately everywhere).
When get-only methods (eg: TryGet, GetOrDefault) are called and a cache entry is found to be expired because of tags, FusionCache will not only hide it from the outside but will effectively expire it which, thanks to FusionCache's normal behaviour, means both locally in the L1, on L2 and on each other node's L1 remotely (thanks to the backplane).
When get-set methods (eg: GetOrSet) are called and a cache entry is found to be expired because of tags, it is just skipped internally and the factory is called, since that would produce a new value and resolve the problem anyway, just in a different way: the internal set will again automatically save the new value locally in the L1, on L2 and on each other node's L1 remotely (thanks again to the backplane).
So the system would automatically update internally based on actual usage, only if and when needed, without massive updates to be made when expiring by a tag.
Nice.
What about app restarts? No big deal, since everything is based on the common plumbing of FusionCache, all will work normally and tag-eviction data will get re-populated again automatically, lazily and based on only the effective usage.
Performance considerations
But wait, this is probably ringing a bell for a lot of people reading this: isn't this a variation of the dreaded "SELECT N+1 problem"?
No, at least realistically that is not the case, mostly because of probabilistic theory and adaptive loading based on concrete usage.
Let me explain.
A typical SELECT N+1 problem happens when, to get a piece of data, we do a first select that returns N elements and then, for each element, we do an additional SELECT.
Here this does not happen, because:
- tag expiration data is itself cached (in L1, too), so each tag is loaded at most once and then shared
- loading is lazy: only the tags actually encountered are ever loaded
- tags are frequently shared between different cache entries, so the same data serves many of them
As an example, if we are loading, either concurrently or one after the other, these cache entries:
- "foo", tagged "tag1" and "tag2"
- "bar", tagged "tag2" and "tag3"
- "baz", tagged "tag1" and "tag3"
The expiration data for "tag1" will be loaded lazily (only when needed) and only once, and automatically shared between the processing of cache entries for both "foo" and "baz". And since as said tags are frequently shared between different cache entries, this means that the system will automatically load only what's needed, when it's needed, and only once.
Some extra reads would be needed, yes, but definitely not the SELECT N+1 case, which would only remain as a worst case scenario, and not for every single cache read.
What about needing tag expiration for "tag1" by 2 different cache entries at the same time? Will it be loaded multiple times? Nope, we are covered, thanks to the Cache Stampede protection.
What about tag expiration data being propagated to other nodes? We are covered, thanks to the Backplane.
And what if tags are based on the data returned from the factory, so that it is not known upfront? No worries, Adaptive Caching will be extended to support tagging, too.
What about potential transient errors? We are covered, thanks to Fail-Safe.
What about slow distributed operations? Again we are covered, thanks to advanced Timeouts and Background Distributed Operations.
What about recovering from distributed errors? Should users need to handle them manually? Nope, also covered, thanks to Auto-Recovery.
All of this because of the solid foundations that have been built in FusionCache for years 💪
What about Clear() ?
If all of this works out, and up until now it seems so, this same approach may also be used to finally implement something else: a proper Clear() method, one that actually supports all scenarios.
But how?
By simply adding support for a special "*" tag (star, meaning "all") we can achieve that.
This tag can also receive a special treatment, like being immediately read from L2 when an update notification is received, for performance reasons.
Server-Assisted Tag Invalidation?
Does this approach exclude a hypothetical "Server-Assisted Tag Invalidation" with an extended IFusionCacheLevel or similar?
or similar?No, actually not! But supporting tagging without that means that the feature can be available right now, for any existing
IDistributedCache
implementation, without requiring any extra assistance from 3rd party packages, and with maybe a couple of extra reads here and there.In the future though I think I will also explore the server-assisted route, because it can lead to a good perf boost: the nice thing about doing the client-assisted approach first though is that the feature will be available in both ways, and when using the eventual extended abstraction you'll "just" get an extra perf boost, but in both cases no limitations at all.
I think this is the best approach overall.
Where are we now?
Right now I have an implementation working on a local branch, which is already something damn awesome to be honest.
I'm currently in the process of fine tuning it, benchmarking it, testing edge cases, tracing/logging the hell out of it to also see the extra work required while simulating real-world scenarios, and so on.
If all goes well this feature will be included in FusionCache v2.0, which would be released at around the same time as .NET 9, including support for the new HybridCache from Microsoft.
Your help is needed
But, honestly, it still seems too good to be true, and I may be missing something: so please try it out and share any edge case or concern you may have.
It would be a really invaluable thing for me to have, and I thank you for that in advance.
Thanks 🙏