ZiggyCreatures / FusionCache

FusionCache is an easy to use, fast and robust hybrid cache with advanced resiliency features.
MIT License

[BUG] Chaining call of RemoveAsync doesn't work #261

Closed apavelm closed 3 months ago

apavelm commented 3 months ago

To simplify the explanation, I'll describe the setup in abstract terms.

There are two models: Item and List. Item is an object with an Id, something like:

{
  "id": 1,
  "name": "some value"
}

List is a collection of item IDs, so it looks like:

[1,2,3,4]

We cache both: the List as a whole and each Item by Id.

There are methods (simplifying):

- GetList retrieves the data from the DB, caches it using GetOrSetAsync with "ListCacheKey", and returns the List.
- GetById retrieves the data of a single Item by id using GetOrSetAsync with "Item-{id}" as the cache key.
- CreateNew creates a record in the DB and removes "ListCacheKey" from the cache.
- UpdateById updates a record in the DB and removes "Item-{id}" from the cache.
- DeleteById removes a record from the DB and removes both cache entries, "Item-{id}" and "ListCacheKey", from the cache. NOTE: the issue is here.
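For readers following along, the methods described above might look roughly like the sketch below. It is not the reporter's actual code: the `Item` record, the `IItemRepository` interface, and the `ItemService` class are hypothetical names introduced for illustration, and only `GetOrSetAsync` and `RemoveAsync` come from FusionCache itself.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using ZiggyCreatures.Caching.Fusion;

public record Item(int Id, string Name);

// Hypothetical data-access abstraction (an assumption, not from the report).
public interface IItemRepository
{
    Task<List<int>> LoadIdsAsync(CancellationToken ct);
    Task<Item?> LoadByIdAsync(int id, CancellationToken ct);
    Task InsertAsync(Item item);
    Task UpdateAsync(Item item);
    Task DeleteAsync(int id);
}

public sealed class ItemService
{
    private const string ListCacheKey = "ListCacheKey";
    private readonly IFusionCache _cache;
    private readonly IItemRepository _repo;

    public ItemService(IFusionCache cache, IItemRepository repo)
    {
        _cache = cache;
        _repo = repo;
    }

    // Cached list of item IDs; the factory runs only on a cache miss.
    public async Task<List<int>?> GetListAsync() =>
        await _cache.GetOrSetAsync<List<int>>(
            ListCacheKey, async ct => await _repo.LoadIdsAsync(ct));

    // Cached single item, keyed by "Item-{id}".
    public async Task<Item?> GetByIdAsync(int id) =>
        await _cache.GetOrSetAsync<Item?>(
            $"Item-{id}", async ct => await _repo.LoadByIdAsync(id, ct));

    public async Task CreateNewAsync(Item item)
    {
        await _repo.InsertAsync(item);
        // The list changed, so drop the cached list.
        await _cache.RemoveAsync(ListCacheKey);
    }

    public async Task UpdateByIdAsync(Item item)
    {
        await _repo.UpdateAsync(item);
        await _cache.RemoveAsync($"Item-{item.Id}");
    }

    public async Task DeleteByIdAsync(int id)
    {
        await _repo.DeleteAsync(id);
        // Both the list and the item entry are now stale.
        await _cache.RemoveAsync(ListCacheKey);
        await _cache.RemoveAsync($"Item-{id}");
    }
}
```

Return-type nullability of `GetOrSetAsync` has varied between FusionCache versions, so the nullable annotations here may need adjusting.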

Also, Redis is used as a backplane and all sync operations are async (this is important I suppose).

Issue description: The call chain to reproduce is GetList -> CreateNew -> GetList -> DeleteById -> GetList. The last GetList call returns all items, including the deleted one.

Inside DeleteById, after removing the record from the DB, we call:

await _cache.RemoveAsync("ListCacheKey").ConfigureAwait(false);
await _cache.RemoveAsync($"Item-{id}").ConfigureAwait(false);

I used the debugger to verify the issue, and I was able to confirm it. The keys match, so it doesn't look like a stupid typo or something like that. I suspect the async operations, but I would prefer not to re-enable it, if possible.

@jodydonetti Please advise how to fix this. Thank you in advance.

henriqueholtz commented 3 months ago

@apavelm Why was it closed without any update? Could you explain, please?

apavelm commented 3 months ago

@henriqueholtz I'm not sure what it was, perhaps something abnormal. I'm still trying to understand, but at some point it suddenly started working as expected.

jodydonetti commented 3 months ago

Hi @apavelm and thanks for using FusionCache!

This is really strange, and even more so that it started working after some time. I would like to investigate more...

A couple of questions:

What I think may have happened (but without more info I'm just spitballing here) is that there have been some transient problems related to your Redis instance, maybe a temporary slowdown or something, which in turn may have created a temporary out-of-sync issue.

For example:

In this case, even if everything went fine, check the options AllowBackgroundDistributedCacheOperations and AllowBackgroundBackplaneOperations. If one or both of them are true, what is described above can happen: a backplane notification can take a little longer to reach N2, by which time a GetAll on N2 has already been executed.
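As a rough illustration of the two options mentioned above, the sketch below turns both off for the removals in DeleteById, so that `RemoveAsync` waits for the distributed cache and backplane operations instead of letting them complete in the background. The `CacheInvalidation` class and the `RemoveItemAndListAsync` method are hypothetical names; the two properties, `DefaultEntryOptions`, and `Duplicate()` are assumed to be available in the FusionCache version in use.

```csharp
using System.Threading.Tasks;
using ZiggyCreatures.Caching.Fusion;

public static class CacheInvalidation
{
    // Hypothetical helper: removes both entries while waiting for the
    // distributed cache and backplane operations to complete.
    public static async Task RemoveItemAndListAsync(IFusionCache cache, int id)
    {
        // Start from the cache's defaults so other settings are preserved.
        var options = cache.DefaultEntryOptions.Duplicate();
        options.AllowBackgroundDistributedCacheOperations = false;
        options.AllowBackgroundBackplaneOperations = false;

        await cache.RemoveAsync("ListCacheKey", options).ConfigureAwait(false);
        await cache.RemoveAsync($"Item-{id}", options).ConfigureAwait(false);
    }
}
```

Even with both options set to false, the other nodes still need a moment to receive and process the backplane notification, so this narrows the window rather than guaranteeing that an immediate read on another node sees the removal.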

Another example:

Something to keep in mind in this case is that, luckily, Auto-Recovery automatically handles this (and more), and will retry later to bring everything back in sync. Having said that, it takes a little time to do that, so an immediate GetAll on N2 will not see the update right away.

If instead you are working with one node or anyway are sure all the calls ended up on the very same node, then... well, this is something I never ever observed 🤔

Hope this helps, let me know!