ZiggyCreatures / FusionCache

FusionCache is an easy to use, fast and robust hybrid cache with advanced resiliency features.
MIT License
1.56k stars 84 forks source link

[BUG] Possible bug. FusionCache + EntityFramework (latest) on ASP NET Core 8 (.NET 8) #236

Closed apavelm closed 2 months ago

apavelm commented 2 months ago

Hi, I'm using the following code as suggested in the documentation, database is Azure SQL, FusionCache configuration provided here: #210

var data = await _cache.GetOrSetAsync("someKey-"+someId, async (cancellationToken) =>
    {
        var records = await dbContext.Table
            .AsNoTracking()
            .Where(x => x.Id == someId)
            .FirstOrDefaultAsync(cancellationToken: cancellationToken)
            .ConfigureAwait(false);
        if (records== null)
        {
            throw new RecordNotFoundException();
        }
        return records ;
    })
    .ConfigureAwait(false);

And sometimes, I'm getting strange exception:

System.InvalidOperationException: A second operation was started on this context instance before a previous operation completed. This is usually caused by different threads concurrently using the same instance of DbContext. For more information on how to avoid threading issues with DbContext, see https://go.microsoft.com/fwlink/?linkid=2097913.

   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.AsyncEnumerator.MoveNextAsync()

I tried to change DbContext lifetime to Transient, but it didn't help. It does not happen always. Did you face with similar?

apavelm commented 2 months ago

I'm not sure why another operation was started before 1st ended as each operation is awaited

jodydonetti commented 2 months ago

Hi @apavelm , did you have more actions enable soft/hard timeouts or Eager Refresh ? If that is the case, the DbContext may still be in use after the await _cache.GetOrSetAsync call (because it wil be finishing running in the background).

akoken commented 2 months ago

@apavelm This is most likely EF Core DbContext lifetime issue.

apavelm commented 2 months ago

@akoken I thought so, but I made EF DbContext as Transient to avoid this. It didn't help @jodydonetti Yes, some cache "partitions" have such settings.

Do you think that requesting individual DbContext using IServiceProvider inside GetOrSetAsync block could solve the issue?

UPDATED did anyone tried an approach to use DbContextFactory instead of Dbcontext in DI? I'm only concerned about its disposal

jodydonetti commented 2 months ago

Hi, are you able to debug this? I'd like some more details, like the call stack when the exception is thrown, for example.

jodydonetti commented 2 months ago

Oh, also: which version are you using?

apavelm commented 2 months ago

The latest one. I switched this morning to IDbContextFactory and now creating DbContext inside GetOrSetAsync block - so far so good. Will monitor more

jodydonetti commented 2 months ago

Good! Let me know how it goes.

apavelm commented 2 months ago

@jodydonetti It has been 4 days already passed, and still no such errors. So, I suppose IDbContextFactory should be recommended in documentation. Since .NET 7/8 it could be easily added to DI together with DbContext. So, both could be used at the same time. factory - for cache-inside blocks wrapped into "using", and DbContext for normal operations.

jodydonetti commented 2 months ago

Awesome, happy it worked!

And good call for the documentation, I already planned to add 2 new docs:

Finally I'd link the latter in the docs for both timeouts and eager refresh.

Sounds good?

apavelm commented 2 months ago

@jodydonetti Hi, I just want to warn you about one thing. Perhaps what I faced is related to "Background Factory Executions". IDbContextFactory works fine, and no issue with this. But we caught a very tricky error. Sometimes application crashes without any exception. We partially addressed this issue, but not in full. This could be invisible or ignored for simple applications, and 95% of users will never face it.

So, when background executions are started/running in some cases, it could cause synchronization context related issues. And sometimes could cause application crashes. Like in our case, we have unsafe binary dependency (Google OR Tools) for complex math computations. And when the control thread is passed "from" or "to" (hard to identify the direction) issues could occur. Depends on where the control thread on the moment of execution, it could cause exception (for managed code), or application crash for unsafe (DLL) code. Initially, I was blaming Google's library, but even using old version of app it could happen randomly.

I don't know what to do with this. Putting await Task.Yield(); before unmanaged code execution somehow helped, but not in 100% cases. So, please think about the possibility of described case. Unfortunately, I have no repro-code as too-many async thread situation is difficult to even debug. I kindly ask you just to review the code, maybe something that looks like described could be found there.

And a side question, is it possible to use only InMemoryCache and Redis only as a backplate for syncing between instances? When I tried to configure it like this - I got a warning. Sorry for off-topic.

Thank you.

apavelm commented 2 months ago

@jodydonetti one more question, does the FusionCache on Redis uses transactions or any other blocks?

jodydonetti commented 2 months ago

@jodydonetti Hi, I just want to warn you about one thing. [...]

Interesting, thanks for the heads up, I'll try to take a bird's eye view at the code from looking for something that may get these results, and I'll warn you if I'll find something.

And a side question, is it possible to use only InMemoryCache and Redis only as a backplate for syncing between instances?

Is it possible? Yes. Would I suggest it? No 😬

I mean it should work, but things can go wrong in subtle ways, so you better be careful. Read here for more.

When I tried to configure it like this - I got a warning.

Which warning was it? I suppose it was this:

it has been detected a situation where there *IS* a backplane, there is *NOT* a distributed cache
and the DefaultEntryOptions.SkipBackplaneNotifications option is set to false. This will probably
cause problems, since a notification will be sent automatically at every change in the cache (Set,
Remove, Expire and also GetOrSet when the factory is called) but there is not a distributed cache
that different nodes can use, basically resulting in a situation where the cache will keep invalidating
itself at every change. It is suggested to either (1) add a distributed cache or (2) change the
DefaultEntryOptions.SkipBackplaneNotifications to true.

If this is the case then yes, read the warning carefully, otherwise as explained bad things may happen...

jodydonetti commented 2 months ago

@jodydonetti one more question, does the FusionCache on Redis uses transactions or any other blocks?

I don't understand what you mean here, cab you giveme more context?

Anyway each operation towards Redis is done "normally", without any "outside" transaction.

Maybe you are referring to the native transaction support in Redis itself?

To answer that, think about the fact that FusionCache doesn't even know what Redis is, since it works with any IDistributedCache implementation, one of which happens to talk to Redis.

When using Redis the library typically used is StackExchange.Redis, so if that library uses Redis transactions or not is up to its authors (spoiler: no, it doesn't currently uses it, and I don't think it ever will).

Hope this helps.

apavelm commented 2 months ago

no, it doesn't currently uses it, and I don't think it ever will That was the answer. Thanks

Just to clarify the situation about Redis. We ran into a strange issue with Redis, I just wanted to make sure the way how it used inside FusionCache. Thank you

About the warning - yes, I meant that one. I was investigating the issue I wrote above, so it was just for testing purposes. Although, Perhaps there is a room for improvement. Let's say we have a simple endpoint that uses FusionCache GetOrSet. When Distributed cache disabled - it completes in 20-50ms, when it is enabled, execution takes significantly much more time, regardless if cache record is valid. Just a suggestion, maybe it is not correct from any point of view, but what if L2 operation that could be done separately do as a background process? I understand that it is probably not straightforward to implement, but just an idea.

OK, Thank you for your answers and the library

jodydonetti commented 2 months ago

Just a suggestion, maybe it is not correct from any point of view, but what if L2 operation that could be done separately do as a background process? I understand that it is probably not straightforward to implement, but just an idea.

You already can, just set AllowBackgroundDistributedCacheOperations to true and you're done. Also, any potential issue with the distributed cache will be automatically handled by Auto-Recovery.

Hope this helps.