dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.78k stars 3.19k forks source link

Tracking error on Update with two Many-To-One properties referencing the same entity #28666

Closed alaingaudreau closed 2 years ago

alaingaudreau commented 2 years ago

Hello,

I'm seeing odd behavior EF Core 6.0.8 (and 6.0.7) where we have an object that has two properties referencing another entity within a SignalR Hub connecting to a MariaDB database using the pomelo provider.

We are porting one of our backends to EF Core from java and hibernate and admit our team having limited experience with EF Core and this may be our ignorance on the inner workings of EF Core.

Tax {
   Guid Id;
   Account PurchaseTaxAccount;
   Account ExpenseTaxAccount;
   ...
}

modelBuilder.Entity<Tax>().HasOne<Account>(p => p.PurchaseTaxAccount);
modelBuilder.Entity<Tax>().HasOne<Account>(p => p.ExpenseTaxAccount);
modelBuilder.Entity<Tax>().Navigation(e => e.PurchaseTaxAccount).AutoInclude().UsePropertyAccessMode(PropertyAccessMode.Property).HasField("_purchasetaxaccount");
modelBuilder.Entity<Tax>().Navigation(e => e.ExpenseTaxAccount).AutoInclude().UsePropertyAccessMode(PropertyAccessMode.Property).HasField("_expensetaxaccount");

When we dbContext.Taxes.Update(item) where item.PurchaseTaxAccount and item.ExpenseTaxAccount are set to the same Account we get an InvalidOperationException.

System.InvalidOperationException: The instance of entity type 'Account' cannot be tracked because another instance with the key value '{Id: d270f6ad-bc2d-45df-b428-407992e07fc1}' is already being tracked. When attaching existing entities, ensure that only one entity instance with a given key value is attached.

both get's toList() have AsNoTracking() dbContext.ChangeTracker.AutoDetectChangesEnabled = false and UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking) in the AddDbContext()

when I break on the Update() I can confirm that NoTracking and AutoDetectChangesEnabled are set.

If we use two different Account's it runs fine.

It was my understanding that AsNoTracking() and AutoDetechChangedEnabled set to false should disable tracking and allow us to avoid any tracking exceptions?

While trying to debug I tried adding some code to set each item as Detached and clear the ChangeTracker as below:

using (var scope = _sp.CreateScope())
            {
                var dbContext = scope.ServiceProvider.GetRequiredService<DataContext>();
                dbContext.ChangeTracker.AutoDetectChangesEnabled = false;

                if (dbContext.Taxes != null)
                {
                    dbContext.Entry(item).State = EntityState.Detached;
                    dbContext.Entry(item.ExpenseTaxAccount).State = EntityState.Detached;
                    dbContext.Entry(item.PurchaseTaxAccount).State = EntityState.Detached;
                    dbContext.ChangeTracker.Clear();

                    var result = dbContext.Taxes.Update(item);
                    dbContext.SaveChangesAsync();
               }
           }

Which did not help since it appears to be deeper into the Update() method call stack that NoTracking and AutoDetectChangesEnabled are being ignored.

Granted we could just forbid assigning the same Account but this is quite a large project and we use this pattern quite extensively in the accounting/invoicing/quoting module with taxes so if it is a bug be it from our misuse of EF Core or an internal issue, we will run in to it quite often and need to get it working in any case.

ajcvickers commented 2 years ago

@alaingaudreau EF Core SaveChanges works by sending updates for tracked entities. See Change Tracking in EF Core. Using AsNoTracking can be useful when the entities returned are not going to ever be tracked by that context instance, but as soon as the database needs to be updated, then the entities need to be tracked. The Change Tracking and Save data docs discuss various patterns that can be used here, depending on the type of application and its requirements.

alaingaudreau commented 2 years ago

@ajcvickers I had a look through the docs for change tracking and save data from that link and in disconnected entities I see the tip and identity resolution paragraph that mentions only a single instance of any entity with a given PK can be tracked which I had seen from another source during investigation but had read conflicting information about SaveChanges() bypassing the change detection when AutoDetectChangesEnabled was set to false.

From what I gather, it is a limitation on EF Core that cannot be circumvented making it impossible to persist entities that may reference another entity twice in the same graph/transaction.

Which would also make it impossible to have something like Order.OrderLines[n].TaxCode or twice the same service/product in the same order as different lines and persist it in a single graph as the same referenced entities could theoretically appear more than once and trigger the ChangeTracker limitation.

ajcvickers commented 2 years ago

as the same referenced entities could theoretically appear more than once and trigger the ChangeTracker limitation.

The best way to handle this is to not create a graph with duplicated instances. That is, do identity resolution when building the graph, or use a tool that does so, such as a JSON serializer with appropriate reference tracking turned on.

An alternative in EF7 is to resolve the conflicts when tracking the graph in EF, but then you need to make sure to resolve any inconsistencies between the instances and their references in the graph.

alaingaudreau commented 2 years ago

An alternative in EF7 is to resolve the conflicts when tracking the graph in EF, but then you need to make sure to resolve any inconsistencies between the instances and their references in the graph.

@ajcvickers That seems like a possibility but I see it being costly/messy when it could be handled in a different manner which would also address some other posts/replies on the subject.

If we take my above example as a reference point, these foreign entities are essentially read only to save db hits and network latency if the client app needs the attached information and is returned to the server when updating it or processing it server-side (server side printing, pdf creation and signing, emailing, report creation and other features that need to be protected from reverse engineering and kept server side, etc...)

If we had a way to set the relationship as a simple reference with for example a setter IsReferenceOnly(true) on the NavigationBuilder or ReferenceNavigationBuilder with fluent to essentially have EF Core ignore the foreign entity changes and it's childrens as well and only set the parent entity's FK field (Tax.PurchaseTaxAccount and Tax.ExpenseTaxAccount) .

It would result in an exponential reduction in computational resources and db hits to achieve the desired behavior on update's while still allowing us to autoinclude those foreign entities on queries.

Keeping in mind we are talking of a client/server application with thousands of simultaneous users scattered across the globe on various device types and connectivity, every cpu/db/network latency hit needs to be considered and weighed.

I haven't dug into the EF Core code but in theory, it shouldn't require all that much code to implement as I'm guessing we would only need to skip the change detection and validation methods if that reference only flag is set, break the loop used to walk the entity tree and simply set the reference key field on that member.

Don't get me wrong, change tracking can be incredibly useful but in cases such as this with ef core's change tracking implementation, it also causes problems and performance loss by requiring much more preprocessing.

Even if we ended up disabling change tracking entirely for that scope and thus sending a complete update query to the server for members that have not changed within this entity, it would still be more efficient on such a small object in such a small scope than detecting changes and doing identity resolution on "read only" references or splitting the object/graph into many different pieces to get around this limitation would it not?

If we had for example a quote or order or invoice with 100+ lines of products with a reference for the product record (with it's own references for product type etc...), a reference to the ledger account associated with that line, the tax associated with that line, various other reference as well and each of those 100+ lines may be of the same product at different price levels varying on quantity using the same tax group, ledger account, etc... the complexity of the graph becomes quite impressive but in reality, if you prune the references away to simply their key by not processing them in the change tracker, it's quite fast and compact and result in a small update Order query, a drop on OrderLine where orderid = Order.id and a multi-row insert for OrderLine.

No need to process all those hundreds of referenced entities at all in the persistence layer which is much more efficient at not performing some steps based on a flag than any method of preprocessing to clean up and perform identity resolution on the object before passing it off to the DbSet<>where it will always go through all the change tracking and identity resolution overhead on top of that.

ajcvickers commented 2 years ago

@alaingaudreau Thanks for your comments, but this is not something we plan to implement.