dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.79k stars 3.19k forks source link

Consider taking into account collection order in topological sort when entities with generated keys are inserted #15585

Open divega opened 5 years ago

divega commented 5 years ago

This is based on a dicussion with a customer that is porting a large codebase from LINQ to SQL to EF Core. In the customer's own words:

  • There are a number of tables that currently rely on inserting records in the order of the principal's collection navigation property (which was the LINQ to SQL behaviour).
  • In the medium term these tables should probably be modified to include an OrderIndex column or similar, in which case https://github.com/aspnet/EntityFrameworkCore/issues/9067 may be of interest to us.
  • In the short term we've worked around this by special casing some of the tables and forcing records to be inserted one at a time, in the expected order.
  • We're considering "genericising" this workaround, such that we mimic the LINQ to SQL insert order. Obviously this gets more complex when taking object dependencies into account, and we'd be interested in hearing how we might be able to take advantage of the EF Core infrastructure to support this.
  • We'd also be interested in knowing whether it'd be possible to make the EF ordering a behaviour that we could specify. e.g. EF Core needs to determine the correct dependency ordering, but after that, the insert order can be arbitrary. Would it be possible for us to contribute our input into that "arbitrariness"? I'm thinking effectively a Func<IEnumerable<object>, Func<IEnumerable<object>>, such that we can re-order as we see fit.

I discussed this with @AndriySvyryd and although currently the secondary order used is the primary key of the element of the collection, for the insert scenario in which the actual primary key hasn't been generated yet, we could use the collection order instead. That said, at the stage in the update pipeline where we are performing the topological sort, all we have is entity entries and we currently don't have access to the collection, so we it would require some refactoring that makes the feature more expensive.

The main argument for not doing this is that after the initial insertion there is no way to maintain this order, and reliance on the order can easily become a pit of failure. The proper way to achieve this would be to actually implement support for sorted collections as describe in #9067. However implementing this behavior should still be less costly than that.

If we convince ourselves that this is a good behavior to add, we should also discuss if we could implement another behavior that has been asked for in issues like https://github.com/aspnet/EntityFrameworkCore/issues/11686. That one is not about taking into account collection order, but the order in which entities were added to the DbContext for inserts with generated keys. It is possible that the implementaiton could be similar.

A couple of alternative implementatoins we discussed:

  1. Add a replaceable service to obtain the secondary order for a given entry. An implemetnation of this service could look up elements in the collection to use their position.
  2. Somehow preserve the order in which entires got created on the DbContext (and make sure when we scan collections looking for new entities we add them in order too) and then use that as a fallback secondary order. This could be implemented by either adding a serial number to entries o storing them in ordered collections and usign order preserving sorting algorithms. Note that if you change the order of elements in the collection before you save, you would still get the entities in the order in which they were originally observed.
jzabroski commented 5 years ago

@divega Interesting - if I understand this correctly, the customer wants to control the order in which items are added to the table, e.g. such that Id is monotonically increasing with say a TimesheetDate.

One thing I would do in EF Classic was to do an Interlocked.Increment on the Id column prior to SaveChanges, usually in the default constructor. I can't remember if this was exactly the same issue and why I did it this way, since this was 6 years ago, but worth mentioning. I recall in my case I would get weird errors about having multiple values in the collection with the same key.

langdonx commented 5 years ago

In trying to upgrade to 3.0, this issue has tripped up a number of our integration tests, where we insert a few records in a transaction (some good, some bad), and query them back out to verify our expectations.

The way it works is certainly unexpected -- it seems to consistently insert in reverse order for me.

The quick work around for us is bullet #3 above -- calling SaveChanges after each .Add().