Closed JonPSmith closed 2 years ago
Duplicate of #20578
It is fixed in nightly build already and will be released in new preview.
As a work-around, you can put Skip(0)
after OrderBy and it should work.
I have just updated to release 5.0.0-preview.4.20220.10 of EF Core and the code in the initial issue now runs, but fails to order the data.
However, if I change the code to a disconnected state it works.
Below is the code that works
[Fact]
public void TestIncludeSortReviewsDisconnected()
{
//SETUP
var options = SqliteInMemory.CreateOptions<EfCoreContext>();
using (var context = new EfCoreContext(options))
{
context.Database.EnsureCreated();
var newBook = new Book
{
Reviews = new List<Review>
{
new Review {NumStars = 2}, new Review {NumStars = 1}
}
};
context.Add(newBook);
context.SaveChanges();
}
using (var context = new EfCoreContext(options))
{
//ATTEMPT
var query = context.Books
.Include(x => x.Reviews.OrderBy(y => y.NumStars));
var books = query.ToList();
//VERIFY
_output.WriteLine(query.ToQueryString());
books.Single().Reviews.Select(x => x.NumStars).ShouldEqual(new[] { 1, 2 });
}
}
EF Core version: Database provider: Microsoft.EntityFrameworkCore.Sqlite Target framework: 5.0.0-preview.4.20220.10 Operating system: Windows 10 IDE: Visual Studio 2019 16.5.5
We need to investigate why the SQL generated is different in both cases.
@JonPSmith - The behavior you are seeing is somewhat expected. 2 reasons behind it
@smitpatel . Thank you for your quick and detailed response. I have a question on your second bullet. The type of the Reviews
navigational property is ICollection<Review>
, which is ordered. I had assumed that assigning to that would retain the order. If it doesn't then I think I would have had more problems on other sorting issues than just this one. Can you explain - thanks.
Based on this code, I think it would end up being HashSet. https://github.com/dotnet/efcore/blob/e6674cdbc8e45ffd890ba20441230fb383087e3a/src/EFCore/Metadata/Internal/CollectionTypeFactory.cs#L26-L33
Based on my knowledge, ICollection<T>
does not specify anything about ordering contract. Items in the collection returned would be in some order but that does not necessarily same as the order in which they are added.
Thanks. I didn't know that and need to ponder how that alters my approach.
Hi @smitpatel,
I researched the HashSet issue you explained and here are my conclusions. Don't change anything in EF Core but helps me understand that not initializing a navigational collection property is OK.
Over the last three years I learnt to not to initialize a navigational collection property. I do this because a few times I forgot an .Include
, then I get the wrong answer, i.e. an empty collection when it should contain a collection. By not initializing a navigational collection fails safe in that if I forget .Include
then the property is null, which causes an exception.
Now, before the ability of EF Core to sort in the .Include
, whether the HashSet holds the data in the order they were added is immaterial, but I did notice they were in primary key order.
With EF Core 5's "sort/filter in Include" how HashSet adds new entries matters. A study of the HashSet
A set is a collection that contains no duplicate elements, and whose elements are in no particular order.
But that is only true because of duplicates - even with duplicates it does has some order - see HashSet documentation example.
I have written this for my own peace of mind, and I will continue to not to initialize a navigational collection property. Thanks for reading.
As I said earlier, items in HashSet has some order but it does not necessarily mean order they are added. Further since it is not part of the contract, implementation change can do anything. If you think about it then it based on implementation of HashSet and default HashCode calculation the order could change in very odd way. So I wouldn't rely on that behavior.
If you want order preserving collection type but uninitialized then you can use IList<T>
. In that case since HashSet is not assignable to it, EF Core will initialize a list for it which will preserve the order in which entries are added.
Thank you @smitpatel. I will change my uninitialized navigational collection property to IList<T>
and update my book to show that, and why that is the best type.
With this knowledge I revisited the code and retested it and I was WRONG about the different SQL being produced (and I have removed from my initial comment). I believe the difference is because of the first bullet in your comment.
I therefore marked this issue is closed.
@JonPSmith Just to close the loop on this, the reason we use HashSet by default is because the performance for Contains for a List with many items is very poor since it does a linear scan each time. This can make change tracking and fixup slow for large data sets.
@ajcvickers,
Thanks for the detailed feedback. This is the typical technical trade-off we meet on any project, in this case using a type where the order is defined against performance.
I will do some performance testing so that when I update my book I can provide some idea of the trade-offs.
Hi @smitpatel,
You said
As I said earlier, items in HashSet has some order but it does not necessarily mean order they are added.
While I agree that the definition of the HashSet says "A set is a collection that contains no duplicate elements, and whose elements are in no particular order." I have looked at the current implementation and it does hold unique values in the order they are added.
@ajcvickers. My performance tests (which DON'T cover all options) show that relational fixup on a query and Add
get large hits but SaveChanges
(change tracking) is OK. Does that fit with your knowledge of EF Core?
PS. I am currently updating my book to use IList
as it is that is the safe solution, but in the chapter on performance tuning I will talk this though and let the developer decide what they want to use.
I have looked at the current implementation and it does hold unique values in the order they are added.
There is a difference between contract and implementation. Changing contract is a breaking change. But changing implementation especially when it does not change contract is not. While current implementation may hold values in the order they are added, changing that implementation in future to void that invariant is non-breaking change as user would be relying on an undocumented behavior.
@JonPSmith See the answer from Jon Skeet on this Stack Overflow issue: https://stackoverflow.com/questions/657263/does-hashset-preserve-insertion-order
In particular:
It's possible that if you never remove any items, it will preserve insertion order. I'm not sure, but I wouldn't be entirely surprised. However, I think it would be a very bad idea to rely on that:
- It's not documented to work that way, and the documentation explicitly states that it's not sorted.
- I haven't looked at the internal structures or source code (which I don't have, obviously) - I'd have to study them carefully before making any such claim in a firm manner.
- The implementation could very easily change between versions of the framework. Relying on this would be like relying on the string.GetHashCode implementation not changing - which some people did back in the .NET 1.1 days, and then they got burned when the implementation did change in .NET 2.0...
I am trying out NET Core 5 release 3 Include filter/order and I get an exception. I expect I have done something wrong but I can't find it.
Steps to reproduce
The unit test
The entity classes are
The DbContext is
Exception is - note it fails on the line
var books = query.ToList();
Further technical details
My Test.csproj file has the following packages
EF Core version: Database provider: Microsoft.EntityFrameworkCore.Sqlite Target framework: 5.0.0-preview.3.20181.2 Operating system: IDE: Visual Studio 2019 16.5.4