dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.69k stars 3.17k forks source link

NET Core 5 release 3 - Include OrderBy throws exception #20777

Closed JonPSmith closed 1 year ago

JonPSmith commented 4 years ago

I am trying out NET Core 5 release 3 Include filter/order and I get an exception. I expect I have done something wrong but I can't find it.

Steps to reproduce

The unit test

[Fact]
public void TestIncludeSortReviews()
{
    //SETUP
    var options = SqliteInMemory.CreateOptions<EfCoreContext>();
    using (var context = new EfCoreContext(options))
    {
        context.Database.EnsureCreated();
        var newBook = new Book { Reviews = new List<Review>
        {
            new Review { NumStars = 2 } , new Review { NumStars = 1 }
        } };
        context.Add(newBook);
        context.SaveChanges();

        //ATTEMPT
        var query = context.Books
            .Include(x => x.Reviews.OrderBy(y => y.NumStars));
        var books = query.ToList();

        //VERIFY
        _output.WriteLine(query.ToQueryString());
        books.Single().Reviews.Select(x => x.NumStars).ShouldEqual(new []{1,2});

    }
}

The entity classes are

public class Book 
{
    public int BookId { get; set; } 
    public string Title { get; set; }
    public string Description { get; set; }
    public DateTime PublishedOn { get; set; }
    public string Publisher { get; set; }
    public decimal Price { get; set; }
    public string ImageUrl { get; set; }

    public bool SoftDeleted { get; set; }

    //-----------------------------------------------
    //relationships

    public PriceOffer Promotion { get; set; }
    public ICollection<Review> Reviews { get; set; } 
    public ICollection<BookAuthor> AuthorsLink { get; set; } 
}
public class Review 
{
    public int ReviewId { get; set; }
    public string VoterName { get; set; }
    public int NumStars { get; set; }
    public string Comment { get; set; }

    //-----------------------------------
    //Relationships

    public int BookId { get; set; } //#M
}

The DbContext is

public class EfCoreContext : DbContext
{
    public EfCoreContext(DbContextOptions<EfCoreContext> options)
        : base(options) { }

    public DbSet<Book> Books { get; set; }
    public DbSet<Author> Authors { get; set; }
    public DbSet<PriceOffer> PriceOffers { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<BookAuthor>() 
            .HasKey(x => new {x.BookId, x.AuthorId});

        modelBuilder.Entity<Book>()
            .HasQueryFilter(p => !p.SoftDeleted);
    } 
}

Exception is - note it fails on the line var books = query.ToList();

System.ArgumentException : Expression of type 'System.Linq.IOrderedQueryable`1[DataLayer.EfClasses.Review]' cannot be used for return type 'System.Linq.IOrderedEnumerable`1[DataLayer.EfClasses.Review]'
   at System.Linq.Expressions.Expression.ValidateLambdaArgs(Type delegateType, Expression& body, ReadOnlyCollection`1 parameters, String paramName)
   at System.Linq.Expressions.Expression.Lambda[TDelegate](Expression body, String name, Boolean tailCall, IEnumerable`1 parameters)
   at System.Linq.Expressions.Expression.Lambda[TDelegate](Expression body, Boolean tailCall, IEnumerable`1 parameters)
   at System.Linq.Expressions.Expression.Lambda[TDelegate](Expression body, ParameterExpression[] parameters)
   at System.Linq.Expressions.Expression1`1.Rewrite(Expression body, ParameterExpression[] parameters)
   at System.Linq.Expressions.ExpressionVisitor.VisitLambda[T](Expression`1 node)
   at System.Linq.Expressions.Expression`1.Accept(ExpressionVisitor visitor)
   at System.Linq.Expressions.ExpressionVisitor.Visit(Expression node)
   at System.Linq.Expressions.ExpressionVisitor.VisitUnary(UnaryExpression node)
   at System.Linq.Expressions.UnaryExpression.Accept(ExpressionVisitor visitor)
   at System.Linq.Expressions.ExpressionVisitor.Visit(Expression node)
   at System.Dynamic.Utils.ExpressionVisitorUtils.VisitArguments(ExpressionVisitor visitor, IArgumentProvider nodes)
   at System.Linq.Expressions.ExpressionVisitor.VisitMethodCall(MethodCallExpression node)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryableMethodNormalizingExpressionVisitor.VisitMethodCall(MethodCallExpression methodCallExpression)
   at System.Linq.Expressions.MethodCallExpression.Accept(ExpressionVisitor visitor)
   at System.Linq.Expressions.ExpressionVisitor.Visit(Expression node)
   at Microsoft.EntityFrameworkCore.Query.QueryTranslationPreprocessor.NormalizeQueryableMethodCall(Expression expression)
   at Microsoft.EntityFrameworkCore.Query.RelationalQueryTranslationPreprocessor.NormalizeQueryableMethodCall(Expression expression)
   at Microsoft.EntityFrameworkCore.Query.QueryTranslationPreprocessor.Process(Expression query)
   at Microsoft.EntityFrameworkCore.Query.QueryCompilationContext.CreateQueryExecutor[TResult](Expression query)
   at Microsoft.EntityFrameworkCore.Storage.Database.CompileQuery[TResult](Expression query, Boolean async)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.CompileQueryCore[TResult](IDatabase database, Expression query, IModel model, Boolean async)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.<>c__DisplayClass9_0`1.<Execute>b__0()
   at Microsoft.EntityFrameworkCore.Query.Internal.CompiledQueryCache.GetOrAddQueryCore[TFunc](Object cacheKey, Func`1 compiler)
   at Microsoft.EntityFrameworkCore.Query.Internal.CompiledQueryCache.GetOrAddQuery[TResult](Object cacheKey, Func`1 compiler)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.Execute[TResult](Expression query)
   at Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryProvider.Execute[TResult](Expression expression)
   at Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryable`1.GetEnumerator()
   at Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.IncludableQueryable`2.GetEnumerator()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Test.UnitTests.TestDataLayer.Ch06_HowIncludeWorks.TestIncludeSortSingle() in C:\Users\JonPSmith\source\repos\EfCoreinAction-SecondEdition\Test\UnitTests\TestDataLayer\Ch06_HowIncludeWorks.cs:line 65

Further technical details

My Test.csproj file has the following packages

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>netcoreapp3.1</TargetFramework>

    <IsPackable>false</IsPackable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="EfCore.TestSupport" Version="3.1.1" />
    <PackageReference Include="Microsoft.EntityFrameworkCore" Version="5.0.0-preview.3.20181.2" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.Proxies" Version="5.0.0-preview.3.20181.2" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.Sqlite" Version="5.0.0-preview.3.20181.2" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="5.0.0-preview.3.20181.2" />
    <PackageReference Include="Microsoft.NET.Test.Sdk" Version="16.5.0" />
    <PackageReference Include="xunit" Version="2.4.1" />
    <PackageReference Include="xunit.runner.visualstudio" Version="2.4.1">
      <PrivateAssets>all</PrivateAssets>
      <IncludeAssets>runtime; build; native; contentfiles; analyzers; buildtransitive</IncludeAssets>
    </PackageReference>
    <PackageReference Include="coverlet.collector" Version="1.2.0" />
  </ItemGroup>

  <ItemGroup>
    <ProjectReference Include="..\BookApp\BookApp.csproj" />
    <ProjectReference Include="..\DataLayer\DataLayer.csproj" />
    <ProjectReference Include="..\ServiceLayer\ServiceLayer.csproj" />
  </ItemGroup>

</Project>

EF Core version: Database provider: Microsoft.EntityFrameworkCore.Sqlite Target framework: 5.0.0-preview.3.20181.2 Operating system: IDE: Visual Studio 2019 16.5.4

smitpatel commented 4 years ago

Duplicate of #20578 It is fixed in nightly build already and will be released in new preview. As a work-around, you can put Skip(0) after OrderBy and it should work.

JonPSmith commented 4 years ago

I have just updated to release 5.0.0-preview.4.20220.10 of EF Core and the code in the initial issue now runs, but fails to order the data.

However, if I change the code to a disconnected state it works.

Below is the code that works

[Fact]
public void TestIncludeSortReviewsDisconnected()
{
    //SETUP
    var options = SqliteInMemory.CreateOptions<EfCoreContext>();
    using (var context = new EfCoreContext(options))
    {
        context.Database.EnsureCreated();
        var newBook = new Book
        {
            Reviews = new List<Review>
            {
                new Review {NumStars = 2}, new Review {NumStars = 1}
            }
        };
        context.Add(newBook);
        context.SaveChanges();
    }
    using (var context = new EfCoreContext(options))
    {
        //ATTEMPT
        var query = context.Books
            .Include(x => x.Reviews.OrderBy(y => y.NumStars));
        var books = query.ToList();

        //VERIFY
        _output.WriteLine(query.ToQueryString());
        books.Single().Reviews.Select(x => x.NumStars).ShouldEqual(new[] { 1, 2 });
    }
}

UPDATED: removed incorrect SQL commands.

EF Core version: Database provider: Microsoft.EntityFrameworkCore.Sqlite Target framework: 5.0.0-preview.4.20220.10 Operating system: Windows 10 IDE: Visual Studio 2019 16.5.5

smitpatel commented 4 years ago

We need to investigate why the SQL generated is different in both cases.

@JonPSmith - The behavior you are seeing is somewhat expected. 2 reasons behind it

JonPSmith commented 4 years ago

@smitpatel . Thank you for your quick and detailed response. I have a question on your second bullet. The type of the Reviews navigational property is ICollection<Review>, which is ordered. I had assumed that assigning to that would retain the order. If it doesn't then I think I would have had more problems on other sorting issues than just this one. Can you explain - thanks.

smitpatel commented 4 years ago

Based on this code, I think it would end up being HashSet. https://github.com/dotnet/efcore/blob/e6674cdbc8e45ffd890ba20441230fb383087e3a/src/EFCore/Metadata/Internal/CollectionTypeFactory.cs#L26-L33

Based on my knowledge, ICollection<T> does not specify anything about ordering contract. Items in the collection returned would be in some order but that does not necessarily same as the order in which they are added.

JonPSmith commented 4 years ago

Thanks. I didn't know that and need to ponder how that alters my approach.

JonPSmith commented 4 years ago

Hi @smitpatel,

I researched the HashSet issue you explained and here are my conclusions. Don't change anything in EF Core but helps me understand that not initializing a navigational collection property is OK.

Over the last three years I learnt to not to initialize a navigational collection property. I do this because a few times I forgot an .Include, then I get the wrong answer, i.e. an empty collection when it should contain a collection. By not initializing a navigational collection fails safe in that if I forget .Include then the property is null, which causes an exception.

Now, before the ability of EF Core to sort in the .Include, whether the HashSet holds the data in the order they were added is immaterial, but I did notice they were in primary key order.

With EF Core 5's "sort/filter in Include" how HashSet adds new entries matters. A study of the HashSet code shows it does add new entries in the order they are presented, unless there is a duplicate (it uses Shot[] and adds to the end if the next value is unique). The HashSet documentation says

A set is a collection that contains no duplicate elements, and whose elements are in no particular order.

But that is only true because of duplicates - even with duplicates it does has some order - see HashSet documentation example.

I have written this for my own peace of mind, and I will continue to not to initialize a navigational collection property. Thanks for reading.

smitpatel commented 4 years ago

As I said earlier, items in HashSet has some order but it does not necessarily mean order they are added. Further since it is not part of the contract, implementation change can do anything. If you think about it then it based on implementation of HashSet and default HashCode calculation the order could change in very odd way. So I wouldn't rely on that behavior. If you want order preserving collection type but uninitialized then you can use IList<T>. In that case since HashSet is not assignable to it, EF Core will initialize a list for it which will preserve the order in which entries are added.

JonPSmith commented 4 years ago

Thank you @smitpatel. I will change my uninitialized navigational collection property to IList<T> and update my book to show that, and why that is the best type.

With this knowledge I revisited the code and retested it and I was WRONG about the different SQL being produced (and I have removed from my initial comment). I believe the difference is because of the first bullet in your comment.

I therefore marked this issue is closed.

ajcvickers commented 4 years ago

@JonPSmith Just to close the loop on this, the reason we use HashSet by default is because the performance for Contains for a List with many items is very poor since it does a linear scan each time. This can make change tracking and fixup slow for large data sets.

JonPSmith commented 4 years ago

@ajcvickers,

Thanks for the detailed feedback. This is the typical technical trade-off we meet on any project, in this case using a type where the order is defined against performance.

I will do some performance testing so that when I update my book I can provide some idea of the trade-offs.

JonPSmith commented 4 years ago

Hi @smitpatel,

You said

As I said earlier, items in HashSet has some order but it does not necessarily mean order they are added.

While I agree that the definition of the HashSet says "A set is a collection that contains no duplicate elements, and whose elements are in no particular order." I have looked at the current implementation and it does hold unique values in the order they are added.

@ajcvickers. My performance tests (which DON'T cover all options) show that relational fixup on a query and Add get large hits but SaveChanges (change tracking) is OK. Does that fit with your knowledge of EF Core?

PS. I am currently updating my book to use IList as it is that is the safe solution, but in the chapter on performance tuning I will talk this though and let the developer decide what they want to use.

smitpatel commented 4 years ago

I have looked at the current implementation and it does hold unique values in the order they are added.

There is a difference between contract and implementation. Changing contract is a breaking change. But changing implementation especially when it does not change contract is not. While current implementation may hold values in the order they are added, changing that implementation in future to void that invariant is non-breaking change as user would be relying on an undocumented behavior.

ajcvickers commented 4 years ago

@JonPSmith See the answer from Jon Skeet on this Stack Overflow issue: https://stackoverflow.com/questions/657263/does-hashset-preserve-insertion-order

In particular:

It's possible that if you never remove any items, it will preserve insertion order. I'm not sure, but I wouldn't be entirely surprised. However, I think it would be a very bad idea to rely on that:

  • It's not documented to work that way, and the documentation explicitly states that it's not sorted.
  • I haven't looked at the internal structures or source code (which I don't have, obviously) - I'd have to study them carefully before making any such claim in a firm manner.
  • The implementation could very easily change between versions of the framework. Relying on this would be like relying on the string.GetHashCode implementation not changing - which some people did back in the .NET 1.1 days, and then they got burned when the implementation did change in .NET 2.0...