dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.61k stars 3.14k forks source link

System.InvalidOperationException: Unable to translate a collection subquery in a projection... using Union or Concat #26703

Closed marianosz closed 1 year ago

marianosz commented 2 years ago

Hello team

I have this piece of code that is working fine on EF Core 5, and previously on v3:

This is a property defined on the DbContext:

public IQueryable<Candidate> MyCandidates
{
    get
    {
        if (this.usersService.UserHasPermission("all:candidates"))
        {
            return this.Candidates;
        }

        var userId = this.usersService.GetUserId();

        var candidates = this.Candidates
            .Where(x =>
                x.CreatedByUserId == userId || x.OwnedByUserId == userId);

        var asignedCandidates =
            from cs in this.Candidates
            from an in cs.Assignations
            where an.AssignedUserId == userId &&
            (an.Status == AssignationStatus.Approved ||
            an.Status == AssignationStatus.Direct)
            select cs;

        return asignedCandidates
            .Union(candidates);
    }
}

Basically, if the current user has the "all:candidates" permission, he can query using the Candidates DbSet, if not, he can query the union of candidates created by him, plus the candidates assigned to him.

Then when I try to run the following query, if the user doesn't have the "all:candidates" permission, and uses the code with the Union() I have the following exception:

var userId = this.usersService.GetUserId();

var query = dbContext.MyCandidates
    .AsSplitQuery()
    .Include(x => x.Pins
        .Where(x => x.UserId == userId))
    .Include(x => x.MainRole)
    .Include(x => x.Flags)
        .ThenInclude(x => x.Flag)
    .Include(x => x.ManagerFlags)
        .ThenInclude(x => x.Flag)
    .Include(x => x.Technologies
        .Where(x => !x.NotInterested))
        .ThenInclude(x => x.Technology)
    .Include(x => x.Activities
        .Where(x => x.Visibility == ActivityVisibility.Public))
    .Include(x => x.Age)
    .AsNoTracking()
    .ToList();

The same exception occurs with Concat() instead of union, and is not happening when I remove the Union and return one query or another

2021-11-15 22:19:32.1843|ERROR|ExceptionHandlerMiddleware|Exception: System.InvalidOperationException: Unable to translate a collection subquery in a projection since either parent or the subquery doesn't project necessary information required to uniquely identify it and correctly generate results on the client side. This can happen when trying to correlate on keyless entity type. This can also happen for some cases of projection before 'Distinct' or some shapes of grouping key in case of 'GroupBy'. These should either contain all key properties of the entity that the operation is applied on, or only contain simple property access expressions.
   at Microsoft.EntityFrameworkCore.Query.SqlExpressions.SelectExpression.ApplyProjection(Expression shaperExpression, ResultCardinality resultCardinality, QuerySplittingBehavior querySplittingBehavior)
   at Microsoft.EntityFrameworkCore.Query.Internal.SelectExpressionProjectionApplyingExpressionVisitor.VisitExtension(Expression extensionExpression)
   at System.Linq.Expressions.Expression.Accept(ExpressionVisitor visitor)
   at System.Linq.Expressions.ExpressionVisitor.Visit(Expression node)
   at Microsoft.EntityFrameworkCore.Query.RelationalQueryTranslationPostprocessor.Process(Expression query)
   at Microsoft.EntityFrameworkCore.Query.QueryCompilationContext.CreateQueryExecutor[TResult](Expression query)
   at Microsoft.EntityFrameworkCore.Storage.Database.CompileQuery[TResult](Expression query, Boolean async)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.CompileQueryCore[TResult](IDatabase database, Expression query, IModel model, Boolean async)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.<>c__DisplayClass12_0`1.<ExecuteAsync>b__0()
   at Microsoft.EntityFrameworkCore.Query.Internal.CompiledQueryCache.GetOrAddQuery[TResult](Object cacheKey, Func`1 compiler)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.ExecuteAsync[TResult](Expression query, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryProvider.ExecuteAsync[TResult](Expression expression, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryable`1.GetAsyncEnumerator(CancellationToken cancellationToken)
   at System.Runtime.CompilerServices.ConfiguredCancelableAsyncEnumerable`1.GetAsyncEnumerator()
   at Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ToListAsync[TSource](IQueryable`1 source, CancellationToken cancellationToken)
   at x.HR.API.Controllers.CandidatesController.List(String filter, String sortBy, String sortDirection, Int32 page, Int32 pageSize) in /Users/x/src/x-hr-backend/x.HR.API/Controllers/CandidatesController.cs:line 102
   at lambda_method94(Closure , Object )
   at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Awaited|12_0(ControllerActionInvoker invoker, ValueTask`1 actionResultValueTask)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeInnerFilterAsync()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeFilterPipelineAsync>g__Awaited|20_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
2
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Awaited|17_0(ResourceInvoker invoker, Task task, IDisposable scope)
   at Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
   at Microsoft.AspNetCore.Localization.RequestLocalizationMiddleware.Invoke(HttpContext context)
   at x.HR.API.Infrastructure.Middlewares.Cache.NoCacheMiddleware.Invoke(HttpContext httpContext) in /Users/x/src/x-hr-backend/x.HR.API/Infrastructure/Middlewares/Cache/NoCacheMiddleware.cs:line 25
   at Microsoft.AspNetCore.Authorization.Policy.AuthorizationMiddlewareResultHandler.HandleAsync(RequestDelegate next, HttpContext context, AuthorizationPolicy policy, PolicyAuthorizationResult authorizeResult)
   at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware.<Invoke>g__Awaited|6_0(ExceptionHandlerMiddleware middleware, HttpContext context, Task task)
Url: /api/candidates
Query String: ?page=0&pageSize=30
Method: GET

EF Core version: 6.0.0 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 6.0 Operating system: MacOs IDE: VS Code

ajcvickers commented 2 years ago

@marianosz Please attach a small, runnable project or post a small, runnable code listing that reproduces what you are seeing so that we can investigate.

marianosz commented 2 years ago

Ok, i will try to build something to reproduce...

marianosz commented 2 years ago

@ajcvickers https://github.com/marianosz/ef_6_core_issue

marianosz commented 2 years ago

@ajcvickers I did more research, looks like the issue is related to the Includes, I tested the query removing the Include for related collections, and is working.

ajcvickers commented 2 years ago

@smitpatel Simplified query. Does not fail if Include is removed:

var asignedCandidates =
    from cs in context.Candidates
    join an in context.CandidateAssignations on cs.Id equals an.AssignedId
    select cs;

var result = asignedCandidates
    .Union(context.Candidates)
    .Include(x => x.Technologies)
    .ToList();
ajcvickers commented 2 years ago

@marianosz This change is by-design for 6.0. This query now throws because, depending on the results, the query could easily return bad data in 5.0. As the message says, "This can also happen for some cases of projection before 'Distinct' or some shapes of grouping key in case of 'GroupBy'. These should either contain all key properties of the entity that the operation is applied on, or only contain simple property access expressions."

smitpatel commented 2 years ago

We should add fwlink to the exception message and explain on the doc about set operations limitation when they can contain repeating elements.

angelochiello commented 2 years ago

@smitpatel Simplified query. Does not fail if Include is removed:

var asignedCandidates =
    from cs in context.Candidates
    join an in context.CandidateAssignations on cs.Id equals an.AssignedId
    select cs;

var result = asignedCandidates
    .Union(context.Candidates)
    .Include(x => x.Technologies)
    .ToList();

I have a similar but more complex query and it works only if "include" are just before executing (ToList()).

smitpatel commented 2 years ago

You cannot include a collection navigation after performing a set operation like that since your set operation result can contain duplicate items.

StevenRasmussen commented 2 years ago

Just ran into this... So what is the correct way to try and perform a Union but also include related properties/collections? I tried to perform the Include on the queries before the Union but with the same result. Ie:

var result = asignedCandidates.Include(x => x.Technologies)
    .Union(context.Candidates.Include(x => x.Technologies))
    .ToList();

Still throws the error though... maybe this isn't possible 🤷‍♂️

ajcvickers commented 2 years ago

@StevenRasmussen It depends how the queries are created. For example, this is fine:

var filteredBlogs = context.Blogs.Where(e => e.Id == 1);
var blogs = filteredBlogs.Union(context.Blogs).Include(e => e.Posts).ToList();

But this is not:

var filteredBlogs = context.Blogs.Where(e => e.Id == 1).Select(e => new Blog { Id = e.Id });
var blogs = filteredBlogs.Union(context.Blogs).Include(e => e.Posts).ToList();

Full repro code:

public class Blog
{
    public int Id { get; set; }
    public int SomeOtherId { get; set; }
    public string Name { get; set; }

    public List<Post> Posts { get; } = new();
}

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }

    public Blog Blog { get; set; }
}

public class SomeDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(Your.ConnectionString)
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();

    public DbSet<Blog> Blogs { get; set; }
    public DbSet<Post> Posts { get; set; }
}

public class Program
{
    public static void Main()
    {
        using (var context = new SomeDbContext())
        {
            context.Database.EnsureDeleted();
            context.Database.EnsureCreated();

            context.Add(new Blog {Posts = {new Post(), new Post()}});
            context.Add(new Blog {Posts = {new Post(), new Post(), new Post()}});
            context.SaveChanges();
        }

        using(var context = new SomeDbContext())
        {
            // var filteredBlogs = context.Blogs.Where(e => e.Id == 1).Select(e => new Blog { Id = e.Id });
            // var blogs = filteredBlogs.Union(context.Blogs).Include(e => e.Posts).ToList();

            var filteredBlogs = context.Blogs.Where(e => e.Id == 1);
            var blogs = filteredBlogs.Union(context.Blogs).Include(e => e.Posts).ToList();

            foreach (var blog in context.Blogs)
            {
                Console.WriteLine($"Blog {blog.Id} with {blog.Posts.Count} posts.");
            }
        }
    }
}
StevenRasmussen commented 2 years ago

@ajcvickers - Thanks for taking the time to respond... and for the demo project! You have pointed me in the right direction. It appears based on your project and from me updating mine that you can't really do a projection on the results of the union as part of the select statement if it includes any of the include properties/collections. What I ended up doing is simply selecting the actual entity before projecting to a new one. Thanks!

StevenRasmussen commented 2 years ago

@ajcvickers - Looks like I spoke too soon ☹️ . My situation is a bit more complicated. It appears that this breaks down if one of the queries has a join to another table, even if the resulting query only selects the same entity type. I have updated your example:

using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

public class Blog
{
    public int Id { get; set; }
    public int SomeOtherId { get; set; }
    public string Name { get; set; }
    public List<Post> Posts { get; } = new();
}

public class SuperBlog : Blog
{
    public string SomeOtherProperty { get; set; }
}

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public int BlogId { get; set; }
    public Blog Blog { get; set; }
}

public class SomeDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(Your.ConnectionString)
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();

    public DbSet<Blog> Blogs { get; set; }
    public DbSet<SuperBlog> SuperBlogs { get; set; }
    public DbSet<Post> Posts { get; set; }
}

public class Program
{
    public static void Main()
    {
        using (var context = new SomeDbContext())
        {
            context.Database.EnsureDeleted();
            context.Database.EnsureCreated();

            context.Add(new Blog { Posts = { new Post(), new Post() } });
            context.Add(new Blog { Posts = { new Post(), new Post(), new Post() } });
            context.SaveChanges();
        }

        using (var context = new SomeDbContext())
        {
            var filteredBlogs = context.Blogs.Where(e => e.Id == 1);

            var superBlogs = from b in context.Blogs
                             join sb in context.SuperBlogs on b.Id equals sb.Id
                             select b;

            var blogsWithJoin = from b in context.Blogs
                                join p in context.Posts on b.Id equals p.BlogId
                                select b;

            try
            {
                // This doesn't work :(.  I thought it was because of the inheritance ob 'Blog' -> 'SuperBlog'
                var blogs = filteredBlogs.Union(superBlogs).Include(e => e.Posts).ToList();
            }
            catch (Exception ex) { }

            try
            {
                // This proves that if there is any 'join' then the query fails
                var blogs = filteredBlogs.Union(blogsWithJoin).Include(e => e.Posts).ToList();
            }
            catch (Exception ex) { }

            foreach (var blog in context.Blogs)
            {
                Console.WriteLine($"Blog {blog.Id} with {blog.Posts.Count} posts.");
            }
        }
    }
}

Should this be supported?

angelochiello commented 2 years ago

Have you tried to remove the 2 "join" lines from superBlogs and blogsWithJoin? Just a thought looking at yoyr code.

StevenRasmussen commented 2 years ago

@angelochiello - Yes, I know that in this particular case you could remove the join clauses... it was more just to demonstrate the issue of using union's with join clauses. In reality, I'm performing some where clauses on the joins to filter out some of the data. Thanks for taking a look though.

angelochiello commented 2 years ago

@StevenRasmussen In my case, wheres were OK. Problem was on the include part. So I did something like;

  1. var query = _ctx.Entity.Where(x => x.Property == 1);
  2. // some other query;
  3. // some union
  4. query = query.Include(x => x.ReferencedEntity);
  5. var list = query.ToList();

If I put include on line 1, it doesen't work. on line 4, id does. I think "join" are translated as include, am I wrong?

StevenRasmussen commented 2 years ago

So to be clear, what I’m looking for is the ability to use union with multiple queries that contain joins and also be able to use include statements on the final query. Like the OP, I am dynamically building a query based on permissions, that I would like to use as my core query when dealing with the entity type. I then further modify that core query with include statements and/or join with other tables depending on the shape of the data I want to return in any particular situation.

meinsiedler commented 2 years ago

We faced the same exception as the OP when using .Concat in combination with .Include. After we changed the .Concat call to .Union, the query worked as expected and generated the same SQL query as EF Core 5.0.12 did.

With EF 5.0.12 both, .Concat and .Union produced the same SQL query.

smitpatel commented 2 years ago

In EF Core, in order to do collection include, it requires each record of the parent (on which the collection will be populated) uniquely identifiable. Since then we can fetch related records and fix up the navigations properly. This is by design and cannot be changed else it can cause duplicated records in the collection/improperly filled collection/collection not marked as loaded. The concept of uniquely identifiable is at row level which means we do track origin, and even if the parent entity is duplicated, if the row have unique identifier we still work with above principal. An example would be Orders.Select(o => o.Customer), since each customer can multiple orders, the result set will have duplicated customer entities but together with PK of Orders, each row in the result is still unique (OrderId, CustomerId together creates a composite unique key).

With above things in mind

But once you apply set operation with above query to something else which has different origin, there is no way to uniquely identify rows anymore. So we cannot do collection include. This is something which cannot be changed unless we come up with altogether different idea in how to do collection include.

StevenRasmussen commented 2 years ago

@smitpatel - Thanks for the detailed explanation! I think I understand the issue now and how to move forward in my particular scenario. I appreciate the effort into providing such a well thought out response!

SoftCircuits commented 10 months ago

@ajcvickers Is there a more detailed explanation of this anywhere? How would it return invalid data if AsNoTracking() is used? In my case, I need a different Select() syntax for each of the two results I'm Union()ing together. I don't see how to make that work reliably if this is an issue.