dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.71k stars 3.17k forks source link

Union in collection subquery cannot be translated #34849

Open InspiringCode opened 6 days ago

InspiringCode commented 6 days ago

When I run the following query in EF Core 8:

db.Users.Select(x => new AuthorizedUser
{
    Id = x.Id,
    DefaultLanguageCode = x.DefaultLanguageCode,
    AllPermissions = x
        .AssignedPermissions.Select(x => x.BusinessKey)
        .Union(x.Roles.SelectMany(x => x.Permissions, (_, p) => p.BusinessKey))
        .ToHashSet()
}).ToArray();

I get the error:

Unable to translate a collection subquery in a projection since either parent or the subquery doesn't project necessary information required to uniquely identify it and correctly generate results on the client side. This can happen when trying to correlate on keyless entity type. This can also happen for some cases of projection before 'Distinct' or some shapes of grouping key in case of 'GroupBy'. These should either contain all key properties of the entity that the operation is applied on, or only contain simple property access expressions.

User.AssignedPermissions and User.Roles are mapped as many-to-many relations. I tried several variants (including with Join, Distinct, explicit Select s etc), but all fail to translate. The Problem seems to be the Union operation (both sides of the union without the union seem to work fine).

Why is this?

maumar commented 5 days ago

note to self: related to https://github.com/dotnet/efcore/issues/33410

maumar commented 4 days ago

standalone repro:

    [ConditionalFact]
    public async Task Repro34849()
    {
        using (var ctx = new MyContext())
        {
            await ctx.Database.EnsureDeletedAsync();
            await ctx.Database.EnsureCreatedAsync();
        }

        using (var ctx = new MyContext())
        {
            var query = await ctx.Users.Select(x => new AuthorizedUser
            {
                Id = x.Id,
                DefaultLanguageCode = x.DefaultLanguageCode,
                AllPermissions = x.AssignedPermissions
                    .Select(x => x.BusinessKey)
                    .Union(x.Roles.SelectMany(x => x.Permissions, (_, p) => p.BusinessKey))
                    .ToHashSet()
            }).ToListAsync();
        }
    }

    public class MyContext : DbContext
    {
        public DbSet<User> Users { get; set; }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
        }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseSqlServer(@"Server=(localdb)\mssqllocaldb;Database=Repro34849;Trusted_Connection=True;MultipleActiveResultSets=true");
        }
    }

    public class User
    {
        public int Id { get; set; }
        public string DefaultLanguageCode { get; set; }
        public List<Permission> AssignedPermissions { get; set; }
        public List<Role> Roles { get; set; }
    }

    public class Permission
    {
        public int Id { get; set; }
        public int BusinessKey { get; set; }
    }

    public class Role
    {
        public int Id { get; set; }
        public List<Permission> Permissions { get; set; }

    }

    public class AuthorizedUser
    {
        public int Id { get; set; }
        public string DefaultLanguageCode { get; set; }
        public HashSet<int> AllPermissions { get; set; }
    }
maumar commented 4 days ago

problem is how we handle Union (SelectExpression -> ApplySetOperation). We try to union two queries:

SELECT p5.BusinessKey
FROM Permission AS p5
WHERE (u.Id != NULL) && (u.Id == p5.UserId)
SELECT p6.BusinessKey
FROM Role AS r2
INNER JOIN Permission AS p6 ON r2.Id == p6.RoleId
WHERE (u.Id != NULL) && (u.Id == r2.UserId)

those two queries on it's own have different identifiers (things we use to bucket the results accordingly, which are usually all the keys of all the entities). First one uses p5.Id, second one uses r.Id and p6.Id. Normally for set operations, identifiers on both sides must match (as we going to be combining them). Since they don't, we give up and mark the expression as not identifiable.

However, Union is a bit different, it removes the duplicates, so we can use the entire projection as identifier if need be. (we would still proper identifiers if they are available to avoid unnecessary projections, but we can fall back to the projection for case like this where proper identifiers don't work).

maumar commented 4 days ago

@InspiringCode a hack/workaround you can use is to make both sides of the union having a matching shapes (in terms of number of entities that they use). You can do that by adding a superfluous left join to the first part. It doesn't matter what entity you join with (but it needs to be an entity, and needs to have the same number of primary key properties as Role) and by doing left join that never matches it's condition you don't add any data (which would get wiped out by DISTINCT anyway). It's quite hacky, but should get the job done.

            var query2 = await ctx.Users.Select(x => new AuthorizedUser
            {
                Id = x.Id,
                DefaultLanguageCode = x.DefaultLanguageCode,
                AllPermissions = (from xx in x.AssignedPermissions
                                 join dummy in ctx.Set<Foo>() on true equals false into grouping
                                 from dummy in grouping.DefaultIfEmpty()
                                 select xx).Select(x => x.BusinessKey)
                    .Union(x.Roles.SelectMany(x => x.Permissions, (_, p) => p.BusinessKey))
                    .ToHashSet()
            }).ToListAsync();
maumar commented 3 days ago

interesting cases to test out: