dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.5k stars 3.13k forks source link

`GroupBy` over complex key generates redundant subquery #30113

Open alexb5dh opened 1 year ago

alexb5dh commented 1 year ago

Summary

In case anything different from property or navigation access is performed in GroupBy method, the generated query will add an additional subquery. Probably to avoid repeating SQL expression used in group by.

Example

Having the following context:

public class Entity
{
    public int Id { get; private set; }
}

public class AppContext: DbContext
{
    public DbSet<Entity> Entities { get; private set; }

    //...
}

and perfoming query:

await context.Entities
    .GroupBy(e => e.Id.ToString())
    .Select(g => g.Key)
    .ToListAsync()

version 7.0.0-preview.5.22302.2 (and later, up to 7.0.2 as of the time of writing) generates this SQL:

SELECT [t].[Key]
FROM (
    SELECT CONVERT(varchar(11), [e].[Id]) AS [Key]
FROM [Entities] AS [e]
    ) AS [t]
GROUP BY [t].[Key]

while version 7.0.0-preview.4.22229.2 (and before) does querying via:

SELECT CONVERT(varchar(11), [e].[Id])
FROM [Entities] AS [e]
GROUP BY CONVERT(varchar(11), [e].[Id])

Changing the GroupBy expression to e => e.Id (or navigation property access ) gets rid of subquery in all cases, yet using something as simple as e => 1 still generates one.

MRE is attached: GroupBySubquery.zip. Just replace the connection string with the one you use.

Сonclusion

The first option seems to generally have a worse execution plan, especially as the main query grows more complex.

I'm not sure if this was an intentional change or an introduced bug, but in any case, user needs to have the option to revert to an old behaviour, as this can affect query performance quite noticeably.

In similar looking issues I found a suggested workaround of adding

AppContext.SetSwitch("Microsoft.EntityFrameworkCore.Issue27102", true);

but this doesn't seem to affect this case.

Environment

EF Core version: 7.0.2, 7.0.0-preview.5.22302.2 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 7.0 Operating system: Windows

ajcvickers commented 1 year ago

/cc @maumar @roji

georg-jung commented 1 year ago

I guess this is related to https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-7.0/whatsnew#groupby-entity-type. This is a major performance decrease in our app too. What took 1 or 2 seconds on EF Core 6 now times out after >1min in production. If there is no fix for this it is probably required for us to downgrade our production app back to ef core 6.

We do something similar to (this is pseudo code, I simplified a more complicated query):

var qAgg = from x in db.B
           group x by new { x.Process.Element.Customer.Kind }
           into g
           select new
           {
               Kind = g.Key.Kind,
               Region1Rule1 = g.Count(x => x.Region1 && x.Process.Rule == Rule.Rule1),
               Region2Rule1 = g.Count(x => x.Region2 && x.Process.Rule == Rule.Rule1),
               Region3Rule1 = g.Count(x => x.Region3 && x.Process.Rule == Rule.Rule1),
               Region4Rule1 = g.Count(x => x.Region4 && x.Process.Rule == Rule.Rule1),
               Region1Rule2 = g.Count(x => x.Region1 && x.Process.Rule == Rule.Rule2),
               Region2Rule2 = g.Count(x => x.Region2 && x.Process.Rule == Rule.Rule2),
               Region3Rule2 = g.Count(x => x.Region3 && x.Process.Rule == Rule.Rule2),
               Region4Rule2 = g.Count(x => x.Region4 && x.Process.Rule == Rule.Rule2),               
           };

On EF Core 6 this used to be translated to something like (we're using postgres)

SELECT 
    e.kind AS "Kind",
    COUNT(CASE WHEN b.region1 AND (z1.rule = 1) THEN 1 END)::INT AS "Region1Rule1", 
    COUNT(CASE WHEN b.region2 AND (z1.rule = 1) THEN 1 END)::INT AS "Region2Rule1", 
    COUNT(CASE WHEN b.region3 AND (z1.rule = 1) THEN 1 END)::INT AS "Region3Rule1",
    COUNT(CASE WHEN b.region4 AND (z1.rule = 1) THEN 1 END)::INT AS "Region4Rule1",
    COUNT(CASE WHEN b.region1 AND (z1.rule = 2) THEN 1 END)::INT AS "Region1Rule2", 
    COUNT(CASE WHEN b.region2 AND (z1.rule = 2) THEN 1 END)::INT AS "Region2Rule2", 
    COUNT(CASE WHEN b.region3 AND (z1.rule = 2) THEN 1 END)::INT AS "Region3Rule2",
    COUNT(CASE WHEN b.region4 AND (z1.rule = 2) THEN 1 END)::INT AS "Region4Rule2"
FROM b
INNER JOIN z /* ...*/
INNER JOIN e /* ...*/
INNER JOIN z1 /* ...*/
GROUP BY e.kind

On EF Core 7 every single COUNT is translated to a subquery, resulting in a huge and repetitive query that takes forever to complete.

If there is any more info required I'm happy to work on this further. Providing my production example is not feasible because the queries are huge and customer-specific. I hope that my simplified examples together with @alexb5dh's explanation and example code provide enough info to further investigate.

alexb5dh commented 1 year ago

After some additional search through EF issues, this one seems to be tightly related: https://github.com/dotnet/efcore/issues/29593. Workaround from there also seems to work for now - the main trick is to add Distinct to "cheat" new EF behaviour.