dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.65k stars 3.15k forks source link

Multiple non-related dbset counts without filter executes as separate queries #27728

Open joakimriedel opened 2 years ago

joakimriedel commented 2 years ago

Including multiple non-related dbset counts in one query will perform multiple SQL queries if the dbsets are not filtered by a dummy where clause. AsSingleQuery does not help. I would prefer not having to remember to add the .Where(x => true) part to prevent multiple queries.

See full gist here.

Include your code

                // info: Microsoft.EntityFrameworkCore.Database.Command[20101]
                //     Executed DbCommand (4ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
                //     SELECT COUNT(*)
                //     FROM [Users] AS [u]
                // info: Microsoft.EntityFrameworkCore.Database.Command[20101]
                //     Executed DbCommand (1ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
                //     SELECT COUNT(*)
                //     FROM [Animals] AS [a]
                // info: Microsoft.EntityFrameworkCore.Database.Command[20101]
                //     Executed DbCommand (8ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
                //     SELECT TOP(1) 1
                //     FROM [Users] AS [u]

                var bad = await context.Users.Select(u => new {
                    userCount = context.Users.Count(),
                    animalCount = context.Animals.Count()
                }).FirstAsync();

                // info: Microsoft.EntityFrameworkCore.Database.Command[20101]
                //       Executed DbCommand (13ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
                //       SELECT TOP(1) (
                //           SELECT COUNT(*)
                //           FROM [Users] AS [u0]) AS [userCount], (
                //           SELECT COUNT(*)
                //           FROM [Animals] AS [a]) AS [animalCount]
                //       FROM [Users] AS [u]

                var good = await context.Users.Select(u => new {
                    userCount = context.Users.Where(x => true).Count(),
                    animalCount = context.Animals.Where(x => true).Count()
                }).FirstAsync();

Include provider and version information

EF Core version: 6.0.3 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 6.0

smitpatel commented 2 years ago

context.Users.Count() can be evaluated in advance since it is not correlated with range variable in the query so rather than letting server compute count for each row, we compute it and send the computed value to server (as a parameter). Since Where is a queryable operator, regardless of the lambda inside it, it blocked the evaluation for us and query being combined together in single one.

joakimriedel commented 2 years ago

@smitpatel for me as a user this is a very confusing default, since it fires a lot of extra queries to the sql server. Just because it can be evaluated in advance, I do not agree it is something ef core should do. The extra queries add latency and hurts performance. When would it ever be a good thing to evaluate something like this in advance?

In my real world example I found this when I tried to combined a query with 20+ counts, where only a few of them are filtered. Expected a single query to the server and instead got this crazy burst of requests.

I would expect everything rooted on the same context to be inlined. If I wanted to do extra roundtrip to the sql server I would have assigned a variable separately and included in the query such as

var userCount = await context.Users.CountAsync();
var bad = await context.Users.Select(u => new {
                    userCount = userCount,
                    animalCount = context.Animals.Count()
                }).FirstAsync();

Would it be possible to reconsider this for 7.0?

smitpatel commented 2 years ago

Can you share actual query? Looking at above query, it doesn't really make much sense to run it that way. You are counting number of users and not selecting any of data from user but just picking first (that means making sure there is at least 1 user) and then just putting count in that one result. Running FirstAsync query against the server is not useful in this case.

joakimriedel commented 2 years ago

@smitpatel It doesn't matter what table I start off with, I just pick one that I know will contain at least one row. The real query looks like this

            var counts = await _context.Users.Select(u => new
            {
                offerCount = _context.Offers.Where(x => true).Count(),
                notificationCount = _context.Notifications.Where(x => true).Count(),
                messageCount = _context.Messages.Where(x => true).Count(),
                onlineCount = _context.HubConnections
                    .Where(HubConnection.IsOnline)
                    .Select(hc => hc.UserId)
                    .Distinct()
                    .Count(),
                contactCount = _context.Contacts.Where(x => true).Count(),
                // .. +15 other aggregations
            }).AsSingleQuery().FirstAsync();

Purpose is just to get some totals to show in admin pages for an application. The FirstAsync is just to not get more than one row of data, since each row will contain exactly the same information.

Even better would be to have some kind of possibility to do one-off queries without a root to project as a single anonymous or dto object, so that I could write something like

var counts = await _context.UnrootedQuery(q => new {
    rowCount = _context.SomeDbSet.Count()
});

but that is actually a bit out of scope for this issue, I just would like ef core to ENSURE that all aggregations will be inlined in a single sql server query. (without the ugly .Where(x => true) hack because if I forget it anywhere that aggregation will be run out-of-query and add a roundtrip, which is what I hoped the AsSingleQuery would prevent in this case)

ajcvickers commented 2 years ago

Query batching (Tracked by #10879) would allow this to be done using LINQ in a relatively clear form with a single roundtrip. Until then, or if a specific form of SQL is required, the query could be written using FromSql.

@joakimriedel Is the issue here that you want a single round trip, or is it important that this be a single query?

joakimriedel commented 2 years ago

@ajcvickers I haven't done any benchmarking, but I would assume the latency from roundtrips is worse than the additional query time from running 20+ simple scalar queries instead of one larger combined query where the columns do not relate to each other. Perhaps you have some better estimates on this for the work done on #10879 ?

Still, I think the current implementation where context.Users.Select(u => new { c = context.Animals.Count() }) generates a totally different query, with additional roundtrip, than context.Users.Select(u => new { c = context.Animals.Where(x => true).Count() }) is a bug, not a feature. I could have a perfect query plan, optimized and all, and then decide to skip the Where for some reason, and suddenly EF Core would add an extra roundtrip and totally change the generated query where the value would be sent as a parameter instead. It is illogical. Is this a complex and expensive code change for you?

roji commented 2 years ago

Batching two SELECTs in a single roundtrip is likely to yield very similar performance to having the two selects as subqueries in a single SELECT query, i.e.:

SELECT COUNT(*) FROM [Users];
SELECT COUNT(*) FROM [Animals] AS [a];

... is very likely roughly equivalent to:

SELECT TOP(1) (
    SELECT COUNT(*)
    FROM [Users] AS [u0]) AS [userCount], (
    SELECT COUNT(*)
    FROM [Animals] AS [a]) AS [animalCount]
FROM [Users] AS [u]

(this is quite trivial to check). Batching two selects also has the advantage of possibly reusing query plans when the same SELECT COUNT(*) ... elsewhere (IIRC this isn't the case with SQL Server semicolon-based batching, but would be the case with DbBatch or on PostgreSQL).

So I do believe #10879 is the right general solution to this. I agree it's odd for us to do roundtrips only when Where is omitted, but whether that's worth fixing is an implementation complexity question (so @smitpatel).

ziaulhasanhamim commented 2 years ago

This a bit out of scope here. But it looks confusing for me.

var good = await context.Users
    .Select(u => new 
    { 
        userCount = context.Users.Where(x => true).Count(), 
        animalCount = context.Animals.Where(x => true).Count() 
    })
    .FirstAsync();

Because we are using Select on Users DbSet but not using anything from that. It's tricky. So isn't it better to have a Query or Select method on context it self.

var good = await context.Query(ctx => new 
{ 
    userCount = ctx.Users.Where(x => true).Count(), 
    animalCount = ctx.Animals.Where(x => true).Count() 
 })
    .ExecuteAsync();

This is more cleaner and easier to understand.

roji commented 2 years ago

We've discussed something like context.Query() in other contexts (e.g. for top-level aggregate methods, see #28264).

@dotnet/efteam do we already have an issue tracking this?

ajcvickers commented 2 years ago

@divega Do you know if you ever created an issue for having a general-purpose context.Query method that could execute any query? I'm sure there must be/have been an issue, since it was something we discussed a lot, but I can't find one!

ajcvickers commented 1 year ago

See #29484.