dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.74k stars 3.18k forks source link

Relational: Support translating GroupBy() to SQL #2341

Closed divega closed 7 years ago

divega commented 9 years ago

Updated as of 3/14/2018 based on implementation introduced

LINQ's GroupBy() operators can sometimes be translated to SQL's GROUP BY clauses, in particular when aggregate functions are applied in the projection.

Scope for 2.1 release

Our current intention is for the scope of the work in 2.1 to improve how LINQ's GroupBy is evaluated in this particular scenario:

Grouping on a simple expression that references a single column or multiple columns (using an anonymous type) of the query root and then projecting any aggregate operation from (Count, Sum, Average, Min, Max) and/or any individual property that is part of the grouping key (this will be translated to GROUP BY in SQL)

What is supported in 2.1.0-preview2

Apart from what is supported in 2.1.0-preview1 (details below), we have also added some more patterns

Examples

// Grouping by constant or a variable from closure
db.Orders.GroupBy(o => 2).Select(g => g.Count());
var a = 5;
db.Orders.GroupBy(o => a).Select(g => g.Count());

// Grouping by scalar properties from reference navigations
db.Orders.GroupBy(o => o.Customer.City).Select(g => g.Count());

// Ordering by Key/Aggregate after GroupBy
db.Orders.GroupBy(o => o.CustomerID).OrderBy(o => o.Count()).Select(g => new { g.Key, Count = g.Count() });

// Filtering on Key/Aggregate after GroupBy (Translates to Having clause in SQL)
db.Orders.GroupBy(o => o.CustomerID).Where(o => o.Count() > 0).Select(g => new { g.Key, Count = g.Count() });

// Projecting aggregate into nominal type
db.Orders.GroupBy(o => o.CustomerID).Select(g => new CustomerCountInfo { g.Key, Count = g.Count() });

And a few bugfixes - #11218 #11157 #11176

What is supported in 2.1.0-preview1


// Grouping by single column projecting aggregate or key
db.Orders.GroupBy(o => o.CustomerId).Select(g => g.Count());
db.Orders.GroupBy(o => o.CustomerId).Select(g => new { CustomerId = g.Key, Count = g.Count() });

// Grouping by multiple columns (using anonymous type) projecting aggregate or key or nested key
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate }).Select(g => g.Sum(o => o.Cost));
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate })
  .Select(g => new { CustomerId = g.Key.CustomerId, Sum = g.Sum(o => o.Cost) });
db.Orders.GroupBy(o => new { o.CustomerId, o.OrderDate })
  .Select(g => new { Key = g.Key, Sum = g.Sum(o => o.Cost) }); // Where Key will be anonymous object

// Grouping after complex query root
(from o in db.Orders.Where(o => o.OrderID < 10400).OrderBy(o => o.OrderDate).Take(100)
    join c in db.Customers.Where(c => c.CustomerID != "DRACD" && c.CustomerID != "FOLKO").OrderBy(c => c.City).Skip(10).Take(50)
        on o.CustomerID equals c.CustomerID
    group o by c.CustomerID)
.Select(g => new { g.Key, Count = g.Average(o => o.OrderID) });

db.Orders.OrderBy(o => o.OrderID)
    .Skip(80)
    .Take(500)
    .GroupBy(o => o.CustomerID)
    .Select(g => g.Max(o => o.OrderID));

// All above examples have group of entity types after GroupBy

// Selecting Group of anonymous types containing multiple columns
db.Orders.GroupBy(o => o.CustomerId, new {o.OrderDate, o.Price}).Select(g => g.Sum(t => t.Price));

Scenarios that we are not planning to improve in the 2.1 release

~1. Grouping on constants~ (available in 2.1.0-preview2)

  1. Grouping on an entity (e.g. a reference navigation property)
  2. Projecting non-aggregate scalar subqueries after grouping, e.g. FirstOrDefault()
  3. Making groups of multiple entityTypes using anonymous types.
  4. Using Key/Aggregate values after GroupBy in joins (#10012)

All the scenarios above present different variations depending on what happens after the GroupBy, e.g. is there an aggregate function or not, is the key mentioned in the projection or not, etc. These scenarios will still result in client evaluation.

We would appreciate if customers that care about EF Core supporting any of those scenarios that are scoped out from 2.1 to create individual issues for them, up-vote them, and keep the discussion there.

alexvaluyskiy commented 9 years ago

I have these classes

public class Blog
{
    public int BlogId { get; set; }
    public string Url { get; set; }

    public List<Post> Posts { get; set; }
}

public class Post
{
    public int PostId { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }

    public int BlogId { get; set; }
    public Blog Blog { get; set; }
}

If I use this code

var query1 = from p in context.Posts
             where p.BlogId == 1
             group p by p.Title into g
             select new
             {
                 Title = g.Key,
                 Count = g.Count()
             };

EF7 generates this SQL

SELECT [p].[PostId], [p].[BlogId], [p].[Content], [p].[Title]
FROM [Post] AS [p]
WHERE [p].[BlogId] = 1

without grouping and projection and groups data on the client side

If I use Group by with Inner Join with this code

var query = from p in context.Posts
            join b in context.Blogs on p.BlogId equals b.BlogId
            where p.BlogId == 1
            group p by p.Title into g
            select new
            {
                Title = g.Key,
                Count = g.Count()
            };

I get this error

{"Unable to cast object of type 'Microsoft.Data.Entity.Query.QueryResultScope`1[<>f__AnonymousType1`2[EF7Test.Post,EF7Test.Blog]]' to type 'Microsoft.Data.Entity.Query.QueryResultScope`1[EF7Test.Post]'."}

EF 7.0.0-beta6-13735

rowanmiller commented 9 years ago

Consider these cases when translating SQL

leus commented 9 years ago

Just another test case:

public class Categoria
{
    public int Id { get; set; }
    public int? CategoriaId { get; set; }
    public string Nombre { get; set; }
}

var q = from c in dbContext.Categorias
                join cc in dbContext.Categorias on c.CategoriaId equals cc.Id
                group c by c.CategoriaId into g
                select new { Id = g.Key.Value, total = g.Count() };

Throws the following exception:

{"Unable to cast object of type 'Microsoft.Data.Entity.Query.QueryResultScope`1[<>f__AnonymousType3`2[nyp.DataModels.Categoria,nyp.DataModels.Categoria]]' to type 'Microsoft.Data.Entity.Query.QueryResultScope`1[nyp.DataModels.Categoria]'."}

This is for a recursive tree-like structure. Here I'm just trying to count children.

rowanmiller commented 8 years ago

@anpete should this go to backlog for now?

anpete commented 8 years ago

@rowanmiller Sounds good.

YehudahA commented 8 years ago

Why it's go to backlog?! It's critical for for information and reports systems. For example, I have 2 million records and I need to show sales by customer. Grouping in client side, is impossible!

ErikEJ commented 8 years ago

View/ Indexed view?

rowanmiller commented 8 years ago

Or a raw SQL query. It's not that we aren't going to support it, just that we don't have time to get this translating (and stabilize it) before the 7.0.0 release. This one will be one of the first things we tackle after that.

rjperes commented 8 years ago

What would be the recommended approach using SQL (FromSql)? The problem I see is that we need an entity to materialize the records into, and I think it would need to be mapped.

divega commented 8 years ago

@rjperes that would require a feature we are planning to add in 1.1.0: the ability to boostrap queries using arbitrary types: #1862.

chris-oswald commented 7 years ago

@rowanmiller I was hoping this was a 1.1 milestone, but I don't see this in 1.2 either. Any idea when this will get pulled off the backlog? I know we have work-arounds in the mean time.

rowanmiller commented 7 years ago

@chris-oswald not yet, we are planning for the next release this week.

simeyla commented 7 years ago

It's good to at least see it is listed as item #2 in the Critical O/RM features section.

Even if all situations cannot be accounted for - I hope we are going to at least get the simplest cases covered.

And to be fair @chris-oswald there is literally nothing in the current 1.2 feature list - you scared me for a moment!

tuespetre commented 7 years ago

Heads up: I have made a lot of headway in this area. Expect a PR soon 😸

tuespetre commented 7 years ago

Checking in:

I've been using almost all of my free time to work on this and I'm just fixing some remaining failing tests. I've learned a lot about the codebase during this time and as I've worked on things, I've both inadvertently and purposefully introduced a number of other optimizations like LEFT JOIN LATERAL/OUTER APPLY, more compact SQL queries as a result of improving the 'query source requires materialization heuristics', and more. I just don't want anyone to think maybe I fell off of working on this πŸ˜‰

chris-oswald commented 7 years ago

Derek @tuespetre I work for a data warehouse company and we do tons for query performance tuning. If you ever want any help in query optimization from a SQL Server perspective please reach out to my Git account. Happy to conference call.

simeyla commented 7 years ago

@Derek Excited :-) Since you mention LEFT OUTER - can you check on this issue at some point: http://stackoverflow.com/questions/40544642/efcore-returning-too-many-columns-for-a-simple-left-outer-join I never got around to formally reporting it.

In summary of the issue : LEFT OUTER with a simple Select projection would end up returning all columns from the DB instead of just the ones in the projection.

Thanks!

On Mon, Jan 23, 2017 at 9:13 AM, Derek Gray notifications@github.com wrote:

Checking in:

I've been using almost all of my free time to work on this and I'm just fixing some remaining failing tests. I've learned a lot about the codebase during this time and as I've work on things, I've both inadvertently and purposefully introduced a number of other optimizations like LEFT JOIN LATERAL/OUTER APPLY, more compact SQL queries as a result of improving the 'query source requires materialization heuristics', and more. I just don't want anyone to think maybe I fell off of working on this πŸ˜‰

β€” You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/aspnet/EntityFramework/issues/2341#issuecomment-274551867, or mute the thread https://github.com/notifications/unsubscribe-auth/ADfKVnucT5t8ePDk0Ph5aJLDca-6tehOks5rVN_NgaJpZM4E6HAK .

tuespetre commented 7 years ago

@simeyla the changes I've been making will address many such scenarios.

tuespetre commented 7 years ago

Okay... I have all bajillion (as a compliment) tests passing, now I'm going to rebase onto the last month or so of commits that have been merged to dev since I started before submitting a PR.

YehudahA commented 7 years ago

@tuespetre, A lot of success! Many guys are waiting for it.

tuespetre commented 7 years ago

7543

langdonx commented 7 years ago

@tuespetre Excellent, very exciting! Can you elaborate on Complex GroupBy keys still needing to be covered?

Does this mean the first will happen on the database, but the second will still happen in code after selecting the necessary records?

from p in context.Employee
group p by p.Title into g
select new
{
    g.Key,
    count = g.Count()
};
from p in context.Employee
join t in context.Title on p.TitleId equals t.Id
group new { p, t } into new { p.Age, t.Title } into g
select new
{
    g.Key.Age,
    g.Key.Title,
    count = g.Count()
};
tuespetre commented 7 years ago

@langdonx something like that, I've already got the right GROUP BY statement generating for that case, but I think the projection currently won't work. I'm pretty sure of what needs to be done, I just need to go back in and write the tests and so forth now (simple keys, single-level complex keys, multi-level complex keys, keys with functions, keys with binary expressions, keys with coalesce, etc., etc.)

danielgreen commented 7 years ago

Hi @tuespetre Eagerly awaiting the #7543 pull request being merged! Kudos to you for all your work.

Please can I check if this scenario will be included in the SQL translation improvements, so I don't create a duplicate issue:

var grandTotalsData = await dbContext.Expenses.GroupBy(e => 0).Select(g => new
{
    Count = g.Count(),
    GrossAmountTotal = g.Sum(e => e.GrossAmount),
    GrossAmountAverage = g.Average(e => e.GrossAmount),
}).SingleAsync();

The GroupBy(e => 0) expression uses a constant to indicate that I want the entire table in one group. I'm hoping this will generate a single query with multiple aggregates:

select count(*) as Count, sum(GrossAmount) as GrossAmountTotal, avg(GrossAmount) as GrossAmountAverage
from Expenses

At the moment it generates warnings that GroupBy, Count, Sum, Average and Single cannot be translated. I've also seen the same warning about DefaultIfEmpty.

When used in isolation, Count and Sum are translated to SQL, but not in the context of a GroupBy like this. Average, however, doesn't appear to translate at all.

Thanks

tuespetre commented 7 years ago

@danielgreen you can see some of the tests and sample generated SQL here:

https://github.com/tuespetre/EntityFramework/blob/52561bdb44a26e70b1b939b33af0157275b3ebdc/test/Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests/QuerySqlServerTest.cs#L6890

I'm currently working to break down #7543 as much as possible per @smitpatel / @maumar's request.

tuespetre commented 7 years ago

@langdonx we'll see how this one goes: #7697

datasecurity commented 7 years ago

        query = query.Where(e => e.Goodscode.Contains(filter) || e.GoodsName.Contains(filter));    
        return query.Select(e => new GoodsDTO    
            {
                Name = e.GoodsName,
            Code = e.Goodscode,
            T3Code = e.T3Code,
            StockId = e.StockId
        }).Take(100).ToList().Distinct(new GoodsDTOComparer()).Take(20); 
        //why we do like up: because EF7 dosent support Distinct and GroupBy yet (12-03-2017)
        //why i use DotNetCore for a reporting system? :( 
YZahringer commented 7 years ago

The support of GroupBy translation to SQL is not planned for the release 2.0.0?

ethanli83 commented 7 years ago

Hi guys, I am working on a project which translates almost all EF lambda expressions into SQL to avoid in-memory execution. Maybe you guys can have a look and see if it is helpful.

Here is the link to the project: https://github.com/ethanli83/EFSqlTranslator

You can also try it out here: http://linqrunner.daydreamer.io/

andymac4182 commented 7 years ago

It would be great to understand when this will be coming as it is one of our largest issues with moving to EF Core for a large part of our stack.

xrkolovos commented 7 years ago

Huge issue, that i expected to see in build...

andymac4182 commented 7 years ago

@smitpatel Any chance of seeing this in EFCore 2.0?

smitpatel commented 7 years ago

@andymac4182 - This is a decent amount of work to implement in terms of how the current query infrastructure is. Sadly, as Roadmap suggests, it will not be in 2.0. There is just not enough time to add this feature and stabilize it.

andymac4182 commented 7 years ago

Does that mean we will possibly see this in 2.x? or will it be a 3.0 release sort of time frame? Considering how much we use EF this is a large blocker to us moving to Core.

andymac4182 commented 7 years ago

Also that Roadmap really confuses me since you are releasing features not part of the "Critical" list before completing what you yourselves have deemed as critical.

SilverioMiranda commented 7 years ago

Two years and almost two versions later this basic function has not yet been implemented nor is it in the priorities. The EntityFramework Core should still be in beta and not going to version 2.0

rweads0520 commented 7 years ago

Very hard to believe this isn't a higher priority. Hoping someone finds the time to grab this one and get it done. Looking through this thread, looks like a good attempt was made, but the PR ended up being too broad and wasn't pulled in. Even just some level of support for simple grouped queries and common aggregate functions would be better than nothing. There shouldn't be any big inventions here, these translations have already been done by several open source ORMs.

tuespetre commented 7 years ago

@rweads0520 it is kind of hard to fit it into the existing codebase. I think I hate Relinq, to be honest. I gave up and started a side project to prove to myself that I could translate all the LINQ operators including hardcore stuff like GroupBy, Zip, and so forth using only expression visitors and expressions, and I'm certain took me less time than it did to do the work I did for those PRs.

smithkl42 commented 7 years ago

@tuespetre - Which is kind of a shame, because as I understand it, the whole point of EFCore was originally to make it easier to do stuff that the EF6 codebase made difficult. And it sounds like it's actually worse in some scenarios than EF6.

Given that critical stuff like this still isn't supported, is there any reason to move to EFCore from EF6? I totally get that not every EF6 scenario will be supported by EFCore, but this is, well, kind of core, you know? (Not blaming you, obviously - just wondering what the rest of the EFCore team has to say about how this feature has gotten dropped.)

mcnallys commented 7 years ago

@tuespetre Don't give up on your PR I feel you were so close!

Ben-Pattinson commented 7 years ago

Mind-boggling that they got out of beta without a working Group By functionality. Every time I tangle with EF I end up lowering my expectations. I keep thinking I know how limited it is and then it gets worse. Personally I think this kind of limitation ends up putting the whole Core project into Beta status. It's interesting, but if you don't have a solid data access technology you can use, it's ultimately pointless.

Just spent a week converting a project to .net Core as an experiment. Looks like that won't be going any further for a while.

@tuespetre is the side project you mentioned publicly available?

JohnYoungers commented 7 years ago

We use EF6 (and OData) extensively in our projects and it's worked out great.. Some of these responses are a bit much

Similar to @Ben-Pattinson, with the release of everything-2.0 I've been looking into getting our libraries converted over to .Net Standard 2.0 which includes converting over to this. Not having GroupBy is obviously going to be a blocker, but if it does end up in 2.1 as specified then the timing might work out great.

eriksendc commented 7 years ago

@rowanmiller who is accountable for following up on whether this gets into 2.0 or some release at some point? Just hoping that a more definitive answer can come sooner than later. I get that this had to be tabled for the very first release, but it is two years later now... But I do understand that there are always lots of constraints. I'm just asking for some kind of realistic plan to be made and then disseminated to us all. Thanks!

ErikEJ commented 7 years ago

@eriksendc If you take a look at the roadmap you will see that this is planned for 2.1

eriksendc commented 7 years ago

@ErikEJ Oh thanks and so sorry for missing that. Super great! :) πŸ‘

jtheisen commented 7 years ago

Out of curiosity (as I'm playing with query building in TypeScript) - could @divega explain what he means by

LINQ's GroupBy() operators can sometimes be translated to SQL's GROUP BY clauses...

What kind of GroupBy() can't be translated to an SQL GROUP BY?

tuespetre commented 7 years ago

@jtheisen really weird ones; the hardest thing is usually not the grouping key, but translating things that are done with the grouping afterwards. Things like doing a SelectMany from the groupings, taking the first item from each grouping, querying navigation properties off of the grouping's contents, using navigation properties within aggregations, and so forth.

jtheisen commented 7 years ago

@tuespetre Hmm, I would think that in those cases there's always some naive approach like this:

SELECT
    (SELECT SUM(s.value1) FROM source WHERE s.key = k.key) sum, -- aggregation
FROM (SELECT DISTINCT s.key key FROM source s) k
OUTER APPLY (SELECT TOP 1 * FROM source s WHERE s.key = k.key ORDER BY ...) fr -- first row

Maybe you have to do more than one of those and join the results together.

Of course it's a bad query and I see that making good queries may be a lot harder though.

But if you have something more concrete that is really difficult to express in SQL at all, even with sub-optimal queries, I'd really love to see it.

tuespetre commented 7 years ago

@jtheisen a good example would be trying to pull a value out of a grouping that comes from a joined subquery. I mean, you won't run into it very often if at all, but it has to be considered.

But then we were talking about GroupBy to GROUP BY, not OUTER APPLY and ahem FOR JSON :trollface:

smitpatel commented 7 years ago

@jtheisen - The simplest example of non-translatable group by would be this var ordersForCustomer = db.Orders.GroupBy(o => o.CustomerId) Here you are grouping orders by customerId and getting IGrouping<string,Order>. There is no way to represent a group of orders in SQL in single row.

The SQL GROUP BY syntax has restriction in what can appear in projection. It is restricted to, either column has to be specified in GROUP BY or it has aggregate operation applied on any column. In its simplest (and straight-forward) form, all GroupBy queries which has aggregate operator applied on grouping can be translated to GROUP BY in SQL.