Closed divega closed 7 years ago
@tuespetre Can you write the linq for that one down? Not trying to be annoying, just genuinely interested to learn about a problem if it indeed exists.
@smitpatel That's a different problem altogether that has nothing to do with grouping. Just take this query using a navigational property:
from o in Orders select { Order = o, Invoices = o.Invoices }
Here EF has can't select the orders in one row each either. All ORMs flatten the query and re-create the graph structure after the load.
@jtheisen still, that is exactly what I meant: LINQ's GroupBy
and SQL's GROUP BY
are not the same thing, and only a subset of the queries in which you use GroupBy
in LINQ can be translated to a query using GROUP BY
in the SQL.
The examples provided by @smitpatel and @tuespetre don't fall into that bucket.
@divega Well, yes, they are not the same thing. You can't always translate a GroupBy
with just a GROUP BY
.
Under that light I read your OP as asking whether EF Core should support GroupBy
s only in those cases where just the simple GROUP BY
in the translation was enough.
I didn't read it that way because that would mean to aim really low compared to EF6.
@jtheisen fair point.
@smitpatel could you help us (as part of the work you are doing on this) come up with a list of the different scenarios in which GroupBy can be translated to something that has advantages over client evaluation, so that we can make an explicit decision on priorities?
I think it would be fine to track them all here, although creating separate issues would make things easier if we end up assigning very different priorities to them.
Hi @divega
Here is one scenarios:
var ordersForCustomer = db.Orders
.GroupBy(o => o.CustomerId)
.Select(g => new {
CustomerId = g.Key,
Count = g.Sum(o => o.TotalCost)
);
This query gives you total cost of orders made by each customer. If you do this in memory using Linq, it will need to load all the data into memory and then aggregate. Translating it in SQL and execute it will be much faster as there is much less data to load, and db indices will be leveraged too.
@ethanli83 thanks. That one is actually well understood. We are talking about the ones that require APPLY, e.g. because something that isn’t an aggregate function is projected, and potentially others.
@divega I think one really important group to fully support would be those created by data sources for data grids. DevExpress and Telerik have data grids you can bind to IQueryable
s that also create GroupBy
s and/or distinct
for their requests in three cases:
Those aren't requiring APPLY
though, so that should be an easy and very important subset.
hi @divega, so maybe something like this?
// Group on an enity rather than columns, and then access column on the group key.
db.Posts
.GroupBy(p => new { Blog = p.Blog })
.Select(g => new { UserId = g.Key.Blog.User.UserId })
or this?
// Group on aggregation
db.Blogs
.Where(b => b.Url != null)
.GroupBy(b => new { Cnt = b.Posts.Count(), Avg = b.Posts.Average(p => p.LikeCount) })
.Select(x => new { Cnt = x.Key.Cnt, Avg = x.Key.Avg, CommentCount = x.Sum(b => b.CommentCount) })
After that I believe getting the first item of something comes up frequently:
from o in orders select new { Order = o, LatestInvoice = o.Invoices.OrderBy(...).FirstOrDefault() }
That requires APPLY
.
@jtheisen stepping back a little, the translation you previously proposed to one of @tuespetre’s examples uses DISTINCT + OUTER APPLY, and given that DISTINCT can be considered just a syntax shortcut for a GROUP BY, it does belong to the GROUP BY + APPLY “class” i was referring to.
Re GroupBy() queries generated by grids, can you provide examples and explain how they can benefit from processing on the server other than generating ORDER BY in the translation?
@divega My OUTER APPLY
query was as if it came from something like
from o in orders select new {
Order = o,
NumberOfInvoices = o.Invoices.Count(),
LatestInvoice = o.Invoices.OrderByDescending(...).FirstOrDefault()
}
If it's invoices and orders it probably doesn't matter much because orders don't have many invoices.
But it would matter if the associated collection can contain, on average, many items for the reason @ethanli83 stated in his second last comment.
APPLY
s may not be the most pressing issue of all though - although I would say that if EF Core doesn't aim to do that eventually, that would be a bit sad. (Btw, I put a question on SO about what the eventual aim is of the EF Core team in terms of query translation sophistication - it would be awesome if the right person could say something to that.)
Re the kind of queries that grids generate: from reading one of the issues linked here, it seems the produce separate queries with aggregates for summaries. This should fall into simple GROUP BY + aggregate class.
@divega The second question regarding how server processing benefits was again addressed by @ethanli83 comment I referred to in my last comment - and in that case it's really an important issue.
I have a number of grids in LOB applications that if they'd do grouping client-side, they'd probably choke.
@divega Absolutely. It's the simpler case.
Thanks @jtheisen for mentioning data grids. In DevExtreme we use this pattern for building dynamic queries:
data
.Where(...) // optionally filter
.GroupBy(...) // group by a number of keys
.OrderBy(...).ThenBy(...) // order by keys
.Select(/* keys and aggregates */)
One special case we use extensively is grouping by the empty key (new { }
) to fetch multiple aggregates for the whole dataset in one go. Some time ago @jtheisen reported the issue https://github.com/aspnet/EntityFramework6/issues/34 which occurs when the empty key is used. Please consider it in EF Core.
@divega
Re GroupBy() queries generated by grids, can you provide examples and explain how they can benefit from processing on the server other than generating ORDER BY in the translation?
Those are cases when the grid is used as a simple analytic tool. An end user applies filtering and grouping and then explores groups+aggregates without preloading individual data items.
It would be great to support the case with grouping by a constant. It's similar to the case that was mentioned by @AlekseyMartynov. We also use such a query to fetch multiple aggregates. The query is dynamically generated, so it would be hard to change it to something else.
var query = context.Orders.Where(o => (o.OrderId > 0))
.GroupBy(Param_0 => 0)
.Select(Param_1 => new {
P0 = Param_1.Key,
P1 = Param_1.Count(),
P2 = Param_1.Sum(elem => elem.Freight),
P3 = Param_1.Max(elem => elem.ShipName),
P4 = Param_1.Min(elem => elem.ShipAddress)
});
Also, I would like to note that there were problems with such a query via EntityFramework 6 with third party providers like MySQL and Oracle.
@DunkanDX & @AlekseyMartynov,
Does it makes more sense to support translating things like Math.Sum rather than using GroupBy this way? Something like:
var query = context.Orders.Where(o => (o.OrderId > 0))
.Select(o => new {
Sum = Math.Sum(e => e.Freight),
Max = Math.Max(e => e.ShipName),
Min = Math.Min(e => e.ShipAddress)
});
Which can simply be translated into:
SELECT sum(o.Freight) Sum, max(o.ShipName), min(o.ShipAddress)
FROM Orders o
WHERE o.OrderId > 10
@ethanli83 The same query with grouping by constant works well against many kinds of Linq-sources including EntityFramework 4, 5 and 6. As far as I understand your query will work only against EFCore. It won't work even for Linq To Object. Thus, in my opinion, it better to support common scenario.
@DunkanDX, you are right. Also, I think we do not want thing that does not work as normal linq. thx
In past few days, there is discussion going on around what queries fall into the bucket of GROUP BY
. Since Linq allows a large variety of combination of operators, the amount of various queries with GroupBy
operator is unbounded. As a long term goal of EF Core, we would want to translate queries to server as much as possible unless there is bad perf impact. But that is eventual goal. Trying to translate an unbounded set of queries to server in single release is kinda unrealistic goal.
If we just discuss about GroupBy case, linq GroupBy
& SQL GROUP BY
are not same thing. While GroupBy
operator would allow a lot of different operations regarding, grouping key and element selector, GROUP BY
clause in SQL is much more restricted in terms of what you can exactly represent. As various post in above discussion said, at times GroupBy would translate to OUTER APPLY
too. Same way, we could find various translation of GroupBy
in some different shape of SQL. But covering all such translations in 2.1 release timeframe is hard to fit in goal and it may not complete.
I am removing this from 2.1.0 milestone till we figure out what are exact scenario(s) we are considering we are considering for 2.1 release. Covering all variety of GroupBy
is unreachable goal.
Is it possible to support, at least, some the translations made by EF6? I know this is kind of vague but, IMO, it is better to have some basic support to Group By
now than have a complete and full featured Group By
later. Other improvements could be introduced later.
Having this example:
var query =
from table in _context.Table
group table by table.IdParent into tables
select new
{
Id = tables.Key,
Count = tables.Count()
};
var result = query.ToList();
Right now, this basic Group By
is translated in something like select * from table.
I am not following the development of entity framework, and i really have no idea what is the complexity of doing this, but i think this kind of translations should be supported as soon as possible.
Hope you consider, at least, having a basic support for the 2.1 release.
Thanks.
@smitpatel I fear that from a management perspective removing this from the 2.1.0 milestone for the reasons you give just kicks the can down the road. Someone needs to dig in and refine the scenarios now so that it does get done in 2.1.0.
@brunoalreis "Is it possible to support, at least, the translations made by EF6?"
EF6 translates nearly everything group-by-wise.
The way I understand it is that this is that EF Core likely never will be as good as EF6, let alone in 2.1.0.
@smitpatel to provide a brief summary of what we will address in 2.1. We re-assess post 2.1 what further cases to consider and how to handle them.
Maybe by the TENTH anniversary of Linq2Sql?
https://msdn.microsoft.com/en-us/library/cc161164.aspx
I wanted to thank some of the recent feedback on this issue and also take the chance to clarify what is really going on, given that thread has turned into some speculation on our triage process, like what it means to remove the milestone of an issue, or whether EF Core's GroupBy
support is ever going to better than other O/RMs, etc.
First of all, I would like everyone to understand that we consider that improving the support of GroupBy
means a few different things, i.e. there are different scenarios for which the new translation would need to work quite differently, and we are effectively assigning different priorities to those individual scenarios. Some of them will be supported in 2.1, some of them may be supported later.
I have updated the original text of this issue to reflect exactly we are planning to support in 2.1, and to enumerate the scenarios that we have decided to scope out of 2.1, and that we are possibly going to address later, depending on feedback.
I would like to stress that customer input is one of the most important things we consider when deciding what to do in each release, but prioritizing and scoping is a big part of what all software projects need to do every day in order to make progress.
We try to prioritize things that we believe can help the largest number of customers the most, but any time we decide to implement a specific feature in a given release, we are usually postponing other features that could also help.
We just happen not to think that improving the GroupBy
support in all of the possible scenarios (i.e. have a superset of the capabilities of other products, including EF6) is a more important goal that every other feature or bug in our backlog.
Rather than repeating how the process of selecting what features go into a release here again, I would like to invite everyone to read a page we put together recently, precisely on this subject:
https://github.com/aspnet/EntityFrameworkCore/wiki/release-planning-process
@divega Thanks so much for clarifying. When @smitpatel said "I am removing this from 2.1.0 milestone till we figure out what are exact scenario(s) we are considering" that was very worrisome. It seemed like any / all GroupBy functionality was going to be off the table for 2.1. I've been a participant in many teams using different release planning processes and it just sounded dubious (to me, at least). Thanks to you and all who participated in determining what we'll be looking forward to.
@eriksendc Understood. For future reference, in our regular triage meeting the main query we look at is "issues with no milestone".
Interesting feedback from @FransBouma: DevExpress' grid creates groupby predicates on constant.
Indeed, I searched our forums for the issue, and it's from 2008, but I doubt they've changed it. Their LinqServerModeDataSource control can be bound to an IQueryable and be used in grids (winforms, webforms etc.) and they generate grouping on constant and add the g.Key to the projection. The control bound to the grid allows the user to group, sort and filter on the datasource and the control then builds a linq query with these elements. If no grouping is done, it groups on a constant, and this will then fail :)
As this is something you'd never expect to happen, and I do recall the issue quite clearly (as it was a problem to fix it properly back then), I thought I'd mention it.
@bricelam @FransBouma - I have filed #9969 to track it.
@divega Can you add to the scope of work throwing an exception if grouping on a constant, where the exception links back to #9969 ?
@jzabroski - At present grouping on constant does not throw. It would evaluate it on client. Change it to throw an exception is breaking change.
Hi. Have we had any supports about this?
This seems still not suppprted... I'm currently working on a project and facing the performance issue as it doesn't translate to select count(*), xxxx group by xxxx , it actually fetches all the data from DB and does group by locally which causes the performance issue, coz I have got a million records...
var counts = (from t in _dbContext.VoteTickets join m in _dbContext.VoteItemData on t.VoteItemId equals m.Id where m.City == city group t by m.District into grp select new { District = grp.Key, Count = grp.Count() }) .ToArray();
I think you misunderstood me. I was advocating that you define the behavior via unit test, and explained that many tools expect this behavior, and by not supporting it, upgrading EF6 to EFCore will bring unexpected surprises.
On Thu, Oct 12, 2017, 6:07 PM Smit Patel notifications@github.com wrote:
@jzabroski https://github.com/jzabroski - At present grouping on constant does not throw. It would evaluate it on client. Change it to throw an exception is breaking change.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/aspnet/EntityFrameworkCore/issues/2341#issuecomment-336288838, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbT_baFfDgkb25tNZXnm1L3dQL5A73Gks5sro2ygaJpZM4E6HAK .
upgrading EF6 to EFCore will bring unexpected surprises. EF6 to EF Core is not in-place upgrade. EF Core is rewrite of EF6 and in many cases have behavioral differences. Any application expecting the group by constant to work same as EF6 is gonna fail regardless of it client eval or throw. By doing client eval, there is higher probability of external tools may just work.
@mchenx I'm affraid you won't get any joy with EF Core anytime soon. I recommend doing the simple stuff in EF Core and anything complicated in dapper or similar. It really is WAY easier than even full EF. Yes there's hard coded strings, but with string interpolation, nameof and a data layer, you can really make it robust. Give it a shot, you'll be surprised.
@Ben-Pattinson, maybe you will be interested in this project I worked on EFSqlTranslator. It translates EFCore lambda into string and executes it on Dapper.
@ethanli83, very nice! I must say, initially I was rather sceptical, having been badly burnt by EF's horrific SQL generation. Assuming you haven't cherry picked your examples of SQL generation, they are nice and readable, and moreover, sane. Very un-EF like. You'll get into trouble for stuff like that, the Union will be along in a moment to give you a good talking to :) Comprehensive examples too, keep up the good work!
Thank you @Ben-Pattinson, I will take a look at Union!~
@Ben-Pattinson SQL generation is something we have tried really hard to improve in EF Core. Do you have some examples of where you feel we can improve?
@anpete (sorry for the rambling reply) I probably should have been more specific, I was referring to full framework's EF sql. I have spent literally days trying to get full EF to output sql that did what I meant. Trying to decipher the generated sql was often a nightmare. Getting the same operation was minutes in dapper. This is why I settled on a hybrid approach. Use EF for CRUD and simple selects, dapper for anything complex. When I started porting to core, I was pleased to see the SQL is indeed much more readable, which is a great improvement.... however imho the lack of group-by support makes it a bit of a joke. The silent nature in which the group-by executes client-side is particularly nasty for the unwary. It reinforces my position of allways checking the SQL EF outputs - which the new generation does really help with. I can work round this new limitation by simply extending my dapper use, but I'm sure you can see the problems anyone not on a hybrid approach will have.
I appreciate the problems you have when you need to support the same linq syntax for many different types of query, but when personal projects are running rings around your official data layer implementation, something isn't right.
Personally, all I ever wanted was to write sql in my data layer and have it compile time validated against my data layer. It's so much easier and flexible than translating between unrelated syntaxes. I've got that now with dapper, nameof, string interpolation and t4 templates. It would have been nice to get slicker intellisense support, but it's pretty usable.
@anpete I am not @ben-pattinson but if you could describe what you do differently in generating SQL, I am a rare combination of SQL Server expert (going back to 6.5 days of Ron Soukup) and C# engineer who has built web apps, desktop apps, as well as a proprietary ORM-like data layer.
Years ago, I explained to Diego that EF6 and lower was fundamentally flawed in how it generated queries, and we (the company I was working for) even logged a bug and workaround for the most fundamental flaw in the query generator: the dreaded duplicate left join bug where the "forest" of possible left join possibilities does not reduce to a single, unique tree.
Cheers, John
On Mon, Feb 26, 2018, 7:55 PM Andrew Peters notifications@github.com wrote:
@Ben-Pattinson https://github.com/ben-pattinson SQL generation is something we have tried really hard to improve in EF Core. Do you have some examples of where you feel we can improve?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/aspnet/EntityFrameworkCore/issues/2341#issuecomment-368707133, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbT_dlkAEnarCjYtZo70QO0s2HNroufks5tY1KFgaJpZM4E6HAK .
Smit Patel , your hardline stance indicates a severe code smell. You are stating a policy, whereas I'm talking about the basic mechanism. If you're hardwiring sequences of collaborations between your Domain Object Hydration Engine/Object Materializer and your Expression Transformation (Query Transpiler) Engine, then you are likely going to regret it and never even realize somebody told you you were coding it wrong on February 27th, 2018.
Decouple, decouple, decouple. Let your customers choose.
On Tue, Feb 20, 2018, 6:45 PM Smit Patel notifications@github.com wrote:
upgrading EF6 to EFCore will bring unexpected surprises.
EF6 to EF Core is not in-place upgrade. EF Core is rewrite of EF6 and in many cases have behavioral differences. Any application expecting the group by constant to work same as EF6 is gonna fail regardless of it client eval or throw. By doing client eval, there is higher probability of external tools may just work.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/aspnet/EntityFrameworkCore/issues/2341#issuecomment-367161488, or mute the thread https://github.com/notifications/unsubscribe-auth/AAbT_bhKWA6HTDS4xfFb0VLWGRTm-NUkks5tW1kLgaJpZM4E6HAK .
@Ben-Pattinson Thanks for the feedback and glad to hear that you have come up with a solution that works for you.
A couple of responses to the points you make: LINQ GroupBy is very different from SQL GROUP BY and so we need client eval here for many cases anyway. It was a shame we weren't able to get to GroupBy with aggregates translation until 2.1, but it is almost here. One reason for this prioritization was that we have the FromSql
API, which gives you a Dapper-like experience for creating queries with raw SQL. Have you tried it? If so, it would be great if you could let us know why it is not working for you - It could be because we didn't have Query Types, which are also landing in 2.1, and make the FromSql experience much better because you no longer need an entity result type.
@anpete It's really great to finally see this working :-)
But oddly enough I still receive the error message telling me it cannot translate to GROUP BY, even though the generated SQL right underneath most certainly uses GROUP BY! I assume this must be a known issue?
Entity Framework Core 2.1.0-preview1-28290 initialized ......
warn: Microsoft.EntityFrameworkCore.Query[20500] The LINQ expression 'GroupBy(new <>f__AnonymousType1`1(OrderDateDt = [x].OrderDateDt), [x])' could not be translated and will be evaluated locally.
info: Microsoft.EntityFrameworkCore.Database.Command[20101] Executed DbCommand (33ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
SELECT [x].[OrderDateDt], COUNT(*)
FROM [Order] AS [x]
WHERE [x].[CompletedOrderId] IS NOT NULL AND ([x].[OrderDateDt] > DATEADD(day, -30E0, GETDATE()))
GROUP BY [x].[OrderDateDt]
This is from this source C#
var groupedOrders = _context.Order
.Where(x => x.CompletedOrderId != null)
.Where(x => x.OrderDateDt > DateTime.Now.AddDays(-30))
.GroupBy(x => new { x.OrderDateDt })
.Select(x => new
{
Date = x.Key.OrderDateDt,
Count = x.Count()
})
.ToList();
It returns instantly and definitely works as expected so I'm happy for now - just wanted to point this out.
@simeyla - Thanks for info. I have filed #11157 to track the issue and fix it.
For preview2 version, there are more patterns now being translated to server (including group by constant). I have updated first post to capture the details.
Please let me know if I can solve the following in 2.1.1:
I need to take the latest item from each group, but I receive warnings saying the query cannot be translated. Is there a workaround and is it going to be possible in the foreseeable future?
Example:
public class Author
{
public int AuthorId { get; set; }
public string FirstName { get; set; }
public string LastName { get; set; }
}
public class Book
{
public int BookId { get; set; }
public int AuthorId { get; set; }
public string Name { get; set; }
public DateTime CreatedAt { get; set; }
public virtual Author Author { get; set; }
}
public class EFCoreDemoContext : DbContext
{
public static readonly LoggerFactory MyLoggerFactory
= new LoggerFactory(new[] { new ConsoleLoggerProvider((_, __) => true, true) });
public DbSet<Author> Authors { get; set; }
public DbSet<Book> Books { get; set; }
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
optionsBuilder.UseLoggerFactory(MyLoggerFactory).UseSqlServer("Server=(localdb)\\MSSQLLocalDB;Database=Test;Trusted_Connection=True");
}
}
static void Main(string[] args)
{
using (var context = new EFCoreDemoContext())
{
context.Database.EnsureCreated();
// works as expected
var test1 = context.Books.GroupBy(x => x.AuthorId).Select(x => x.Count()).ToList();
// warn: The LINQ expression 'GroupBy([x].AuthorId, [x])' could not be translated and will be evaluated locally.
// warn: The LINQ expression 'orderby [b].CreatedAt asc' could not be translated and will be evaluated locally.
// warn: The LINQ expression 'Last()' could not be translated and will be evaluated locally.
var test2 = context.Books.GroupBy(x => x.AuthorId).Select(x => x.OrderBy(b => b.CreatedAt).Last()).ToList();
}
Console.ReadKey();
}
Updated as of 3/14/2018 based on implementation introduced
LINQ's
GroupBy()
operators can sometimes be translated to SQL'sGROUP BY
clauses, in particular when aggregate functions are applied in the projection.Scope for 2.1 release
Our current intention is for the scope of the work in 2.1 to improve how LINQ's
GroupBy
is evaluated in this particular scenario:Grouping on a simple expression that references a single column or multiple columns (using an anonymous type) of the query root and then projecting any aggregate operation from (Count, Sum, Average, Min, Max) and/or any individual property that is part of the grouping key (this will be translated to GROUP BY in SQL)
What is supported in 2.1.0-preview2
Apart from what is supported in 2.1.0-preview1 (details below), we have also added some more patterns
Examples
And a few bugfixes - #11218 #11157 #11176
What is supported in 2.1.0-preview1
Scenarios that we are not planning to improve in the 2.1 release
~1. Grouping on constants~ (available in 2.1.0-preview2)
FirstOrDefault()
All the scenarios above present different variations depending on what happens after the
GroupBy
, e.g. is there an aggregate function or not, is the key mentioned in the projection or not, etc. These scenarios will still result in client evaluation.We would appreciate if customers that care about EF Core supporting any of those scenarios that are scoped out from 2.1 to create individual issues for them, up-vote them, and keep the discussion there.