dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.79k stars 3.19k forks source link

Query: Support GroupJoin when it is final query operator #19930

Closed smitpatel closed 2 years ago

smitpatel commented 4 years ago

This issue is about enabling following query.

context.Customers.GroupJoin(context.Orders, c => c.CustomerID, o => o.CustomerID, (c, os) => new {c, os}).ToList();
// or
from c in context.Customers
join o in context.Orders on c.CustomerID equals o.CustomerID into os
select { c, os}

We will not client eval GroupJoin in any of following cases

All above unsupported cases will still throw client evaluation exception even after this issue is fixed.

GroupJoin operator in query which is used to generated LeftJoin in SQL as described in our documentation is supported and will remains so. Even after that use case, if you still need EF Core to client eval GroupJoin operator with aforementioned constraints please upvote this issue. We are not looking to client eval any other GroupJoin which appears in the middle of the query.

powermetal63 commented 4 years ago

Don't waste your time with this useless enhancement. This definitely is NOT what I wanted. Client eval, seriously? If it wasn't clear after all these writings in the now closed issue, I want FULL SERVER support for this LINQ operator. And btw, it has nothing in common with GroupBy. It's a correlation operator similar to Join, but producing correlated set rather than flattened set. Putting it into the same box as GroupBy is a BIG mistake. IMHO. This my last comment here, luckily I'm not using EF Core in production, hence I don't really care, except for the unlucky people which do use and I'm trying to help overcoming the unnecessary limitations put by EF Core team. We all know LINQ to Queryable is broken, but EF Core makes that a rule rather than an exception. The sad thing is that even though EF6 has no client eval concept, it translates and evaluates much more LINQ constructs than EF Core, so it's not surprising that some people actively propose EF6.4 or EF Classic as better EF Core alternatives. Which is sad because I like the simplicity of the EF Core exposed metadata, also the replaceable services architecture, but w/o proper query translation it is just not useful in real world applications. No more comments.

powermetal63 commented 4 years ago

Really last comment :-) (hopefully constructive)

Here is quick and dirty EXTERNAL implementation of what I wanted:

public static partial class QueryableExtensions
{
    public static IQueryable<T> ConvertGroupJoins<T>(this IQueryable<T> source)
    {
        var expression = new GroupJoinConverter().Visit(source.Expression);
        if (source.Expression == expression) return source;
        return source.Provider.CreateQuery<T>(expression);
    }

    class GroupJoinConverter : ExpressionVisitor
    {
        protected override Expression VisitMethodCall(MethodCallExpression node)
        {
            if (node.Method.DeclaringType == typeof(Queryable) && node.Method.Name == nameof(Queryable.GroupJoin) && node.Arguments.Count == 5)
            {
                var outer = Visit(node.Arguments[0]);
                var inner = Visit(node.Arguments[1]);
                var outerKeySelector = Visit(node.Arguments[2]).UnwrapLambdaFromQuote();
                var innerKeySelector = Visit(node.Arguments[3]).UnwrapLambdaFromQuote();
                var resultSelector = Visit(node.Arguments[4]).UnwrapLambdaFromQuote();
                var outerKey = outerKeySelector.Body.ReplaceParameter(outerKeySelector.Parameters[0], resultSelector.Parameters[0]);
                var innerKey = innerKeySelector.Body;
                var keyMatch = MatchKeys(outerKey, innerKey);
                var innerQuery = Expression.Call(
                    typeof(Queryable), nameof(Queryable.Where), new[] { innerKeySelector.Parameters[0].Type },
                    inner, Expression.Lambda(keyMatch, innerKeySelector.Parameters));
                var resultTypes = resultSelector.Parameters.Select(p => p.Type).ToArray();
                var tempProjectionType = typeof(Tuple<,>).MakeGenericType(resultTypes);
                var tempProjection = Expression.New(
                    tempProjectionType.GetConstructor(resultTypes),
                    new Expression[] { resultSelector.Parameters[0], innerQuery },
                    tempProjectionType.GetProperty("Item1"), tempProjectionType.GetProperty("Item2"));
                var tempQuery = Expression.Call(
                    typeof(Queryable), nameof(Queryable.Select), new[] { outerKeySelector.Parameters[0].Type, tempProjectionType },
                    outer, Expression.Lambda(tempProjection, resultSelector.Parameters[0]));
                var tempResult = Expression.Parameter(tempProjectionType, "p");
                var projection = resultSelector.Body
                    .ReplaceParameter(resultSelector.Parameters[0], Expression.Property(tempResult, "Item1"))
                    .ReplaceParameter(resultSelector.Parameters[1], Expression.Property(tempResult, "Item2"));
                var query = Expression.Call(
                    typeof(Queryable), nameof(Queryable.Select), new[] { tempProjectionType, projection.Type },
                    tempQuery, Expression.Lambda(projection, tempResult));
                return query;
            }
            return base.VisitMethodCall(node);
        }

        static Expression MatchKeys(Expression outerKey, Expression innerKey)
        {
            var newOuterKey = outerKey as NewExpression;
            if (newOuterKey == null)
                return Match(outerKey, innerKey);
            var newInnerKey = (NewExpression)innerKey;
            Expression keyMatch = null;
            for (int i = 0; i < newOuterKey.Arguments.Count; i++)
            {
                var match = Match(newOuterKey.Arguments[i], newInnerKey.Arguments[i]);
                keyMatch = keyMatch != null ? Expression.AndAlso(keyMatch, match) : match;
            }
            return keyMatch;
        }

        static Expression Match(Expression left, Expression right)
            => Expression.Equal(left, right);
    }
}

When applied, it allows translation/processing (in 3.1.1) of all GroupJoin scenarios.

The question is, if I can do that externally, why can't you do something similar internally (using all the power of your expression processing helpers/experience) :-/ Cheers.

NetMage commented 4 years ago

I think this is a good enhancement because without it, the (general) EF Core 3.x advice to manually switch to client-side evaluation does exactly what has been often commented as worth avoiding in automatic client side evaluation, i.e. pulling a lot of data from the server unnecessarily.

At least, in something like

var ans = context.Customers
                 .Where(c => c.CustomerID < 100)
                 .GroupJoin(context.Orders, c => c.CustomerID, o => o.CustomerID, (c, os) => new { c, os })
                 .ToList();

where could you put AsEnumerable that wouldn't involve pulling a lot of unnecessary Orders rows?

(I hope this case would be handled.)

It would be really good if this supports projection at least so that unneeded columns aren't pulled to the client as in:

var ans = context.Customers
                 .Where(c => c.CustomerID < 100)
                 .GroupJoin(context.Orders, c => c.CustomerID, o => o.CustomerID, (c, os) => new { c.CustomerID, OrderIDs = os.Select(o => o.OrderID) })
                 .ToList();

I am still not sure if translating to a single correlated SQL query versus two queries to minimize data transfer would be most efficient, but I suppose the answer would very depending on the database.

smitpatel commented 4 years ago
context.Customers
    .Where(c => c.CustomerID < 100)
    .Select(c => new {
        c = c,
        os = context.Orders.Where(o => c.CustomerID == o.CustomerID).ToList()
    })

is equivalent query. LINQ is magical.

powermetal63 commented 4 years ago

context.Customers .Where(c => c.CustomerID < 100) .Select(c => new { c = c, os = context.Orders.Where(o => c.CustomerID == o.CustomerID).ToList() }) is equivalent query. LINQ is magical.

Yes, it is. That's why the EF job is to convert/transform/treat the original as the manually written equivalent.

SQL also has different valid query constructs (WHERE IN (…), WHERE EXISTS (…),JOIN`) which can produce equivalent results, but they are not forcing people to use just one of them. Modern query optimizers recognizes them and create one an the same execution plan. And EF Core is intended to be a "modern" framework, isn't it?


P.S. Sorry for repeating/linking to it couple times, but see the ConvertGroupJoins transformation from https://github.com/dotnet/efcore/issues/19930#issuecomment-587476008 and then below applied to @NetMage example and your response:

var qA = context.Customers
    .Where(c => c.CustomerID < 100)
    .GroupJoin(context.Orders, c => c.CustomerID, o => o.CustomerID, (c, os) => new { c, os });

var qB = context.Customers
    .Where(c => c.CustomerID < 100)
    .Select(c => new {
        c = c,
        os = context.Orders.Where(o => c.CustomerID == o.CustomerID).ToList()
    });

var qC = qA.ConvertGroupJoins();
// qC is same as qB, should be done automatically by EF :)
roji commented 4 years ago

Yes, it is. That's why the EF job is to convert/transform/treat the original as the manually written equivalent.

While in theory that's true, given that the amount of work we have is huge, we prioritize working on translations (and features in general) which don't have easy/simple workaround such as this one. If it's possible to rewrite your LINQ query so that it works - and is even simpler as in the case above - isn't that reasonable?

powermetal63 commented 4 years ago

Yes, it is. That's why the EF job is to convert/transform/treat the original as the manually written equivalent.

While in theory that's true, given that the amount of work we have is huge, we prioritize working on translations (and features in general) which don't have easy/simple workaround such as this one. If it's possible to rewrite your LINQ query so that it works - and is even simpler as in the case above - isn't that reasonable?

@roji No, it isn't, because you want be to do something additional work which is not strongly necessary. Because my time is costly too.

Just out of curiosity, why you spent your valuable time writing the (IMHO more complicated) GroupJoinFlatteningExpressionVisitor to support the weird LINQ left join pattern? You could have asked the end users to rewrite their LINQ queries using the even simpler correlated subquery syntax, right?

Why don't you go further and remove the whole LINQ support and ask the end users to rewrite their queries using some "even simpler" but EF Core specific dialect for their queries?

Finally, it's a shame the "modern" framework translating less LINQ patterns than its "obsolete" predecessor. Just FYI :)

roji commented 4 years ago

Once again, where we can, we add translations based on the implementation cost, on whether a different construct exists to express the same thing, and importantly, based on user feedback. This issue has received almost no votes or feedback from users, which is a main reason why we haven't prioritized it.

NetMage commented 4 years ago

That is nice, could it not be used as a GroupJoin translation?

powermetal63 commented 4 years ago

@roji

This issue has received almost no votes or feedback from users,

Because the victims go to StackOverflow instead of here?

powermetal63 commented 4 years ago

@NetMage This is what I'm fighting for here. But they "see no value" :(

roji commented 4 years ago

This issue has received almost no votes or feedback from users,

Because the victims go to StackOverflow instead of here?

We do have other issues with hundreds of votes - this one does not. Note that @smitpatel - who is a team member - is the one who opened this issue in the first place, so it's not true that we see no value. But at the risk of repeating myself, this issue has very little user requests, and there is another LINQ construct that produces the same result (one could even say that it's simpler). Therefore, since we have an enormous list of things to do and our resources are limited, this issue is in the backlog (not closed).

NetMage commented 4 years ago

@powermetal63 Theoretically, if one cares enough, an enhancement could be done by someone from the community. Unfortunately having a usable EF Core is not my primary job, but I am working on a C# extension method using this translation in my spare time.

powermetal63 commented 4 years ago

@roji From my point of view this is continuation of #17068, which was closed by @smitpatel with the following conclusion

GroupJoin operator GroupJoin is mainly used to generate LeftJoin and there is no SQL equivalent. Hence we will not translate any composition over GroupJoin. Filed #19930 to client eval GroupJoin when it is final query operator and grouping is not being composed over. In all other cases, GroupJoin is not going to supported. GroupJoin <-> GroupBy equivalence Due to high cost of pattern matching GroupJoin operator, we are not going to add any translation in GroupJoin even if equivalent GroupBy query would be translated. Customers would be required use GroupBy syntax as that is closest representation of what is the SQL.

And here are some related issues recently closed as duplicate of it - #13887, #14490, #20660

What I'm trying to argue from the beginning is that putting GroupJoin into same box as GroupBy is wrong. And what I really want is reopening a separate issue for (or changing this to) Query: Support GroupJoin translation. With the following rationale:

As you can see from my sample code, doing that is quite easy even with external code. You already put significant effort in 3.x to support just one variation - LINQ left join pattern using SelectMany on GroupJoin.DefaultIfEmpty, and even additional effort for porting (refactoring) it into this huge TryFlattenGroupJoinSelectMany method.

But running my ConvertGroupJoins method not only solves the GroupJoin translation in general, but also totally eliminates the need of that TryFlattenGroupJoinSelectMany code (because there are no GroupJoins in the query expression tree at all!).

What I'm suggesting is replacing TryFlattenGroupJoinSelectMany with GroupJoin transformation similar to my sample code (adjusted with your internal coding standards) and voila - issue solved once forever. All that with much less code than currently.

sungam3r commented 3 years ago

@powermetal63 What is ReplaceParameter in your example?

powermetal63 commented 3 years ago

@sungam3r Nothing special, a typical expression visitor based helper for finding ParemeterExpression instances inside expression tree and replacing them with arbitrary Expression. Something like

public static partial class ExpressionUtils
{
    public static Expression ReplaceParameter(this Expression source, ParameterExpression parameter, Expression value)
        => new ParameterReplacer { Parameter = parameter, Value = value }.Visit(source);

    class ParameterReplacer : ExpressionVisitor
    {
        public ParameterExpression Parameter;
        public Expression Value;
        protected override Expression VisitParameter(ParameterExpression node)
            => node == Parameter ? Value : node;
    }
}
NetMage commented 3 years ago

@powermetal63 Can you provide a sample of the transformation your ConvertGroupJoins does on a query?

marklagendijk commented 3 years ago

This issue has made me realize that there is a serious pain point with EF Core in its current state (5.x). It is something I have run into quite a few times before, but I didn't see the patter up till now.

Although EF Core is mostly nice and allows you to do many things, the current state of its Linq API isn't nice:

  1. It allows you to write code in numerous ways that it doesn't support.
  2. It will not complain during compilation, but only at runtime.
  3. It throws errors in which it gives incorrect advise (like suggesting to do clientside evaluation, even though an alternative syntax is supported that does it serverside).
  4. Finding out about the limitations and alternative solutions is relatively hard.

Solution per point (not sure which of these are feasible):

  1. Support all / nearly all code that the API allows.
  2. Break during compilation / provide a build tool that finds issues.
  3. Don't suggest alternative solutions that may be terrible in certain cases.
  4. Clearly document limitations, including suggesting proper alternatives.
smitpatel commented 3 years ago
  1. EF Core cannot support all code that LINQ API allows since there is fundamental difference in LINQ vs SQL. They are just not the same.
  2. Error is thrown at compile time, albeit the query compile time rather than code compile time. In order to successfully evaluate query to server, EF needs the model being used with the query alongwith other services. Which precludes using of any analyzers. Currently don't have a way to have a tool which runs the code also when evaluating things other than integration tests. Also if there are good integration tests for all queries then the difference between compile/runtime wouldn't make a difference.
  3. Due to fundamental mismatch, there are thousands of queries which don't work and associated thousands of alternative. But most common case it is forcing client evaluation because user wanted to use some client function in their query. Also it preserves what EF Core 2.x and earlier did - assuming that is what user wanted. Forcing client evaluation being terrible is very subjective on dataset. We suggest it and inform user in detail about what is client eval and its implication and let them decide if they really want to use it. In certain cases where we know the error is due to particular reason and there is better alternative, we suggest that. e.g. For string.Equals method with StringComparison arg doesn't translate to server and we recommend user to not pass in StringComparison arg.
  4. As written in previous point, there are thousands of patterns of LINQ which doesn't work in SQL and there are thousands of alternatives for them. We can spend next 5 years documenting everything and not making any source code change in EF Core or we can just add more feature to EF Core. We believe the latter helps more to our customers. We are also open to providing alternatives to our user when raise issue in our issue tracker if alternative exists.

SQL doesn't have GroupJoin operation on server because it cannot represent IEnumerable which grouping generates. There is no alternative for it. So neither of above raised point helps apart from whatever is stated already in this issue.

marklagendijk commented 3 years ago

Sorry that none of my points were helpful. I was only trying to voice my experience as a user, including trying to suggest some ideas that might lead to improvements. You have to realize that knowledge about ORM library implementations is so specific, that nearly none of the users actually understand the inner workings (even though they may be experienced developers / architects)

If I read your comment correctly you are saying that LINQ makes for a bad / leaky abstraction of SQL:

Now I know that on some level every ORM will suffer from limitations of their abstraction, which is why most ORMs have an option to execute raw SQL.

This makes me wonder:

smitpatel commented 3 years ago

Can you share any other abstraction which is familiar, intuitive and doesn't require user to learn altogether new query language? There is one abstraction which upto an extent can be made work by introducing operations which are supported by database but that abstraction would also run into issue because EF Core is not tied to one specific database type. There is also variation on what works on one database compared to another. So we would always be in a boat where regardless of abstraction chosen, some query won't always work.

marklagendijk commented 3 years ago

I do have some experience with Spring Data JPA, but not enough to actually judge whether it is 'better' than EF Core (which even then would probably be quite subjective).

Now I believe the approach there is:

Just as an idea: it might be really insightful for the EF Core team to talk with teams that build stuff like this (Spring JPA / Hibernate etc). Looking 'over the hedge' at what people are doing in other ecosystems can lead to very cool new ideas.

smitpatel commented 3 years ago

There are other 3rd party libraries which integrate on top of EF Core and provide all those functionality. OData is one big consumer with EF Core which provides paging/sorting/filtering functionality in the way you mention above without users having to write whole query themselves. We work with them so the queries generated by them are translated correctly in EF Core. There also other extension on top of EF Core to provide more functionality from database side & also other 3rd party library which can be used to create queries.

DreamU1952 commented 2 years ago

OUCH! I have LINQ queries with 'into' clauses all over my Legacy code. I have just tried to upgrade that old Entity Framework v5.0 to Entity Framework core and hit this issue almost immediately. Stop removing features that used to work! I am tired of all the products being released with breaking changes and products that provide no upgrade path to get from a prior version to a new version. I am leaning towards abandoning Entity Framework as the upgrade hassle is beginning to outweigh the benefits. We just upgraded 30-year old legacy code and the SQL statements just ported straight into the new app! That's a big plus for just staying with good old SQL.