dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.73k stars 3.17k forks source link

query: reuse complex projection in orderby etc rather than defining it again #16038

Open maumar opened 5 years ago

maumar commented 5 years ago

test:

Select_take_null_coalesce_operator

query:

ctx.Customers.Select(
                    c => new
                    {
                        c.CustomerID,
                        c.CompanyName,
                        Region = c.Region ?? "ZZ"
                    }).OrderBy(c => c.Region).Take(5)

expected sql:

@__p_0='5'

SELECT TOP(@__p_0) [c].[CustomerID], [c].[CompanyName], COALESCE([c].[Region], N'ZZ') AS [Region]
FROM [Customers] AS [c]
ORDER BY [Region]

actual sql:

@__p_0='5'

SELECT TOP(@__p_0) [c].[CustomerID], [c].[CompanyName], COALESCE([c].[Region], N'ZZ')
FROM [Customers] AS [c]
ORDER BY COALESCE([c].[Region], N'ZZ')
smitpatel commented 5 years ago

This could be done via EV running after translation. During translation phase, projections are not always added to the list. (When they are stored in projectionMapping)

smitpatel commented 4 years ago

Related #20291

ChrisJollyAU commented 9 months ago

Just a couple of questions on this

  1. The above example is basically on a SqlFunctionExpression. Should it just handle that or other complex expressions (e.g. ScalarSubqueryExpression or ExistsExpression)
  2. Is this aiming to reuse it solely if there is a projection the same as the order by expression. Or even if the expression is solely in the ORDER BY, should it be moved to the projection with an alias and the alias referenced in the ORDER BY

I do have a fully working expression visitor from another provider that I'm working on that does exactly this that I can fixup/optimize

roji commented 9 months ago

The above example is basically on a SqlFunctionExpression. Should it just handle that or other complex expressions (e.g. ScalarSubqueryExpression or ExistsExpression)

AFAICT there's nothing here that's specific to SqlFunctionExpression; any time that something in the projection is ordered by, the ORDER BY clause should just reference that projection via its alias.

Is this aiming to reuse it solely if there is a projection the same as the order by expression. Or even if the expression is solely in the ORDER BY, should it be moved to the projection with an alias and the alias referenced in the ORDER BY

If the expression is solely in the ORDER BY, then it shouldn't be in the projection, since we don't want to project that thing (and e.g. download the data to the client) - only order by it. So no, I think this is only about not duplicating the expression when it's referenced both by in the ORDER BY and in the projection.

Note that the same issue most likely exist with OFFSET/LIMIT, which are also evaluated after the projection.

I do have a fully working expression visitor from another provider that I'm working on that does exactly this that I can fixup/optimize

I've been working on various query architecture things recently, and I actually don't think that an additional visitation pass is the right way to fix this; ideally, the SelectExpressions actual projection is always populated (not only when it's e.g. pushed down), and this would all be done eagerly as part of translation, rather than in another pass afterwards...

roji commented 9 months ago

Note: this conceptually intersects with #32277, which is about deduplicating subqueries in the tree; the scope here is much narrower (between clauses within the same select only), and we wouldn't necessarily use the same approach for both.

ChrisJollyAU commented 9 months ago

If the expression is solely in the ORDER BY, then it shouldn't be in the projection, since we don't want to project that thing (and e.g. download the data to the client) - only order by it. So no, I think this is only about not duplicating the expression when it's referenced both by in the ORDER BY and in the projection.

Good point. I've had to go the other direction and put it into the projection all the time as my provider (https://github.com/CirrusRedOrg/EntityFrameworkCore.Jet / MS Access) did not support those complex expressions in the ORDER BY.

You mentioned that you don't think doing another visitation pass would be the right way to go. Obviously if you are wanting to do it differently that would require a decent number of changes to the query pipeline to support that. If needed to be done now, I take it doing it in the QueryTranslationPostProcessor is reasonable? Or would there be a better place to do it?

The test mentioned in the first post ultimately comes out as (note MS Access SQL syntax as that is what I have handy)

SELECT TOP 5 `t`.`CustomerID`, `t`.`CompanyName`, `t`.`Region`
FROM (
    SELECT `c`.`CustomerID`, `c`.`CompanyName`, IIF(`c`.`Region` IS NULL, 'ZZ', `c`.`Region`) AS `Region`
    FROM `Customers` AS `c`
) AS `t`
ORDER BY `t`.`Region`

I just picked this up now when I was extending the expression to be du duplicated to the SqlFunctionExpression but if you have a Limit or both Limit and Offset you will most likely need to have the query with the complex expression in a subquery. In my case the TOP 5 part was originally pushed down into the subquery which executed before any ordering.

Had a look at the other issues you mentioned.

18775 I'm pretty sure that's a duplicate issue. It's a different type of expression but you can still do something like orderingExpression.Equals(projectionExpression) and get true

32277 Yep, this does intersect. Actually, though you are going to have a problem with that sample case mentioned. I initially thought my post processor would de duplicate the whole clause

SELECT MAX([i5].[Timestamp])
    FROM [IterationValues] AS [i4]
    INNER JOIN [Iteration] AS [i5] ON [i4].[IterationId] = [i5].[Id]
    WHERE [i4].[IterationId] IN (
        SELECT [i3].[value]
        FROM OPENJSON(@__ids_0) WITH ([value] int '$') AS [i3]
    ) AND [i].[IterationId] = [i4].[IterationId]

but then I noticed in the OPENJSON section that the aliases it uses are all different (i6, i0, i3) so comparing expressions would always return false. You would need some other way of matching expressions.

roji commented 9 months ago

Obviously if you are wanting to do it differently that would require a decent number of changes to the query pipeline to support that.

Yeah, that's definitely true. I'm currently doing a big push to clean up/refactor/modernize the query pipeline, although this specific area (rethink how projections/potential projections work) is probably long ways away.

If needed to be done now, I take it doing it in the QueryTranslationPostProcessor is reasonable?

Yep - you can add visitors to the postprocessor for your provider and do whatever transformations make sense there. It's sometimes tricky to get ordering right - you probably want to do your stuff after the relational postprocessor has completed, and the query tree is "done".

The test mentioned in the first post ultimately comes out as (note MS Access SQL syntax as that is what I have handy)

Right, makes sense - you're basically pushing down the query into a subquery so that the complex expression ends up in the subquery's projection, and at that point you can reference it as a simple column from the outer query's ORDER BY. FWIW a bit part of the work I'm currently doing on the pipeline is to make it much easier to transform the query tree in arbitrary ways - so EF 9.0 should look substantially better in this respect.

but then I noticed in the OPENJSON section that the aliases it uses are all different (i6, i0, i3)

This depends on exactly which kind of duplication is in question... In some case, the expression tree that comes out of translation contains the same SelectExpression instance, referenced from two places in the tree; this is easy to catch in a post-processing pass and transform as necessary (though one needs to do it right away, before any other visitor makes a change inside and causes the SelectExpression to be duplicated twice).

Anyway, it's interesting to know you're working on this kind of select manipulations for Jet; keep in touch around this and don't hesitate to let us know about what you're doing (and where EF could help you do better)!

ChrisJollyAU commented 6 months ago

I had some time over the weekend so converted my expression visitor over to SQL Server. You can find it at https://github.com/ChrisJollyAU/efcore/tree/dedupe if you want to have a look (based on 9.0 preview 2). It does deduplicate a number of queries.

The only thing I have noticed, is that when pushing down to a subquery, in EF 9 some subqueries get an additional column added to it that EF 8 doesnt do.

e.g. in Double_order_by_binary_expression

SELECT [w0].[Binary]
FROM (
    SELECT [w].[Id] + 2 AS [Binary], [w].[Id]
    FROM [Weapons] AS [w]
) AS [w0]
ORDER BY [w0].[Binary]

there is the extra [w].[Id] added in the pushdown. While not major, it is potentially unneeded. This comes about because with that SelectExpression there is the private value _identifier that has that column in it and the pushdown picks that up and adds it to the subquery projection. In EF8 _identifier doesn't have any values in it.

I know you are still thinking of changing the query architecture so maybe this will help in some form even if you don't go with a postprocessor. Otherwise, if you don't get to change the architecture, with a bit of cleanup and such, it may be something to merge

roji commented 6 months ago

Thanks @ChrisJollyAU - at some point we'll definitely get around to working on such query improvements, unfortunately there's little chance it'll happen in 9.0.

Thinking about this a bit again... For a "proper" (non-postprocessing fix), we'd first need to get rid of the pending selector architecture (#32957): for the above query (Select+OrderBy), we currently move the Select forward ("pending"), and so the OrderBy is rewritten to contain the duplicated, complex expression. In other words, this original query:

ctx.Customers.Select(c => new { c.CustomerID, Region = c.Region ?? "ZZ" }).OrderBy(c => c.Region)...

... becomes the following in preprocessing:

ctx.Customers.OrderBy(c => c.Region ?? "ZZ").Select(c => new { c.CustomerID, Region = c.Region ?? "ZZ" })...

So the first step in fixing this properly would be removing that reordering (no small task). At that point, we should be able to modify RelationalSqlTranslatingEV so that when it translates the OrderBy lambda (c.Region), it produces a references to the projection rather than duplicating the entire projected expression. This would be a specific mode of RelationalSqlTranslatingEV when it's running after the projection (e.g. OrderBy, Skip, Take...) as opposed to before (Where, GroupBy - which are evaluated before the projection and cannot refer to it).

ChrisJollyAU commented 3 months ago

Was working on updating this to preview 6 and have run into an issue based on some of the changes in EFCore 9 - namely the Alias Manager.

Basically when I am doing the post processing, I have the following steps to do on the select expression

The issue is that at this point in time the expression doesn't always have the internal variable _sqlAliasManager set so it is impossible to do the pushdown.

There are 2 cases here:

  1. The select expression already comes to the postprocessor without an alias manager
  2. It comes with an alias manager, but when we do the selectExpression.Update the new expression doesn't get the alias manager passed on

Is this how its meant to be that an Update doesn't pass the alias manager on?

roji commented 2 months ago

@ChrisJollyAU in the current design, SelectExpression has a mutable mode (when translation is still in progress and the expression is being mutated), and an immutable mode; the Pushdown() is currently only supported in mutable mode. However, you should be able to simply construct a new SelectExpression wrapping the existing one yourself - this should be easy (and the more correct way) when we're in post-processing, where the tree should be immutable.

I just happen to have submitted a PR doing this, see this code. Let me know if this doesn't work for you for some reason.