Returning `SqlFragmentExpression` from `HasTranslation` doesn't work (switches to client evaluation)

dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.

https://docs.microsoft.com/ef/

MIT License

13.68k stars 3.17k forks source link

Returning `SqlFragmentExpression` from `HasTranslation` doesn't work (switches to client evaluation) #32969

Open InspiringCode opened 8 months ago

InspiringCode commented 8 months ago

I have the following function mapping with HasTranslation:

public static partial class EntityFrameworkExtensions
{
    public static int TotalRowCount()
        => throw new NotSupportedException();

    public static void HasTotalRowCountFunction(this ModelBuilder mb)
    {
        mb
            .HasDbFunction(() => TotalRowCount())
            .HasTranslation(arguments =>
            {
                return new SqlFragmentExpression("COUNT(*) OVER()");
            });
    }
}

But when I use the TotalRowCount function in a projection, the SqlFragmentExpression is ignored and the TotalRowCount function is evaluated client-side (the NotSupportedException is thrown). When I return a SqlFunctionExpression instead, the correct SQL is generated and the SQL function is correctly called.

I am observing a similar behavior when using an `IMethodCallTranslator´ to do the same translation. But in this case another strange behavior can be observed: If my function has a parameter, the translator is called, if it has no parameters (provided by LINQ query), like the method above, the translator isn't even called.

I did look into the EF source code quite a bit but I am feeling kind of lost. Could you please give me a tip why this doesn't work, how I could possibly achieve this simple mapping or at least some hints where I could look in the source code.

I really appreciate any help. Thank you.

InspiringCode commented 8 months ago

I have found a way to achieve the desired result but it is VERY hacky:

public static void HasTotalRowCountFunction(this ModelBuilder mb)
{
    mb
        .HasDbFunction(() => TotalRowCount())
        .HasTranslation(arguments =>
        {
            return new SqlFunctionExpression(
                "COUNT",
                new SqlExpression[] { new SqlFragmentExpression("*) OVER(") },
                false,
                new[] { false },
                typeof(int),
                null);
        });
}

But I am afraid that evil ghosts will haunt me in my nights dreams for the rest of my life if I put this into production... So I would really prefer a cleaner solution for this.

roji commented 8 months ago

Any chance you can post a minimal console program that shows this happening?

alienwareone commented 8 months ago

Was trying to do the same by returing an SqlFragmentExpression directly but it will throw an InvalidOperationException: _'The LINQ expression '[expression]' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'.'_

Returning a derived SqlExpression which then returns the SqlFragmentExpression works, but unfortunately not with subqueries.

Reproduction code that tries to workaround the NULLS LAST issue in PostgreSQL in a very hacky way 😉

#define PG_DESCENDING_WITH_NULLS_LAST_EXPRESSION

using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Query;
using Microsoft.EntityFrameworkCore.Query.SqlExpressions;
using Microsoft.EntityFrameworkCore.Storage;
using Microsoft.Extensions.Logging;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

await using var dbContext = new ApplicationDbContext();
await dbContext.Database.EnsureDeletedAsync();
await dbContext.Database.EnsureCreatedAsync();

var sql1 = dbContext.Products
    .OrderBy(x => x.Discount.DescendingWithNullsLast())
    .ToQueryString();

Console.WriteLine(sql1);
// 👍
// SELECT p."Id", p."Discount", p."Name", p."Price"
// FROM "Products" AS p
// ORDER BY p."Discount" DESC NULLS LAST

var sql2 = dbContext.Products
    .Include(x => x.ProductAttributes)
    .OrderBy(x => x.Discount.DescendingWithNullsLast())
    .Skip(10)
    .Take(10)
    .ToQueryString();

Console.WriteLine(sql2);
// SELECT t."Id", t."Discount", t."Name", t."Price", p0."Id", p0."Name", p0."ProductId"
// FROM (                                                                     👇👇👇
//     SELECT p."Id", p."Discount", p."Name", p."Price", p."Discount" DESC NULLS LAST AS c
//     FROM "Products" AS p
//     ORDER BY p."Discount" DESC NULLS LAST
//     LIMIT @__p_0 OFFSET @__p_0
// ) AS t
// LEFT JOIN "ProductAttribute" AS p0 ON t."Id" = p0."ProductId"
// ORDER BY t.c, t."Id"

public sealed class ApplicationDbContext : DbContext
{
    public DbSet<Product> Products => Set<Product>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder) => optionsBuilder
        .UseNpgsql("Host=localhost;Username=postgres;Password=postgres;Database=EfCorePostgresqlNullsLast")
        .LogTo(Console.WriteLine, LogLevel.Debug)
        .EnableSensitiveDataLogging()
        .EnableDetailedErrors();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
#if PG_DESCENDING_WITH_NULLS_LAST_EXPRESSION
        modelBuilder
            .HasDbFunction(typeof(OrderByExtensions).GetMethod(nameof(OrderByExtensions.DescendingWithNullsLast), [typeof(int?)])!)
            .HasTranslation(e =>
            {
                var column = (ColumnExpression)e[0];
                return new PgDescendingWithNullsLastExpression(column, e[0].Type, e[0].TypeMapping);
            });
#else
        // System.InvalidOperationException: The LINQ expression 'DbSet<Product>().OrderBy(p => p.Discount.DescendingWithNullsLast())' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'. See https://go.microsoft.com/fwlink/?linkid=2101038 for more information.
        modelBuilder
            .HasDbFunction(typeof(OrderByExtensions).GetMethod(nameof(OrderByExtensions.DescendingWithNullsLast), [typeof(int?)])!)
            .HasTranslation(e =>
            {
                var column = (ColumnExpression)e[0];
                return new SqlFragmentExpression($"""{column.TableAlias}."{column.Name}" DESC NULLS LAST""");
            });
#endif
    }
}

public sealed class Product
{
    public int Id { get; private set; }
    public required string Name { get; set; }
    public required int Price { get; set; }
    public required int? Discount { get; set; }
    public ICollection<ProductAttribute> ProductAttributes { get; private set; } = [];
}

public sealed class ProductAttribute
{
    public int Id { get; private set; }
    public required string Name { get; set; }
}

public sealed class PgDescendingWithNullsLastExpression(
    ColumnExpression column,
    Type type,
    RelationalTypeMapping? typeMapping)
    : SqlExpression(type, typeMapping)
{
    protected override void Print(ExpressionPrinter expressionPrinter)
    {
    }

    protected override Expression VisitChildren(ExpressionVisitor visitor)
    {
        return new SqlFragmentExpression($"""{column.TableAlias}."{column.Name}" DESC NULLS LAST""");
    }
}

public static class OrderByExtensions
{
    // Only works inside OrderBy() and NOT OrderByDescending()
    public static int? DescendingWithNullsLast(this int? _)
    {
        throw new NotImplementedException();
    }
}

.csproj

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <LangVersion>preview</LangVersion>
    <Nullable>enable</Nullable>
    <WarningsAsErrors>nullable</WarningsAsErrors>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="8.0.0" />
  </ItemGroup>

</Project>

InspiringCode commented 7 months ago

@roji Here is a .NET Fiddle that shows the failing version and the dirty workaround hack.

PS: I also tried the workaround shown in the post of @alienwareone (with a derived SqlExpression returning a SqlFragment in VisitChildren), but it also does not work in my case. See this fiddle.

roji commented 7 months ago

Problem 1: Can't use SqlFragment directly

The 1st problem here is that a SqlFragmentExpression currently cannot have a type mapping, and our query pipeline generally wants nodes in the tree to have one; when a lambda is translated, we apply the default type mapping on the result (if it doesn't already have one), and fail translation if that still has no type mapping.
The reason SqlFragmentExpression can't have a type mapping, is that it wasn't meant for use as an actual expression type, but rather just as something that's used to generate some SQL (like the star in COUNT(*)).
We can consider relaxing this and allowing SqlFragmentExpression to have a type mapping - this would unblock this usage.
An argument against this could be that it would lead users dow the wrong path of using SqlFragmentExpression when it shouldn't.
For example, using SqlFragmentExpression to express a COUNT(*) OVER () as is done above would make our query pipeline not recognize the node as an actual COUNT function call, in case there's such logic (not saying there is specifically).
But as this issue shows, there are in any case ways around the type mapping problem (i.e. having a SqlFunctionExpression that is later converted to SqlFragmentExpression). In other words, if users have decided to extend the query pipeline, they're already doing very advanced/risky stuff on their own, and blocking them via SqlFragmentExpression's type mapping isn't very useful.

Problem 2: Projection gets screwed up for subquery

In @alienwareone's example, the ordering needs to be projected out of the subquery, since it's preserved (the query really does return result in that order, as opposed to e.g. if the subquery where within Contains).
When projecting an ordering out of a subquery, EF simply takes the expression inside the ordering; it's usually a column (e.g. Discount), but in this case it has been replaced by a SqlFragmentExpression, which EF knows nothing about - so it just lifts it.
This is an example of the limits of SqlFragmentExpression; the NULLS LAST component really is part of the ordering itself (the wrapping OrderingExpression), which is also where the ASC/DESC part is; it isn't part of the thing inside the ordering, as is being hacked here.
So OTOH I don't see a solution involving the SqlFragmentExpression hack - this really needs to be dealt with via a PostgreSQL-specific PgOrderingExpression, which has NULLS FIRST as a feature (that's tracked by https://github.com/npgsql/efcore.pg/issues/627)

InspiringCode commented 7 months ago

But as this issue shows, there are in any case ways around the type mapping problem (i.e. having a SqlFunctionExpression that is later converted to SqlFragmentExpression). In other words, if users have decided to extend the query pipeline, they're already doing very advanced/risky stuff on their own, and blocking them via SqlFragmentExpression's type mapping isn't very useful.

@roji I too think that allowing SqlFragmentExpression would be the less confusing solution. My experience as API user was as follows:

I first learned that you can map CLR functions to database functions/procedures - that's great! I then found, that you can use HasTranslation for more advanced scenarios which really captured my attention.
After doing some research and looking at the examples in the official docs I concluded: ah, you have to return some kind of SqlExpression, so let's look what expressions there are and what might fit my purpose. Hmm, I just want to insert some static simple SQL. Hmm, SqlFragmentExpression seems to do exactly this, perfect, let's try it.
Hmm, strange, doesn't work. Ok, let's try something different and derive our own SqlExpression. Let's look what I would have to override. Ah, there is the Print method, that's exactly what I need, I just need to print a SQL fragment. So let's return my SQL from there.
Hmm, strange, that doesn't work either. Let's double check by returning a SqlFunctionExpression as in the docs. Hmm, that works. Ok, so let's copy the SqlFunctionExpression 1-to-1 and name it MySqlFunctionExpression. Ok, that doesn't work either. Conclusion: the actual SQL generation magic must happen somewhere else.
Ok, luckily we have all the source code of EF. Let's check out how this works. Let's do "Find all references" on SqlFragmentExpression and SqlFunctionExpression. Hmm, ok. It seems the actual statement is generated by the QuerySqlGenerator. Ok, now it's starting to get a bit complicated...
Ok, I surrender. Let's open an issue on GitHub and hope that some EF god has mercy to tell me my obvious mistake or misunderstanding.

I hope this short API experience story is of some help.

roji commented 7 months ago

@InspiringCode thanks for the added details - that's useful indeed. FWIW scenarios where users need to construct their own SQL trees for functions are quite rare/advanced, and ones where they need some sort of SQL construct which isn't already supported is even rarer - we hardly ever see people doing that. In general, integrating arbitrary "stuff" into the SQL tree is more complicated than it looks - as you can see from the above; it's simply not possible to just "make it work" in all scenarios etc. But allowing a type mapping specifically may be OK - I'll discuss with the team.

BTW I'd be interested in why you need to generate COUNT(*) OVER () in the first place.

InspiringCode commented 7 months ago

@roji Thank for taking the time to review the issue.

My use case is the following:

A lot of your business applications have the customer requirement that data lists are paginated, often in combination with somer filtering. They requirement is that the user sees the number of pages and can select a certain page.
I know that it is often recommended that a better way to do this is to just display a "more" button or use some virtual scrolling techniques or something along these lines (to avoid counting the total number of rows), but in our case this wasn't an option.
To implement this, I need the total number of rows of the filtered data and the data itself.
Some queries with their filters (which I have to also consider for the count) are quite expensive and I really don't want to issue the same (expensive) query twice - once just for the count and once for the data itself.
I solved this problem by simply including the total count in every returned row (which should be just a few) via a window function:

public static async Task<PagedResult<T>> ToPagedResultAsync<T>(
    this IQueryable<T> query,
    int page,
    int pageSize)
{
    var result = await query
        .Select(x => new
        {
            Item = x,
            TotalCount = TotalRowCount()
        })
        .Skip(page * pageSize)
        .Take(pageSize)
        .ToArrayAsync();

    return new PagedResult<T>(
        Items: result.Select(x => x.Item).ToArray(),
        Page: page,
        TotalCount: result.FirstOrDefault()?.TotalCount ?? 0
    );
}

This was a lot faster than issuing two separate query (the additional row count for complex queries was barely noticeable).

roji commented 7 months ago

Thanks for the added detail - yeah, hopefully one day EF will have support for window functions (#12747), at which point such hacks will no longer be needed. Though another possibly better way to handle this would be to allow batching LINQ queries (#10879) - the downside with the window function hack is that at least in theory, the database is recalculating the total rows for each row it returns (though this is hopefully optimized away under the hood).

InspiringCode commented 7 months ago

@roji Query batching doesn't solve the problem in this case because the round-trip is not the expensive thing. Running COUNT(*) over the actual query takes nearly as long as the actual query itself which is a few seconds in my case (times two).

I really hope EF will support window functions natively soon, because I think they are an essential tool in business applications.

roji commented 7 months ago

How is it possible that running COUNT() in a separate query would take more than `COUNT() OVER ()` which does the same?

InspiringCode commented 7 months ago

@roji When using COUNT(*) OVER() what SQL Server does is basically:

Run the complex and expensive query
Write all the results to a temp table (the Table Spool nodes in the query plan)
Count the rows of the temp table
Apply the paging to the temp table and return the rows of the current page

Here is a comparison of the actual execution plan (above: with COUNT(*) OVER(), below: plain query without any COUNT. Note that there are 3 Table Spool nodes but they are all the same node (same primary ID, just duplicated in the graphics):

count-over-query-plan-1

When I just run a normal COUNT(*) it takes basically as long as the actual query itself (around 6.2 seconds), which would give me around 12.5 seconds with batched query feature vs 7.5 seconds total with COUNT(*) OVER().

roji commented 7 months ago

Ah I see, the part I was missing was that the COUNT(*) OVER () was already over the result of a complex query, but before the paging; so that to get the same you'd have to redo the query (without paging) and do COUNT(*) over that. The repetition of the query is the problem here.

So thanks, that's a pretty convincing scenario for enabling some sort of support for window functions - I've added a note to https://github.com/dotnet/efcore/issues/12747#issuecomment-1933796780.

roji commented 7 months ago

Design decision: we think allowing SqlFragmentExpression to have a type mapping is reasonable; SqlFragmentExpression is a very advanced escape hatch, and it doesn't make much sense to specifically block that there (especially given that people are already working around that limitation as above).

Charlieface commented 6 months ago

@roji I too think that allowing SqlFragmentExpression would be the less confusing solution. My experience as API user was as follows:

I first learned that you can map CLR functions to database functions/procedures - that's great! I then found, that you can use HasTranslation for more advanced scenarios which really captured my attention.

After doing some research and looking at the examples in the official docs I concluded: ah, you have to return some kind of SqlExpression, so let's look what expressions there are and what might fit my purpose. Hmm, I just want to insert some static simple SQL. Hmm, SqlFragmentExpression seems to do exactly this, perfect, let's try it.

Hmm, strange, doesn't work. Ok, let's try something different and derive our own SqlExpression. Let's look what I would have to override. Ah, there is the Print method, that's exactly what I need, I just need to print a SQL fragment. So let's return my SQL from there.

Hmm, strange, that doesn't work either. Let's double check by returning a SqlFunctionExpression as in the docs. Hmm, that works. Ok, so let's copy the SqlFunctionExpression 1-to-1 and name it MySqlFunctionExpression. Ok, that doesn't work either. Conclusion: the actual SQL generation magic must happen somewhere else.

Ok, luckily we have all the source code of EF. Let's check out how this works. Let's do "Find all references" on SqlFragmentExpression and SqlFunctionExpression. Hmm, ok. It seems the actual statement is generated by the QuerySqlGenerator. Ok, now it's starting to get a bit complicated...

Ok, I surrender. Let's open an issue on GitHub and hope that some EF god has mercy to tell me my obvious mistake or misunderstanding.

I hope this short API experience story is of some help.

I've had the exact same experience with any expression not supported (in any minor way) by EF Core.

This is a perfect example of why the whole QuerySqlGenerator pipeline is completely broken from a design perspective. The only way to override any of it is to basically rebuild all the dependency hookups from scratch, and hope that noone else has done the same, and hope that the API doesn't change under you in the next release.

There should be exactly one place where the SQL is generated: in the Print function on each SqlExpression. This should mean that I can create arbitrary SqlExpression classes and override Print Update Equals and GetHashCode. the use a custom HasTranslation, and I should be good to go, without touching anything else in the pipeline.

Basically what I'm saying is: QuerySqlGenerator is a monolithic blob of code that is not easily extensible, and needs to be refactored,

roji commented 6 months ago

@Charlieface I'd advise getting to know the innards of EF and what it means to generate SQL in-depth before making such statements.

The problem with adding a new SQL expression is not QuerySqlGenerator at all; that's just a standard expression visitor that's trivial to extend, adding support for arbitrary new expression types. The problem is that SQL expression aren't just inserted into the tree and then printed - there's various intermediate processing around them.

As an example, EF performs "type mapping inference"; that means that when you write Where(b => b.Name.Substring(5) == name), the actual type mapping of name (is it an nvarchar(max)? varchar(x)) gets inferred from the left side of the comparison; on that left side, the type mapping bubbles up from b.Name through the Substring method call, to then be applied to the right side. In other word, when translating the Substring() method call, we need logic that knows where to bubble up the type mapping from. Any new expression type would need to have the same handling in order to work correctly. You can take a look at SqlExpressionFactory to learn more about this.

Another example is nullability processing; EF performs various compensation to make the SQL behave in the same way around NULLs as the original C# code does (although C# has 2-value logic and SQL has 3-value logic). This requires EF to know about each different expression type, and process it correctly; take a look at SqlNullabilityProcessor to know more.

In other words, the problem isn't QuerySqlGenerator, it's the fact that full support for a new expression type needs to be implemented in various stages of the EF query pipeline; all of these are extensible (at least to some extent), but the task isn't easy and requires in-depth knowledge of EF internals.

That's just the reality of a complex, feature-rich ORM such as EF - it's easy to think you want to just add a SQL expression, but that's just not how it works. We've also seen very few people actually try to add expression types, and the right way to deal with that is to implement the feature that those users are looking for (e.g. window functions), rather than for users to try to insert support for custom SQL expressions into the ORM.

Coder3333 commented 4 months ago

All I want to do is be able to reference the Sql Server CURRENT_USER command. If there were a built in way to do this, I would take it, but since there does not appear to be one, I tried to implement it as a SqlFragmentExpression. On paper it made so much sense, but of course it doesn't actually work. I will now try to implement it as a function, but it would have been so nice to not have to go through all of these extra steps.

What happened to https://github.com/dotnet/efcore/pull/33053? It looks like it was completed months ago, but still appears to not be in production.

roji commented 4 months ago

@Coder3333 we're indeed missing easy mapping of user-defined niladic functions (i.e. functions which have zero arguments and don't have parentheses). However, SqlFragmentExpression isn't needed - you can simply use HasTranslation to map directly to SqlFunctionExpression - see #33817 for the code.