Open InspiringCode opened 8 months ago
I have found a way to achieve the desired result but it is VERY hacky:
public static void HasTotalRowCountFunction(this ModelBuilder mb)
{
mb
.HasDbFunction(() => TotalRowCount())
.HasTranslation(arguments =>
{
return new SqlFunctionExpression(
"COUNT",
new SqlExpression[] { new SqlFragmentExpression("*) OVER(") },
false,
new[] { false },
typeof(int),
null);
});
}
But I am afraid that evil ghosts will haunt me in my nights dreams for the rest of my life if I put this into production... So I would really prefer a cleaner solution for this.
Any chance you can post a minimal console program that shows this happening?
Was trying to do the same by returing an SqlFragmentExpression directly but it will throw an InvalidOperationException: _'The LINQ expression '[expression]' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'.'_
Returning a derived SqlExpression which then returns the SqlFragmentExpression works, but unfortunately not with subqueries.
Reproduction code that tries to workaround the NULLS LAST issue in PostgreSQL in a very hacky way 😉
#define PG_DESCENDING_WITH_NULLS_LAST_EXPRESSION
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Query;
using Microsoft.EntityFrameworkCore.Query.SqlExpressions;
using Microsoft.EntityFrameworkCore.Storage;
using Microsoft.Extensions.Logging;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;
await using var dbContext = new ApplicationDbContext();
await dbContext.Database.EnsureDeletedAsync();
await dbContext.Database.EnsureCreatedAsync();
var sql1 = dbContext.Products
.OrderBy(x => x.Discount.DescendingWithNullsLast())
.ToQueryString();
Console.WriteLine(sql1);
// 👍
// SELECT p."Id", p."Discount", p."Name", p."Price"
// FROM "Products" AS p
// ORDER BY p."Discount" DESC NULLS LAST
var sql2 = dbContext.Products
.Include(x => x.ProductAttributes)
.OrderBy(x => x.Discount.DescendingWithNullsLast())
.Skip(10)
.Take(10)
.ToQueryString();
Console.WriteLine(sql2);
// SELECT t."Id", t."Discount", t."Name", t."Price", p0."Id", p0."Name", p0."ProductId"
// FROM ( 👇👇👇
// SELECT p."Id", p."Discount", p."Name", p."Price", p."Discount" DESC NULLS LAST AS c
// FROM "Products" AS p
// ORDER BY p."Discount" DESC NULLS LAST
// LIMIT @__p_0 OFFSET @__p_0
// ) AS t
// LEFT JOIN "ProductAttribute" AS p0 ON t."Id" = p0."ProductId"
// ORDER BY t.c, t."Id"
public sealed class ApplicationDbContext : DbContext
{
public DbSet<Product> Products => Set<Product>();
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder) => optionsBuilder
.UseNpgsql("Host=localhost;Username=postgres;Password=postgres;Database=EfCorePostgresqlNullsLast")
.LogTo(Console.WriteLine, LogLevel.Debug)
.EnableSensitiveDataLogging()
.EnableDetailedErrors();
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
#if PG_DESCENDING_WITH_NULLS_LAST_EXPRESSION
modelBuilder
.HasDbFunction(typeof(OrderByExtensions).GetMethod(nameof(OrderByExtensions.DescendingWithNullsLast), [typeof(int?)])!)
.HasTranslation(e =>
{
var column = (ColumnExpression)e[0];
return new PgDescendingWithNullsLastExpression(column, e[0].Type, e[0].TypeMapping);
});
#else
// System.InvalidOperationException: The LINQ expression 'DbSet<Product>().OrderBy(p => p.Discount.DescendingWithNullsLast())' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to 'AsEnumerable', 'AsAsyncEnumerable', 'ToList', or 'ToListAsync'. See https://go.microsoft.com/fwlink/?linkid=2101038 for more information.
modelBuilder
.HasDbFunction(typeof(OrderByExtensions).GetMethod(nameof(OrderByExtensions.DescendingWithNullsLast), [typeof(int?)])!)
.HasTranslation(e =>
{
var column = (ColumnExpression)e[0];
return new SqlFragmentExpression($"""{column.TableAlias}."{column.Name}" DESC NULLS LAST""");
});
#endif
}
}
public sealed class Product
{
public int Id { get; private set; }
public required string Name { get; set; }
public required int Price { get; set; }
public required int? Discount { get; set; }
public ICollection<ProductAttribute> ProductAttributes { get; private set; } = [];
}
public sealed class ProductAttribute
{
public int Id { get; private set; }
public required string Name { get; set; }
}
public sealed class PgDescendingWithNullsLastExpression(
ColumnExpression column,
Type type,
RelationalTypeMapping? typeMapping)
: SqlExpression(type, typeMapping)
{
protected override void Print(ExpressionPrinter expressionPrinter)
{
}
protected override Expression VisitChildren(ExpressionVisitor visitor)
{
return new SqlFragmentExpression($"""{column.TableAlias}."{column.Name}" DESC NULLS LAST""");
}
}
public static class OrderByExtensions
{
// Only works inside OrderBy() and NOT OrderByDescending()
public static int? DescendingWithNullsLast(this int? _)
{
throw new NotImplementedException();
}
}
.csproj
<Project Sdk="Microsoft.NET.Sdk">
<PropertyGroup>
<OutputType>Exe</OutputType>
<TargetFramework>net8.0</TargetFramework>
<LangVersion>preview</LangVersion>
<Nullable>enable</Nullable>
<WarningsAsErrors>nullable</WarningsAsErrors>
</PropertyGroup>
<ItemGroup>
<PackageReference Include="Npgsql.EntityFrameworkCore.PostgreSQL" Version="8.0.0" />
</ItemGroup>
</Project>
@roji Here is a .NET Fiddle that shows the failing version and the dirty workaround hack.
PS: I also tried the workaround shown in the post of @alienwareone (with a derived SqlExpression
returning a SqlFragment
in VisitChildren
), but it also does not work in my case. See this fiddle.
COUNT(*)
).COUNT(*) OVER ()
as is done above would make our query pipeline not recognize the node as an actual COUNT
function call, in case there's such logic (not saying there is specifically).Discount
), but in this case it has been replaced by a SqlFragmentExpression, which EF knows nothing about - so it just lifts it.NULLS LAST
component really is part of the ordering itself (the wrapping OrderingExpression), which is also where the ASC/DESC part is; it isn't part of the thing inside the ordering, as is being hacked here.But as this issue shows, there are in any case ways around the type mapping problem (i.e. having a SqlFunctionExpression that is later converted to SqlFragmentExpression). In other words, if users have decided to extend the query pipeline, they're already doing very advanced/risky stuff on their own, and blocking them via SqlFragmentExpression's type mapping isn't very useful.
@roji I too think that allowing SqlFragmentExpression
would be the less confusing solution. My experience as API user was as follows:
HasTranslation
for more advanced scenarios which really captured my attention. SqlExpression
, so let's look what expressions there are and what might fit my purpose. Hmm, I just want to insert some static simple SQL. Hmm, SqlFragmentExpression
seems to do exactly this, perfect, let's try it.SqlExpression
. Let's look what I would have to override. Ah, there is the Print
method, that's exactly what I need, I just need to print a SQL fragment. So let's return my SQL from there.SqlFunctionExpression
as in the docs. Hmm, that works. Ok, so let's copy the SqlFunctionExpression
1-to-1 and name it MySqlFunctionExpression
. Ok, that doesn't work either. Conclusion: the actual SQL generation magic must happen somewhere else.SqlFragmentExpression
and SqlFunctionExpression
. Hmm, ok. It seems the actual statement is generated by the QuerySqlGenerator
. Ok, now it's starting to get a bit complicated...I hope this short API experience story is of some help.
@InspiringCode thanks for the added details - that's useful indeed. FWIW scenarios where users need to construct their own SQL trees for functions are quite rare/advanced, and ones where they need some sort of SQL construct which isn't already supported is even rarer - we hardly ever see people doing that. In general, integrating arbitrary "stuff" into the SQL tree is more complicated than it looks - as you can see from the above; it's simply not possible to just "make it work" in all scenarios etc. But allowing a type mapping specifically may be OK - I'll discuss with the team.
BTW I'd be interested in why you need to generate COUNT(*) OVER ()
in the first place.
@roji Thank for taking the time to review the issue.
My use case is the following:
public static async Task<PagedResult<T>> ToPagedResultAsync<T>(
this IQueryable<T> query,
int page,
int pageSize)
{
var result = await query
.Select(x => new
{
Item = x,
TotalCount = TotalRowCount()
})
.Skip(page * pageSize)
.Take(pageSize)
.ToArrayAsync();
return new PagedResult<T>(
Items: result.Select(x => x.Item).ToArray(),
Page: page,
TotalCount: result.FirstOrDefault()?.TotalCount ?? 0
);
}
This was a lot faster than issuing two separate query (the additional row count for complex queries was barely noticeable).
Thanks for the added detail - yeah, hopefully one day EF will have support for window functions (#12747), at which point such hacks will no longer be needed. Though another possibly better way to handle this would be to allow batching LINQ queries (#10879) - the downside with the window function hack is that at least in theory, the database is recalculating the total rows for each row it returns (though this is hopefully optimized away under the hood).
@roji Query batching doesn't solve the problem in this case because the round-trip is not the expensive thing. Running COUNT(*)
over the actual query takes nearly as long as the actual query itself which is a few seconds in my case (times two).
I really hope EF will support window functions natively soon, because I think they are an essential tool in business applications.
How is it possible that running COUNT() in a separate query would take more than `COUNT() OVER ()` which does the same?
@roji When using COUNT(*) OVER()
what SQL Server does is basically:
Table Spool
nodes in the query plan)Here is a comparison of the actual execution plan (above: with COUNT(*) OVER()
, below: plain query without any COUNT
. Note that there are 3 Table Spool
nodes but they are all the same node (same primary ID, just duplicated in the graphics):
When I just run a normal COUNT(*)
it takes basically as long as the actual query itself (around 6.2 seconds), which would give me around 12.5 seconds with batched query feature vs 7.5 seconds total with COUNT(*) OVER()
.
Ah I see, the part I was missing was that the COUNT(*) OVER ()
was already over the result of a complex query, but before the paging; so that to get the same you'd have to redo the query (without paging) and do COUNT(*)
over that. The repetition of the query is the problem here.
So thanks, that's a pretty convincing scenario for enabling some sort of support for window functions - I've added a note to https://github.com/dotnet/efcore/issues/12747#issuecomment-1933796780.
Design decision: we think allowing SqlFragmentExpression to have a type mapping is reasonable; SqlFragmentExpression is a very advanced escape hatch, and it doesn't make much sense to specifically block that there (especially given that people are already working around that limitation as above).
@roji I too think that allowing
SqlFragmentExpression
would be the less confusing solution. My experience as API user was as follows:
I first learned that you can map CLR functions to database functions/procedures - that's great! I then found, that you can use
HasTranslation
for more advanced scenarios which really captured my attention.After doing some research and looking at the examples in the official docs I concluded: ah, you have to return some kind of
SqlExpression
, so let's look what expressions there are and what might fit my purpose. Hmm, I just want to insert some static simple SQL. Hmm,SqlFragmentExpression
seems to do exactly this, perfect, let's try it.Hmm, strange, doesn't work. Ok, let's try something different and derive our own
SqlExpression
. Let's look what I would have to override. Ah, there is theHmm, strange, that doesn't work either. Let's double check by returning a
SqlFunctionExpression
as in the docs. Hmm, that works. Ok, so let's copy theSqlFunctionExpression
1-to-1 and name itMySqlFunctionExpression
. Ok, that doesn't work either. Conclusion: the actual SQL generation magic must happen somewhere else.Ok, luckily we have all the source code of EF. Let's check out how this works. Let's do "Find all references" on
SqlFragmentExpression
andSqlFunctionExpression
. Hmm, ok. It seems the actual statement is generated by theQuerySqlGenerator
. Ok, now it's starting to get a bit complicated...Ok, I surrender. Let's open an issue on GitHub and hope that some EF god has mercy to tell me my obvious mistake or misunderstanding.
I hope this short API experience story is of some help.
I've had the exact same experience with any expression not supported (in any minor way) by EF Core.
This is a perfect example of why the whole QuerySqlGenerator
pipeline is completely broken from a design perspective. The only way to override any of it is to basically rebuild all the dependency hookups from scratch, and hope that noone else has done the same, and hope that the API doesn't change under you in the next release.
There should be exactly one place where the SQL is generated: in the Print
function on each SqlExpression
. This should mean that I can create arbitrary SqlExpression
classes and override Print
Update
Equals
and GetHashCode
. the use a custom HasTranslation
, and I should be good to go, without touching anything else in the pipeline.
Basically what I'm saying is: QuerySqlGenerator
is a monolithic blob of code that is not easily extensible, and needs to be refactored,
@Charlieface I'd advise getting to know the innards of EF and what it means to generate SQL in-depth before making such statements.
The problem with adding a new SQL expression is not QuerySqlGenerator at all; that's just a standard expression visitor that's trivial to extend, adding support for arbitrary new expression types. The problem is that SQL expression aren't just inserted into the tree and then printed - there's various intermediate processing around them.
As an example, EF performs "type mapping inference"; that means that when you write Where(b => b.Name.Substring(5) == name)
, the actual type mapping of name
(is it an nvarchar(max)
? varchar(x)
) gets inferred from the left side of the comparison; on that left side, the type mapping bubbles up from b.Name through the Substring method call, to then be applied to the right side. In other word, when translating the Substring() method call, we need logic that knows where to bubble up the type mapping from. Any new expression type would need to have the same handling in order to work correctly. You can take a look at SqlExpressionFactory to learn more about this.
Another example is nullability processing; EF performs various compensation to make the SQL behave in the same way around NULLs as the original C# code does (although C# has 2-value logic and SQL has 3-value logic). This requires EF to know about each different expression type, and process it correctly; take a look at SqlNullabilityProcessor to know more.
In other words, the problem isn't QuerySqlGenerator, it's the fact that full support for a new expression type needs to be implemented in various stages of the EF query pipeline; all of these are extensible (at least to some extent), but the task isn't easy and requires in-depth knowledge of EF internals.
That's just the reality of a complex, feature-rich ORM such as EF - it's easy to think you want to just add a SQL expression, but that's just not how it works. We've also seen very few people actually try to add expression types, and the right way to deal with that is to implement the feature that those users are looking for (e.g. window functions), rather than for users to try to insert support for custom SQL expressions into the ORM.
All I want to do is be able to reference the Sql Server CURRENT_USER command. If there were a built in way to do this, I would take it, but since there does not appear to be one, I tried to implement it as a SqlFragmentExpression. On paper it made so much sense, but of course it doesn't actually work. I will now try to implement it as a function, but it would have been so nice to not have to go through all of these extra steps.
What happened to https://github.com/dotnet/efcore/pull/33053? It looks like it was completed months ago, but still appears to not be in production.
@Coder3333 we're indeed missing easy mapping of user-defined niladic functions (i.e. functions which have zero arguments and don't have parentheses). However, SqlFragmentExpression isn't needed - you can simply use HasTranslation to map directly to SqlFunctionExpression - see #33817 for the code.
I have the following function mapping with
HasTranslation
:But when I use the
TotalRowCount
function in a projection, theSqlFragmentExpression
is ignored and theTotalRowCount
function is evaluated client-side (theNotSupportedException
is thrown). When I return aSqlFunctionExpression
instead, the correct SQL is generated and the SQL function is correctly called.I am observing a similar behavior when using an `IMethodCallTranslator´ to do the same translation. But in this case another strange behavior can be observed: If my function has a parameter, the translator is called, if it has no parameters (provided by LINQ query), like the method above, the translator isn't even called.
I did look into the EF source code quite a bit but I am feeling kind of lost. Could you please give me a tip why this doesn't work, how I could possibly achieve this simple mapping or at least some hints where I could look in the source code.
I really appreciate any help. Thank you.