dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.5k stars 3.13k forks source link

SQL Server: consider switching from COALESCE to ISNULL #32519

Open roji opened 7 months ago

roji commented 7 months ago

It seems that COALESCE may be expanded to a CASE/WHEN, which causes the expression to get evaluated twice (link); ISNULL apparently doesn't have this behavior. This has a performance impact (i.e. when the expression is a heavy subquery), but can also lead to incorrect results when the expression is non-deterministic and can return different results each time it's evaluated:

CREATE TABLE Foo (A int, B int);
INSERT INTO Foo (A, B) VALUES (1, NULL), (2, 2), (3, NULL);

SELECT COALESCE((SELECT TOP 1 B FROM Foo ORDER BY NEWID() ASC), 0); -- sometimes returns NULL!

~Also, check if the same thing happens on other databases, i.e. whether this should be a SQL Server-only change or something with a wider scope.~

SQL Server documentation detailing all the above, plus the differences between COALESCE and ISNULL. See also (this post) on that.

Thanks for @John0King for reporting this (see https://github.com/dotnet/efcore/issues/22243#issuecomment-1840198932).

roji commented 7 months ago

A quick check on PostgreSQL, SQLite and MariaDB doesn't show the same issue occuring. That is, with the above code, one always gets either 0 or 2, never NULL.

Matthew-Ricks-USBE commented 6 months ago

There are differences between COALESCE and ISNULL, but +1 for adding at least limited use of ISNULL. I have a use case where I'm summing a correlated subquery and comparing that against a known value. It's a dynamic piece added via Where, but it's something like this:

var values = new [] { "A", "B" }; // Dynamic.
return (TargetEntity te) => te.EnumerableProperty
    .Distinct()
    .Select(v => values.Contains(v) ? 1 : values.Length + 1)
    .Sum() != values.Length;

The result of SUM is treated as nullable by the generated SQL, so a second clause checking for null is added which hurts performance. I can add a coalesce sum ?? 0, but the generated SQL uses COALESCE and takes several minutes, often hitting my timeout. Swapping for ISNULL runs in less than a minute.

Fortunately I don't expect this piece of functionality to have much if any demand/use, so I'm tolerating the timeouts for now, but I'd love it if this kept up with our other functions.

roji commented 6 months ago

@Matthew-Ricks-USBE that very specific case should be covered by #32574 (draft PR #32575 is out); it's more a case of problematic translation of the Contains (in this very specific case) than a problem with COALESCE itself.

Matthew-Ricks-USBE commented 6 months ago

I don't think they're related. The Select and Sum from the sample code translate to:

SELECT SUM(
    CASE
        WHEN [t].[ENUMERATED_THING] IN ( N'A', N'B' ) THEN 1
        ELSE 3
    END
)

That piece of the translation makes sense to me, but maybe I'm missing something.

roji commented 6 months ago

@Matthew-Ricks-USBE I'm assuming you're using SQL Server - you may want to give 8.0 a try, it generates very different SQL. When using that, PR #32575 is about actually detected if there are nulls inside the parameterized list, and generating better SQL based on that; think about it as if you were doing the same query but with a collection of ints (non-nullable), rather than strings (nullable). In other words, ideally EF wouldn't even add the second clause for checking null.

But I take your general point... Switching from COALESCE to ISNULL is definitely something we should look into.

ranma42 commented 4 days ago

I would like to try and do this transformation. IIUC ISNULL has a different behavior regarding the type of the result (it always uses the type of the first expression). Besides this, is there any other thing I should be careful about?