Open roji opened 7 months ago
A quick check on PostgreSQL, SQLite and MariaDB doesn't show the same issue occuring. That is, with the above code, one always gets either 0 or 2, never NULL.
There are differences between COALESCE and ISNULL, but +1 for adding at least limited use of ISNULL. I have a use case where I'm summing a correlated subquery and comparing that against a known value. It's a dynamic piece added via Where, but it's something like this:
var values = new [] { "A", "B" }; // Dynamic.
return (TargetEntity te) => te.EnumerableProperty
.Distinct()
.Select(v => values.Contains(v) ? 1 : values.Length + 1)
.Sum() != values.Length;
The result of SUM is treated as nullable by the generated SQL, so a second clause checking for null is added which hurts performance. I can add a coalesce sum ?? 0
, but the generated SQL uses COALESCE and takes several minutes, often hitting my timeout. Swapping for ISNULL runs in less than a minute.
Fortunately I don't expect this piece of functionality to have much if any demand/use, so I'm tolerating the timeouts for now, but I'd love it if this kept up with our other functions.
@Matthew-Ricks-USBE that very specific case should be covered by #32574 (draft PR #32575 is out); it's more a case of problematic translation of the Contains (in this very specific case) than a problem with COALESCE itself.
I don't think they're related. The Select and Sum from the sample code translate to:
SELECT SUM(
CASE
WHEN [t].[ENUMERATED_THING] IN ( N'A', N'B' ) THEN 1
ELSE 3
END
)
That piece of the translation makes sense to me, but maybe I'm missing something.
@Matthew-Ricks-USBE I'm assuming you're using SQL Server - you may want to give 8.0 a try, it generates very different SQL. When using that, PR #32575 is about actually detected if there are nulls inside the parameterized list, and generating better SQL based on that; think about it as if you were doing the same query but with a collection of ints (non-nullable), rather than strings (nullable). In other words, ideally EF wouldn't even add the second clause for checking null.
But I take your general point... Switching from COALESCE to ISNULL is definitely something we should look into.
I would like to try and do this transformation.
IIUC ISNULL
has a different behavior regarding the type of the result (it always uses the type of the first expression).
Besides this, is there any other thing I should be careful about?
It seems that COALESCE may be expanded to a CASE/WHEN, which causes the expression to get evaluated twice (link); ISNULL apparently doesn't have this behavior. This has a performance impact (i.e. when the expression is a heavy subquery), but can also lead to incorrect results when the expression is non-deterministic and can return different results each time it's evaluated:
~Also, check if the same thing happens on other databases, i.e. whether this should be a SQL Server-only change or something with a wider scope.~
SQL Server documentation detailing all the above, plus the differences between COALESCE and ISNULL. See also (this post) on that.
Thanks for @John0King for reporting this (see https://github.com/dotnet/efcore/issues/22243#issuecomment-1840198932).