Open ranma42 opened 1 month ago
I think it might be a good idea to recognize this risky case and warn users about it just like EFCore already does when slicing (Take/First/...) unsorted queryables.
IIUC unsafe cases are {Avg,Max,Min}<{int,long,float,double,decimal}>
and the corresponding methods with the selector function (i.e. those that throw InvalidOperationException
when the source contains no elements). It might be nice to also recognize the .DefaultIfEmpty(n).{Avg,Max,Min}()
pattern and avoid warning in that case.
Safe cases would be the other overloads. Recommended options to fix the warning could be:
.Max()
with .Max(z => z as int?
in the previous code keeps the queries nice & clean and makes the behavior consistent between the two queries).DefaultIfEmpty(n)
pattern is recognized) using .DefaultIfEmpty(n).{Avg,Max,Min}()
Side note: the DefaultIfEmpty
translation currently only supports the no-argument version, see https://github.com/dotnet/efcore/issues/17783
Similar to #30915 or #33752.
Similar to #30915 or #33752.
The root cause is the same: nullability (specifically mismatch between C# and SQL nullability).
OTOH this case looks quite different to me: the exception is the expected behavior (at least assuming the same behavior as IEnumerable.Max()
) and the issue is that it "disappears" when selecting over it (usually you have to handle an exception in order to continue the computation; an expression does not usually compute if it contains a subexpression that throws).
Yes, I'm not saying this is a duplicate. It is just when/if we're around this code changing it, it's good to look also at the related issues and maybe do all the work together.
I'm not a fan of a warning here because it produces false negative (and we've learned that users absolutely hate those ;)) for a case of collection with data - and we can't tell those cases apart.
Compelling users to disable a warning (or to ignore it), or to change a perfectly valid and working query doesn't feel great.
The exception we throw in the second case (without composition) is not intentional - when Linq to Objects throws, we don't always try to mimic that (which is often very hard to do), but rather treat it as undefined behavior and do a best-effort approach to produce something that "makes sense" (usually false or null). Problem is that this often breaks (is inconsistent) for more complicated cases with composition etc. - just like here.
@maumar the warning would trigger even if it did not need to, so I believe it could have false positives; in a way, the current state is "false negative": the query is accepted even if it is troublesome. I guess users hate both false negatives and false positives... misreporting issues is always inconvenient (I tend to prefer false positives, especially if they are easy to silence).
I was suggesting the warning following the model of the missing OrderBy
(which also can report false positives).
Maybe the best solution for this would be an analyzer that be enabled/disabled by the user.
The exception we throw in the second case (without composition) is not intentional - when Linq to Objects throws, we don't always try to mimic that (which is often very hard to do), but rather treat it as undefined behavior and do a best-effort approach to produce something that "makes sense" (usually false or null).
I believe this is particularly troublesome when combined with the fact that there is no definition of what the translation should look like (as one cannot simply say that "it should behave as Linq to Objects").
Would it make sense to collect a list of the cases in which this is known to happen, so that they are recognized as UB? This would avoid the warning issues (the docs would warn the user, but the compiler/runtime would not complain). Assuming no commitment to a specific result (just UB), tracking the known cases would still be useful as these assumptions can be learned by API consumers and devs and issues can be closed by referencing this as an intended behavior (and/or extending the documentation for new cases).
@maumar the warning would trigger even if it did not need to, so I believe it could have false positives; in a way, the current state is "false negative": the query is accepted even if it is troublesome. I guess users hate both false negatives and false positives... misreporting issues is always inconvenient (I tend to prefer false positives, especially if they are easy to silence).
I was suggesting the warning following the model of the missing
OrderBy
(which also can report false positives). Maybe the best solution for this would be an analyzer that be enabled/disabled by the user.
In my mental model OrderBy case is more justified because actual false positives are much rarer (First/Single when collection has exactly 1 element, Skip is only fine when you skip everything, or use Skip(0) which is a noop, similar for take - only works deterministically when it's a noop or Take(0))
For the most cases that users may consider false positives the query only works fine by accident, because database engine happened to not have scrambled the order of rows it returned, but by no means it was guaranteed.
In this case false positives are all cases where collection contains any data which seems like an overwhelming majority of all cases.
~Perhaps analyzer could be a good compromise here - @roji any thoughts, since you are a fan of analyzers? ;)~ actually, I like the doc angle the best - filed https://github.com/dotnet/EntityFramework.Docs/issues/4736
Haven't gone over everything above, but a correct translation may be possible here, no? i.e. rather than translating the C# conditional to a 2-leg CASE/WHEN, could we do a 3-leg one?
In PG:
SELECT CASE WHEN x > 123 THEN 't' WHEN x IS NULL THEN NULL ELSE 'f' END;
Assuming the above works, the main problem I see is the duplication of x, which means the subquery would be evaluated twice (not good). Later down the road, we may be able to lift the subquery out to a CTE, though...
Or am I missing something?
@roji this should work just like your translation and does not have the duplication issue
SELECT CASE x > 123
WHEN TRUE THEN 't'
WHEN FALSE THEN 'f'
END;
Note that conceptually it would actually represent something like
SELECT CASE COALESCE(x, RAISE('empty collection')) > 123
WHEN TRUE THEN 't'
WHEN FALSE THEN 'f'
END;
but obviously the RAISE()
function is not a thing. For the sake of the argument let's assume that it emits a value that always propagates (somewhat like NULL
but stronger than: any operations/functions on it, emit it again, even COALESCE
etc; there is no CATCH
;) ).
In this specific case NULL
could be used as a marker for instead of RAISE()
and the shaper/materializer could throw an InvalidOperationException
just like for the no-comparison case, because the result of the expression is expected to be non-nullable (I am not sure if it would actually do that, since string
is a reference type, but let's assume it would take into account the nullability).
Unfortunately it would not work in general:
db.Blogs
.Select(x => x.Posts.Select(y => y.Id).Max())
.Select(z => z > 123 ? null : "F")
.ToList();
should translate to
SELECT CASE COALESCE(x, RAISE('empty collection')) > 123
WHEN TRUE THEN NULL
WHEN FALSE THEN 'f'
END;
but if we assume that RAISE('empty collection')
is simply NULL
it becomes impossible to distinguish the empty-collection-NULL
from the true-case-NULL
.
OTOH
db.Blogs
.Select(x => x.Posts.Select(y => (int?) y.Id).Max()
.Select(z => z > 123 ? null : "F")
.ToList();
translates to
SELECT CASE
WHEN (SELECT MAX("p"."Id") ...) > 123 THEN NULL
ELSE 'F'
END
which already works as in C#:
Max()
is null
, just like the result of (SELECT MAX("p"."Id") ...)
is NULL
z > 123
is true iff z
is not null
and its value is > 123
, i.e. exactly like the WHEN
condition is satisfied
The results of aggregation functions (such as
Max
,Min
,Avg
) compose in possibly surprising ways with other computations.An example program that showcases the bug is:
Exception
The program terminates with the following exception:
because it is running the query
In C#
.Max<int>()
returnsint
(and throws anInvalidOperationException
, just like the iteration did 😅 ) on empty collections, but in SQLMAX()
returns aNULL
. In a way, since the exception thrown by EFCore matches the one from the.Max()
definition, this is a happy result. Unfortunately this is inconsistent with the query which also projects the results through.Select(z => z > 123 ? "T" : "F")
, which is compiled toin which no exception is thrown and
.Max()
is treated as if it returnedint?
.I think the mismatch between C# and SQL behavior is known and possibly intended/expected, but the weirdness around further computations on the result of the aggregation is unfortunate.
Include provider and version information
EF Core version: 8.0.5 Database provider: Microsoft.EntityFrameworkCore.Sqlite Target framework: .NET 8.0 Operating system: Linux (/WSL) IDE: Visual Studio Code 1.89.1