Investigate alternative translations for (in)equality comparison

dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.

https://docs.microsoft.com/ef/

MIT License

13.65k stars 3.15k forks source link

Investigate alternative translations for (in)equality comparison #34165

Open ranma42 opened 2 months ago

ranma42 commented 2 months ago

The current translation of nullableA == B is nullableA = B AND nullableA IS NOT NULL. Similarly, nullableA != B is translated to nullableA <> B OR nullableA IS NULL.

This causes the duplication of the nullableA expression.

Alternative translations that could avoid this issue are CASE WHEN nullableA == b THEN TRUE ELSE FALSE END / CASE WHEN nullableA == b THEN FALSE ELSE TRUE END.

ranma42 commented 2 months ago

~~Note that this would not fix the issue of double evaluation on SqlServer unless #32519 is fixed.~~

EDIT: this was referring to the older alternative translation COALESCE(nullableA = B, FALSE) / COALESCE(nullableA <> B, TRUE).

ranma42 commented 2 months ago

⚠️ while this alternative translation is interesting to avoid duplication of sub-expressions, it might cause worse plans to be generated, for example when nullableA is a simple column expression.

For example Sqlite could use indexes on a predicates like nullableA <> B OR nullableA IS NULL / nullableA = B AND nullableA IS NOT NULL.

ranma42 commented 2 months ago

For example Sqlite could use indexes on a predicates like nullableA <> B OR nullableA IS NULL [...]

The index would not be useable for the nullableA <> B, so for the inequality it looks like a safe transformation. For equality, more care might be needed. The same holds true for other comparisons; see https://github.com/dotnet/efcore/pull/33757/commits/4db69e0b3178c83db081d358302a0b7c8737edd2 where the same CASE structure is used (and the same issues with indexes would be hit; on the bright side, that translation is only used when !optimize, aka outside of predicates).

roji commented 2 months ago

The current translation of nullableA == B is nullableA = B AND nullableA IS NOT NULL.

This only referes to "non-optimized" mode, right?

on the bright side, that translation is only used when !optimize, aka outside of predicates).

Note that !optimize does occur in predicate, e.g. within negation.

In any case, doesn't the switch to IS NOT DISTINCT FROM help here?

ranma42 commented 2 months ago

The current translation of nullableA == B is nullableA = B AND nullableA IS NOT NULL.

This only referes to "non-optimized" mode, right?

Yes (for the !=/OR case, instead, we always emit the nullableA <> B OR nullableA IS NULL).

on the bright side, that translation is only used when !optimize, aka outside of predicates).

Note that !optimize does occur in predicate, e.g. within negation.

In any case, doesn't the switch to IS NOT DISTINCT FROM help here?

Yes, IS NOT DISTINCT FROM is a better solution, but requires explicit support from the DB provider.