dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.66k stars 3.16k forks source link

Look into IS DISTINCT FROM as an alternative to our current null compensation scheme #29624

Open roji opened 1 year ago

roji commented 1 year ago

SQL Server 2022 has support for IS [NOT] DISTINCT FROM (docs); this is an equality check that treats two nulls as equal (similar to how DISTINCT works in SQL queries).

PostgreSQL has had this for a long time (docs), but last time I checked it didn't utilize indexes. If the SQL Server version does, then we can use this as a terser alternative to the equality operator, which doesn't require null compensation.

roji commented 9 months ago

Note that especially for complex/expensive expressions, this has the potential to eliminate the duplicate inherent in null compensation, i.e. instead of WHERE complex1 = complex2 OR complex1 IS NULL AND complex2 IS NULL (where complex1 and complex2 are evaluated several times), you get WHERE complex1 IS NOT DISTINCT FROM complex2.

clement911 commented 6 months ago

Hi @roji I was watching your video on 2nd level caching and my understand is that it is only exists because different SQL needs to be generated depending on whether a parameter is NULL or not.

If the sql server provider can leverage IS DISTINCT FROM, the generated sql would be the same regardless. Does it mean that the sql server provider could then bypass the 2nd level cache?

roji commented 6 months ago

@clement911 unfortunately not - equality is just one of many SQL expression types for which we ned to perform nullability-related rewriting. Take a look at SqlNullabilityProcessor to get an idea of the type of things this involves.

IS DISTINCT FROM can still potentially cut down a lot of null compensation, which may significantly improve performance. But note also that we need to carefully check how it performs (especially with regards to index usage) compared to normal equality.

ranma42 commented 3 months ago

Modern versions of SQLite have the standard SQL syntax (i.e. IS [NOT] DISTINCT FROM) in addition to the IS operator that is also available on older ones. According to the documentation, IS constraints can be used for indexing.

MySQL has the <=> operator. I have not been able to find documentation on whether it makes good use of indexes.

Related: https://github.com/dotnet/efcore/issues/10514

roji commented 3 months ago

@ranma42 I may be missing it, but I can't see mention of index usage specifically for IS DISTINCT FROM in the SQLite docs.

I definitely believe IS DISTINCT FROM is a promising improvement, and we can add support for it in relational and allow specific providers to opt into it where that makes sense. But careful investigation needs to happen for each databases to ensure whether indexes are used etc.

ranma42 commented 3 months ago

@ranma42 I may be missing it, but I can't see mention of index usage specifically for IS DISTINCT FROM in the SQLite docs.

I think indexes are not used for IS DISTINCT FROM, just like they are not used for <>. Conversely, IS is explicitly mentioned as supported:

To be usable by an index a term must usually be of one of the following forms: column = expression column IS expression column > expression column >= expression column < expression column <= expression expression = column expression IS column [...]

also the docs states that

The IS NOT DISTINCT FROM operator is an alternative spelling for the IS operator.

which makes me believe they are treated the same way internally. I would still recomend emitting the IS syntax, as it supports a wider range of versions:

I definitely believe IS DISTINCT FROM is a promising improvement, and we can add support for it in relational and allow specific providers to opt into it where that makes sense. But careful investigation needs to happen for each databases to ensure whether indexes are used etc.

Yes, a "strict equality" is definitely an interesting option; its support (both in term of syntax and performance) seems to be very different across providers, so that is something each provider might have to decide how to handle (and in some cases different choices even within the same provider, as SQL Server < 2022 does not seem to support IS DISTINCT FROM).

Another option worth investigating is CTE, which would be more general, but I am afraid it would also involve more extensive changes.

ranma42 commented 3 months ago

https://github.com/ranma42/efcore/tree/is-not-distinct-from is a WIP branch that explores what happens to the test suite baselines when using IS to RewriteNullSemantics of [Not]Equal. Some care is already taken to avoid regressing good rewrites, but other parts mentioned above are still missing. The main issue (I think) is that it is hardcoded to always be active; instead it should be:

roji commented 3 months ago

the IS operator drives indexes since 3.8.11

I missed this original comment - this indeed seems to mean that we should seriously consider switching to IS in SQLite instead of compensating with additional IS NULL checks... For other databases, do you happen to know for sure whether SQL Server uses indexes for IS NOT DISTINCT FROM? I've never actually checked this. Similarly, for PG I checked a very long time ago, but things may very well have changed (see https://github.com/npgsql/efcore.pg/issues/28 from 2016...).

One additional thought is that we should consider the same change for the negated case (i.e. translate <> to IS DISTINCT FROM). This seems more straightforward, as <> generally doesn't uses indexes as you mentioned (but we should still confirm this across databases).

@ranma42 can you please submit your branch as a draft PR rather than as a branch? This way we can have a conversation around it more easily (it's totally fine if it's still an experiment etc.).

ErikEJ commented 3 months ago

Looks like IS DISTINCT FROM enables index seeks: https://sqlperformance.com/2022/08/sql-server-2022/additional-t-sql-improvements-in-sql-server-2022

roji commented 3 months ago

Looks like IS DISTINCT FROM enables index seeks [...]

Perfect, thanks @ErikEJ! So it looks like IS DISTINCT FROM should be a good general choice replacement to compensation on SQL Server 2022! And as @ranma42 mentioned in a recent conversation, compared to the current compensation it also has the advantage of not duplicating the expressions (which may be expensive scalar subqueries etc.). I'll reach out internally just to confirm that there aren't any gotchas, but otherwise I think we can definitely consider doing this. Maybe, as this is quite a fundamental query change, we'd want to preemptively include some sort of opt-out, just in case there's trouble.

FYI I've opened https://github.com/npgsql/efcore.pg/issues/3206 to track this on the EFCore.PG side.