dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.79k stars 3.19k forks source link

Query optimization compares case sensitively while SqlServer is case insensitive? #34862

Open kopfsick opened 1 month ago

kopfsick commented 1 month ago

I just updated from EF 7.0.10 on .NET6 -> 8.0.8 on .NET8 and noticed a small, yet undocumented breaking change. We use Sql Server and I have not tested this with other providers.

What happen is if you have a query that includes multiple string equality checks with AND:

Context.Things.Where(thing => thing.Name.Equals("TheName") && thing.Name.Equals("tHeNaMe"));

Notice that the to clauses are the same string if compared case insensitive. I am aware that this query seems weird, but when the query is dynamically built from user input, this can happen.

However, this worked fine with EF 7 and produced SQL query like

...[e].[Name] = N'TheName' AND [e].[Name] = N'tHeNaMe'

Since SQL Server treats the string comparisons case insensitevely this query will return all 'Things' that have a name like "TheName", no matter the casing.

But in EF 8 the same query produces this SQL query:

...WHERE 0 = 1

which obviously doesn't return anything.

If we remove one of the 'Equals' clauses or make both clauses have the same casing, it still works. It seems to me like EF8 is looking at the query, comparing the two 'Equals' clauses with case sensitive comparing and concluding that this is contradictory and just makes SQL query to terminate quickly.

Is this change intended? If so, it should be documented as breaking change.

EF Core version: 8.0.8 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 8.0 Operating system: Windows 11 IDE: Rider 2024.2

roji commented 1 month ago

Confirmed, see repro below - thanks for reporting. tl;dr we're simplifying the SQL tree based on the assumption that equailty is evaluated in the same way in .NET and in the database, which is incorrect.

This happened because of the change in SqlExpressionSimplifyingExpressionVisitor in #31046 (code). That piece of code converts equality comparisons on the same thing, and convert them to IN (e.g. a == b && a == c to a IN (b, c)); however, along the way it also attempts to detect incompatible conditions (e.g. where b and c are different), by doing an intersection. This means we use .NET equality (via Intersect), which does not correspond to database equality (e.g. around case-sensitivity). Note that #31046 improved the simplification to apply to more cases, which is why this behaved differently; but the principle was the same before, so the actual change #31046 is immaterial here.

@ranma42 note that regardless of this bug, SqlExpressionSimplifyingExpressionVisitor basically does an inferior inferior version of #34507 (collapse multiple ORs into a single IN); the optimization only works if the two equalities are inside the same OR node - if they're farther apart in the tree this won't work (whereas in #34506 we'd have visitor state tracking known compared values etc. Note also that that it only simplifies where for ColumnExpression, whereas we should be simplifying for any expression (e.g. foo(x) = 3 OR foo(x) = 4 should be simplified to foo(x) IN (3, 4), even though foo(x) isn't a column).

To summarize, this looks like a bug to me - we shouldn't optimize things based on .NET equality. However, since it's quite contrived and isn't likely to affect many people, we should consider simply reimplementing the whole simplification in #31046, and fixing this as part of that...

/cc @maumar (I think you wrote the original simplification logic in SqlExpressionSimplifyingExpressionVisitor)

Minimal repro ```c# await using var context = new BlogContext(); await context.Database.EnsureDeletedAsync(); await context.Database.EnsureCreatedAsync(); var thing = new Thing { Name = "TheName" }; _ = await context.Things .Where(thing => thing.Name.Equals("TheName") && thing.Name.Equals("tHeNaMe")) .ToListAsync(); public class BlogContext : DbContext { public DbSet Things { get; set; } protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder) => optionsBuilder .UseSqlServer("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false") .LogTo(Console.WriteLine, LogLevel.Information) .EnableSensitiveDataLogging(); } public class Thing { public int Id { get; set; } public string Name { get; set; } } ```
kopfsick commented 1 month ago

Thank you for your quick response and for confirming 👍 As you say, this isn't likely to affect many people, and for those that it does affect, like us, it is probably a rare corner case. I've temporarily adjusted some test cases, but will be following the issue.