Query: Consider translating String.Equals(String, StringComparison) for selected values of StringComparison

divega commented 9 years ago

Currently we support the == operator and the String.Equals(String) methods which both have the same behavior in-memory as String.Equals(String, StringComparison.Ordinal). We could also support the latter without making significant changes and possibility other values of StringComparsion such as OrdinalIgnoreCase by applying LOWER() on both operands.

ajcvickers commented 4 years ago

@smitpatel Agree we should review EF6. Agree we should probably not do automatic changes collations.

However, I've been considering that if the behavior matches the default behavior (or at least case sensitivity) of the database (or possibly even the collation configured in the model and used by Migrations, if such a thing is created) then we allow the translation to proceed.

We throw if you specify a kind of comparison that would require the query to do something different than is the default for the column.

This doesn't add any new functionality, but allows the C# code to express an intention that the database then must conform to.

smitpatel commented 4 years ago

Default behavior of database can be changed by collation. A database can be created with either sensitivity collation. So in order to do it accurately, it has to be configured in EF Model. If we add model/migration support for collations, then we can certainly pass through methods which does the same thing. It should be tracked by collation issue.

roji commented 4 years ago

Reading up, it seems like the expectations of users is more for EF Core to add LOWER() or whatever is necessary to make the query work. I agree with @smitpatel that EF Core should not do this - we should not be implicitly doing things that prevent index usage (big pit of failure), and users can easily opt into this by explicitly adding ToLower themselves (i.e. we add very little value).

Adding support for the comparison type that happens to correspond to the database behavior would have the advantage of failing the query if that type would not be the right one (e.g. if a user accidentally says they want case-insensitive but the column is configured otherwise). But that presupposes that the user actually specify that enum value, with no other benefit than that verification and exception... I can see value there but it doesn't seem huge...

ajcvickers commented 4 years ago

Agree with both of you. It may not add enough value, and it can be tracked as part of supporting collations in Migrations. I just don't want us to completely dismiss the idea of allowing people to write code like this in their queries.

Suchiman commented 4 years ago

Those arguments may be valid for the IgnoreCase comparisons but not for the rest. In MSSQL (and most other RDBMS except for postgres AFAIK), the default collation is case insensitive, you must collate if you want compare case sensitively. This operation can even use an index if you're clever, e.g. you compare both CI and then CS, it will use the index for the CI check which should be selective enough and then evaluate the CS check as predicate.

SELECT *
FROM Table
WHERE Col = @Value
AND Col = @Value COLLATE Latin1_General_CS_AS

roji commented 4 years ago

@Suchiman FYI PostgreSQL and Sqlite both do case-sensitive comparisons by default (but MSSQL and MySQL are indeed case-insensitive).

Interesting, I can see how what you say could make sense (using a CI index for a CS lookup, and then applying CS predicates on the results). However, I can't see this actually happening on MSSQL (but I'm a bit of an MSSQL newb). Here's the code I'm using:

-- Create the schema
CREATE TABLE data (id INT, name NVARCHAR(256));
CREATE INDEX IX_foo ON data(name);

-- Insert 100000 rows - having lots of data is important as it can affect planner decisions on index usage etc., at least for some databases
BEGIN TRANSACTION;

DECLARE @i INT = 0;
WHILE @i < 50000
    BEGIN
        INSERT INTO data (name) VALUES ('hello' + CAST(@i AS NVARCHAR(256)));
        INSERT INTO data (name) VALUES ('HeLLo' + CAST(@i AS NVARCHAR(256)));
        SET @i = @i + 1;
    END;

COMMIT;

SET SHOWPLAN_ALL ON;

SELECT * FROM data WHERE name = 'hello10000';  -- Does an Index Seek, TotalSubtreeCost=0.008294476
SELECT * FROM data WHERE name = 'hello10000' COLLATE Latin1_General_CS_AS; -- Does a Table Scan, TotalSubtreeCost=0.59920794

SET SHOWPLAN_ALL OFF;

Is there something else I need to do to make this work?

Suchiman commented 4 years ago

@roji yes, you need to change the second select to do both checks 😉

SELECT * FROM data WHERE name = 'hello10000' and name = 'hello10000' COLLATE Latin1_General_CS_AS;

MSSQL isn't smart enough to do that trick automatically.

roji commented 4 years ago

@Suchiman ah yes, I missed that part in your original sample - thanks! I can see the index being utilized now. Interesting, at least in theory SQL doesn't guarantee left-to-right evaluation and short circuiting (e.g. link) but in this case that seems to be working well.

So there seems to be value in providing a translation for StringComparison.Ordinal on MSSQL specifically, producing the above SQL. As @smitpatel this would depend on collation support - #8813 seems to be the appropriate issue for tracking that (added a comment there).

Suchiman commented 4 years ago

It isn't reliant on left-to-right evaluation, e.g.

SELECT * FROM data WHERE name = 'hello10000' COLLATE Latin1_General_CS_AS and name = 'hello10000';

works just as well, but since the equality predicate is very selective according to table stats, it's an obvious choice for MSSQL to utilize that index. Since there's no index seek opportunity for the collated comparison, that needs to be evaluated as a predicate, in which case, there are less rows to evaluate after the CI index seek than when using an index scan or table scan.

roji commented 4 years ago

Good point @Suchiman, that makes sense and is a good example of non-left-to-right evaluation in SQL (and how it leads to better perf).

FWIW PG 12 introduced "non-determinstic" collations, which allow case-insensitivity - so this may actually be relevant there. The traditional pre-PG12 way to do case-insensitivity in PG is CITEXT, which AFAIK doesn't allow case-sensitive comparisons (but more research may be required).

roji commented 4 years ago

One last note: a potential difficulty is exactly which case-insensitive collation EF would generate here (i.e. why Latin1_General_CS_AS rather than something else). That might be a good reason to leave it to the user to construct this themselves and specify the collation they want.

Suchiman commented 4 years ago

Indeed, in my case, it works because Latin1_General_CI_AS is the default collation and Latin1_General_CS_AS the case sensitive counterpart. Unless EF would allow configuring the default CS and default CI collations, accidentally picking the wrong collation here would lead to incorrect results.

TrabacchinLuigi commented 4 years ago

@smarts

x => x.Foo.Equals("bar", StringComparison.OrdinalIgnoreCase) or maybe x => StringComparer.OrdinalIgnoreCase.Equals(x.Foo, "bar") was working in some version of entity framework prior to EF Core, because it was while converting to EF Core that this problem manifested for me

Yeah, the part you are missing is: you where actually downloading the whole table from the db, deserialize it into entities and THEN comparing it, so the EFCore team made you a favour, and translated that to an actual error because that is what it is: one big nasty error of yours, you just gonna notice later, when you are in production and it's late.

ajcvickers commented 4 years ago

@Suchiman @roji Great conversation!

ajcvickers commented 4 years ago

Also see high-level collations issue #19866

smarts commented 4 years ago

@TrabacchinLuigi you are just plain wrong… Given the following code:

using (var db = new TestDbContext(connectionString))
{
    var f = db.TestItems.AsNoTracking().FirstOrDefault(x => x.Status.Equals("blah", StringComparison.OrdinalIgnoreCase));
}

the generated SQL is:

SELECT TOP (1) 
    [Extent1].[my_column_id] AS [my_column_id], 
    [Extent1].[my_other_column] AS [my_other_column]
    FROM [dbo].[my_table] AS [Extent1]
    WHERE N'blah' = [Extent1].[my_other_column]

roji commented 4 years ago

@smarts which exact EF are you using? On 2.2 that query is client-evaluated (on 3.x it triggers translation failure).

TrabacchinLuigi commented 4 years ago

Yeah sure @smarts i'm so plain wrong let's do another test:

using (var db = new TestDbContext())
{
    var query = db.TestItems.AsNoTracking().Where(x => x.Status.Equals("blah", StringComparison.Ordinal));
    var f = query.FirstOrDefault();
}

f is {Id: 1, Name: "BlAh"}

Worked like a charm!

P.s. @roji looks like at some point ef6 team decided to just ignore the case option to avoid downloading the whole table or everything that matched the case to compare it later.

roji commented 4 years ago

@smarts I missed the [Extent1] in your SQL - that indeed points towards EF6 rather than EF Core. In any case we'd definitely rather throw an exception than translate incorrectly, as seems to be the case here.

smarts commented 4 years ago

@TrabacchinLuigi I think you are missing my point. I'm not (and never was) describing how EF Core works. I'm aware that if using EF Core with the code snippet I referenced then the case-insensitive property comparison would be evaluated on the client side (i.e., not in SQL). What I and others in this thread are arguing for is that with EF (*not EF Core) the code did not evaluate client side. It evaluated on the server. So, we want EF Core to be able to work the same way, translating the case-insensitive string equality expression into SQL. Does that make sense? Thanks @roji. Yes, this was using EF6 (whatever the latest version is), not EF Core

roji commented 4 years ago

@smarts

What I and others in this thread are arguing for is that with EF (*not EF Core) the code did not evaluate client side. It evaluated on the server. So, we want EF Core to be able to work the same way, translating the case-insensitive string equality expression into SQL. Does that make sense?

As a general rule, with EF Core we translate something when we can be reasonably sure that it will run on the server on the same way that it runs on the client (i.e. same results). Databases vary a great deal with regards to case-sensitivity, and even with the same database collation can be defined on a per-column basis, affecting comparison behavior. So if we translated comparisons with StringComparison.Ordinal to a simple equality in SQL (which is what you seem to be requesting), we'd return incorrect results if the column in question happens to use a case-insensitive collation (which happens to be the SQL Server default).

We also want to avoid doing anything implicitly that will prevent the database from using indexes - this is what would happen if we added LOWER on both sides of the equality, or various other solution. Users are free to do this themselves, but implicitly killing performance this way without an explicit opt-in seems like a bad idea.

I hope the above makes it clear why we can't just translate the various StringComparison options. If you still think you want a translation here, think about exactly what SQL you'd like to see, whether it would always be correct (across column collation configurations!), and whether it would be efficient (i.e. allow index use).

TrabacchinLuigi commented 4 years ago

@smarts again... It does not work on ef6 too, showed you the result the other day, that was on entity Framework 6 not core. Also, if you had read this thread there is plenty of reason why efcore should not mimic what ef6 does, first of all because it's database agnostic.

smarts commented 4 years ago

@roji I just want something that can work with the case-insensitivity checks rather than throwing an exception or trying to evaluate on the client, so I'm definitely not against an opt-in behavior rather than it being the default/implicit. I'm not familiar w/ EF Core internals so forgive me if I use the wrong terminology, but could there be some [extension] method on the DbContext options class (or options builder?) that enables an expression visitor -- which knows about some of the various string equality expressions -- and can translate them as it would a normal x.Foo == "bar" expression?

colindawson commented 4 years ago

I'm preparing for my 70-487 exam in a couple of weeks, so to make sure that I'm fully up to speed, I decided to finally take some time to learn EF. Of course, I find it best to use the latest and greatest, so opted for EF Core, especially as I'm planning on running the code on a Raspberry PI at some point.

I've just hit the same issue as described above, using EF Core 3.1.2. I think this can be solve quite easily, by allowing things to work something like this....

use the == or string.Equals(a,b) syntax to do things how the database naturally wants to work (the current implementation)

SQL Instances can have a defect collation set at install time, so the collation cannot be guaranteed without knowing in advance what the sensitivity for that database instance is. In addition, individual columns can have their collation set when the creating a table, so again, the behavior isn't guaranteed.

That said, executing a query like SELECT * FROM MyTable WHERE MyColumn = 'Hello' will give consistent results on a given database, in that the behavour is defined by the underlying metadata for that specific instance. i.e. a case sensitive column will match case sensitively, and vice versa.

Now consider the more explicit example of string.Equals( a, b, StringComparison.OrdinalIgnoreCase ) as an example. now, I'm specifying that it's case insensitive, so the query can easily be modified to include the collation in the where clause, which defines the behavior explicitly i.e. SELECT * FROM MyTable WHERE MyColumn = 'Hello' COLLATE Latin1_General_CI_AS
in the case of StringComparison.Ordinal a case sensetive collation can be applied instead

SELECT * FROM MyTable WHERE MyColumn = 'Hello' COLLATE Latin1_General_CS_AS

The exact form of the SQL will depend on the RDBMS in use as the syntax does vary between systems.

I do feel that this implementation will provide a good enhancement to something that I feel is rather hamstrung at the moment. p.s. I've been working with databases from multiple vendors for the over 25 years and have resisted switching to EF for reasons of missing features like this. I can see why EF could be a fantastic way of developing database agnostic software, however omissions like not being able to perform an explicit case sensitive lookup leave me once again wondering if giving up control of the query language is a good idea afterall.

roji commented 4 years ago

@smarts

I just want something that can work with the case-insensitivity checks rather than throwing an exception or trying to evaluate on the client

I may be missing something, but if you know your column is case-insensitive, than you can just use the regular C# equality operator, EF Core will generate a regular SQL equality operator, and everything will just work... If I'm misunderstanding you, can you please clearly state what C# you want to be translated into what SQL?

I'm not familiar w/ EF Core internals so forgive me if I use the wrong terminology, but could there be some [extension] method on the DbContext options class (or options builder?) that enables an expression visitor -- which knows about some of the various string equality expressions -- and can translate them as it would a normal x.Foo == "bar" expression?

Here you seem to be asking for string.Equals with all different StringComparison values to be translated to a simple equality expression in the database. That means that if I specify StringComparison.Ordinal on SQL Server (which is case-insensitive by default) or StringComparison.OrdinalIgnoresCase on PostgreSQL (which is case-sensitive by default), I get very unexpected (read: wrong) results...

roji commented 4 years ago

@colindawson this was already proposed by @Suchiman above (https://github.com/dotnet/efcore/issues/1222#issuecomment-582786868). The main difficulty here would be knowing which case-sensitive (or insensitive) collation to use, as I wrote here: https://github.com/dotnet/efcore/issues/1222#issuecomment-582883721.

roji commented 4 years ago

After further discussion in https://github.com/npgsql/efcore.pg/pull/1329, we've decided not to implement this in 5.0 - but may revisit later based on request. We are already going to bring some substantial collation support in 5.0 (#6577, #19275, #8813), and we believe most needs would be addressed by those.

Reasons why this feature is problematic:

We'd need to know which case-insensitive/sensitive collation to use, there are many.
We could allow users to define in advance which collation to use when an explicitly case-insensitive/sensitive operation is used. Allowing this on the column raises various issues (see https://github.com/npgsql/efcore.pg/pull/1329#pullrequestreview-387064323) and it doesn't seem possible to do something both accurate and not too restricted.
Allowing the user to do define collations for explicitly case-insensitive/sensitive operations only at the global database level. This doesn't have the issues of column definitions.
However, we think that database/column collations and EF.Functions.Collate() should address the needs of the vast majority of people. We can revisit this in the future if those are insufficient.

roji commented 4 years ago

In addition, as discussed, we should identify these untranslatable overloads (with StringComparison), and fail with guidance on how to do, linking to docs (https://github.com/dotnet/EntityFramework.Docs/issues/2273).

smarts commented 4 years ago

@roji

I may be missing something, but if you know your column is case-insensitive, than you can just use the regular C# equality operator, EF Core will generate a regular SQL equality operator, and everything will just work... If I'm misunderstanding you, can you please clearly state what C# you want to be translated into what SQL?

Yes, but I also want this to work in unit test scenarios that mimic the real data (i.e., use the same query w/ an in-memory IQueryable<T> instance where the data in the in-memory collection might have some lowercase, some mixed case, some uppercase). Being able to specify the expression as a case-insensitive comparison enables this scenario, but we still don't want the entire query to be evaluated client-side, so we need support for the aforementioned expression at the query translation level).

Here you seem to be asking for string.Equals with all different StringComparison values to be translated to a simple equality expression in the database. That means that if I specify StringComparison.Ordinal on SQL Server (which is case-insensitive by default) or StringComparison.OrdinalIgnoresCase on PostgreSQL (which is case-sensitive by default), I get very unexpected (read: wrong) results...

Regarding your first statement: yes, you are understanding my ask correctly. You would get different results based on the underlying SQL engine, and I agree this results in unexpected results. However, I think this can be mitigated with naming & documentation, and also by way of being an opt-in feature.

Even if this isn't making the cut for something that goes into the library, would it be possible to receive some guidance on how to create such a feature? This is one of the last things for us that is preventing a switch from the EF to EF Core.

roji commented 4 years ago

@smarts we definitely won't be translating StringComparison.Ordinal to a simple SQL equality operator, since that would return wrong results on case-insensitive databases. If a user specifies Ordinal comparison, they need to get back results for ordinal (case-sensitive) comparisons; I definitely don't think it makes sense to compromise on that because of a testing need.

First, I'd suggest considering switching SQL Server to be case-sensitive by specifying a case-sensitive database collation (see #6577). Assuming this fits your actual production needs, it seems like it would be the ideal solution: have both the database and InMemory in case-sensitive mode, and use the simple equality operator which would behave the same way with regards to case-sensitivity.

Second, this is another way in which InMemory is not really suitable for testing apps that use EF Core - this is something we discourage - there are many ways in which InMemory behavior simply does not match that of a real database (e.g. lack of transactions). We generally recommend testing against your real production database, and have recently docs on how to do that efficiently, so that tests don't take too long. FWIW EF Core itself has a huge number of tests which execute against the actual database - this is reasonably efficient. Another option is to use Sqlite.

Finally, if you insist on using InMemory and do not want to change your SQL Server collation, you could try to make InMemory case-insensitive by adding a query preprocessing visitor which replaces simple string equality with StringComparison.OrdinalsIgnoreCase on InMemory. To do this completely, it needs to be done for all string operations, not just equality (e.g. StartsWith, Contains...).

smarts commented 4 years ago

As for the first option: No, that's not desirable. I want the test side to act like SQL Server, not the other way around. So, yes, the 3rd option is the one I'd want, and that's what I was asking for guidance on implementing. You referred to "preprocessing visitors". @roji are there any existing classes/examples to which you can point me? Also, you specifically said "make InMemory case-insensitive". Are you saying that implementing a preprocessing visitor would only affect the InMemory database? Can you please elaborate a bit on this?

TrabacchinLuigi commented 4 years ago

could you two go chat in a private room ? exchange your phone numbers and call each other, this thread isn't about what smarts want and can't achieve.

roji commented 4 years ago

@smarts you basically need to replace the preprocessor factory service, and add your own expression visitor which locates strings comparisons and replaces them. For example, you can use this to convert string comparisons with InMemory string.Compare(a, b, StringComparison.OrdinalsIgnoreCase). There are various resources out there on how to write expression visitors in .NET, it shouldn't be too hard.

But I'll say again that I think this isn't a good approach in the long term - you're trying to twist InMemory to behave like your real database, which it will never fully do.

Wiring up a custom visitor at the beginning of preprocessing

```c# public class BlogContext : DbContext { protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder) => optionsBuilder .UseSqlServer(...) .ReplaceService(); } public class MyQueryTranslationPreprocessorFactory : IQueryTranslationPreprocessorFactory { private readonly QueryTranslationPreprocessorDependencies _dependencies; private readonly RelationalQueryTranslationPreprocessorDependencies _relationalDependencies; public MyQueryTranslationPreprocessorFactory( [NotNull] QueryTranslationPreprocessorDependencies dependencies, [NotNull] RelationalQueryTranslationPreprocessorDependencies relationalDependencies) { _dependencies = dependencies; _relationalDependencies = relationalDependencies; } public virtual QueryTranslationPreprocessor Create(QueryCompilationContext queryCompilationContext) => new MyQueryTranslationPreprocessor(_dependencies, _relationalDependencies, queryCompilationContext); } public class MyQueryTranslationPreprocessor : RelationalQueryTranslationPreprocessor { public MyQueryTranslationPreprocessor(QueryTranslationPreprocessorDependencies dependencies, RelationalQueryTranslationPreprocessorDependencies relationalDependencies, QueryCompilationContext queryCompilationContext) : base(dependencies, relationalDependencies, queryCompilationContext) { } public override Expression Process([NotNull] Expression query) { query = new SomeQueryTransformingExpressionVisitor().Visit(query); // You need to implement this visitor return base.Process(query); } } ```

smarts commented 4 years ago

@roji I totally understand your point. Changing the DB collation is not going to happen, but migrating to SQLite might be possible as a long term solution. In the meantime, thank you so much for your guidance on this [hopefully] short-term solution.

vihorlat commented 4 years ago

I have just found out that

await _database.Set<MyEntity>().AsNoTracking().Where(i => i.RegCis == someString).AnyAsync() behaves as expected, EF Core logs:

SELECT CASE
    WHEN EXISTS (
        SELECT 1
        FROM [MyEntity] AS [i]
        WHERE [i].[RegCis] = @__someString)
    THEN CAST(1 AS BIT) ELSE CAST(0 AS BIT)
END

But folowing: await _database.Set<MyEntity>().AsNoTracking().Where(i => string.Equals(i.RegCis, someString)).AnyAsync() surprisingly generates foillowing log! SELECT [i].[RegCis] FROM [MyEntity] AS [i] And also, execution takes very long time (300 times longer than the correct query), so the SQL is definitely incorrect !

my csproj:

    <PackageReference Include="Microsoft.AspNetCore" Version="2.0.2" />
    <PackageReference Include="Microsoft.AspNetCore.Identity.EntityFrameworkCore" Version="2.0.2" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.SqlServer" Version="2.0.2" />
    <PackageReference Include="Microsoft.EntityFrameworkCore.Tools" Version="2.0.2" />

ajcvickers commented 4 years ago

@vihorlat EF Core 2.0 is out-of-support. Please update to EF Core 3.1 and file a new issue if it still fails attaching a small, runnable project or posting a small, runnable code listing that reproduces what you are seeing so that we can investigate.

Saibamen commented 3 years ago

Workaround:

Instead of: FirstOrDefault(b => b.X.Equals(Y, StringComparison.InvariantCultureIgnoreCase)) use ToLower() to simulate comparison with ignoring case: FirstOrDefault(b => b.X.ToLower() == Y.ToLower())

Warning: ToLowerInvariant() also not working here

I know this not contains invariant culture, but this is better than waiting for EF Core 6 or 7, staying on old EF 6 or rewriting all API consumers (if we gets rid of comparison and use only FirstOrDefault(b => b.X == Y))

roji commented 3 years ago

@Saibamen it's recommended to look into column collations rather than calling ToLower. With collations, an index defined over your column can be used.

PMExtra commented 3 years ago

https://github.com/PomeloFoundation/Pomelo.EntityFrameworkCore.MySql/blob/bd1f8369e73a2555eef989e0c4503d84a5180fa0/src/EFCore.MySql/Query/ExpressionTranslators/Internal/MySqlStringComparisonMethodTranslator.cs#L542-L555

I think Pomelo.EntityFrameworkCore.MySql has already do it.

PMExtra commented 3 years ago

In some provider such as PostgreSQL, I think InvariantCultureIgnoreCase should be map to ILIKE for better performance.

roji commented 3 years ago

@PMExtra this issue is about String.Equals, where PostgreSQL ILIKE does pattern matching with wildcards (like LIKE), so that doesn't seem right. You can already use collations with EF Core 5.0 to make PostgreSQL insensitive equality operations.

PMExtra commented 3 years ago

@roji But we can use LIKE or ILIKE without wildcards, So the question is, which is the better performance between TOLOWER(@param) == TOLOWER(field) and field ILIKE @param?

roji commented 3 years ago

@PMExtra I'm not aware of a way to use LIKE/ILIKE without wildcards. If the pattern is constant, we can indeed escape all the wildcards (this is what we currently do for other cases), but not when the pattern is a parameter or column.

More generally, I suggest following the discussing from the beginning to understand why I think the recently-introduced collations functionality is the better way to go. Regardless, if users prefer either TOLOWER or ILIKE, they can already express that themselves.

IanKemp commented 3 years ago

The whole .NET StringComparison vs DB collation pain is yet another reminder to the unwary that clients and databases work very differently, and that databases are still about two decades behind the state-of-the-art (exhibit A: SQL Server's default collation is still case-insensitive). EF.Functions.Collate is at best a band-aid over this gaping dichotomy.

And it doesn't solve the problem that users can write EF code that uses StringComparison and said code will only fail at runtime. The in-memory EF provider doesn't barf on StringComparison, which means the only way to find out you've misused StringComparison is to integration or manually test your code. That's a problem.

At the very least the EF Core team should add a Roslyn analyzer for StringComparison to detect and inform users at compile-time. I would also like to see the in-memory provider updated to fail when it encounters StringComparison. Another useful Roslyn analyzer would be one that warns about explicitly calling ToLower() and ToUpper() inside EF expressions, as those are unlikely to have the desired effect with DB collations...

roji commented 3 years ago

The whole .NET StringComparison vs DB collation pain is yet another reminder to the unwary that clients and databases work very differently, and that databases are still about two decades behind the state-of-the-art (exhibit A: SQL Server's default collation is still case-insensitive). EF.Functions.Collate is at best a band-aid over this gaping dichotomy.

FWIW this isn't about databases "being behind", but about these operations execute in a very different way in the database and in a regular program. The important point here is that databases natively use indexes, which is why the collation must be set up in advance in your schema, and queries need to match that collation; whereas a regular .NET program simply goes over a list of strings in memory, and can do whatever type of comparison it wants. This is precisely why StringComparison makes sense in a .NET program, but not when translating a query to SQL.

Re the rest, there's absolutely no guarantee that queries which work on InMemory will work against a real database - this is very much by-design. The specific constructs and functions that are supported vary across databases (SQL Server, PostgreSQL and Sqlite support very different things), and InMemory simply cannot mimic a real database.

There have been various requests in the past to have an analyzer which detects queries which would fail at runtime - unfortunately this simply isn't feasible for most cases. Specifically for your proposal, consider that StringComparison does make sense on some providers even if it doesn't on most relational databases: here's a proposal to translate it for Cosmos. In other words, the analyzer would have to somehow know - at compile-time - which provider is being targeted, and that's not really possible.

At the end of the day, a robust test suite is the only way to be sure that your queries execute properly, and continue executing properly.

IanKemp commented 3 years ago

There have been various requests in the past to have an analyzer which detects queries which would fail at runtime - unfortunately this simply isn't feasible for most cases. Specifically for your proposal, consider that StringComparison does make sense on some providers even if it doesn't on most relational databases: here's a proposal to translate it for Cosmos. In other words, the analyzer would have to somehow know - at compile-time - which provider is being targeted, and that's not really possible.

Unnecessary. Simply write an analyzer that assumes that people are using a database that supports nothing and should therefore be warned about everything, make it an add-on NuGet package, and allow end-users to ignore the warnings that don't fit their DBMS and/or use-case using the mechanisms that exist in .NET today.

At the end of the day, a robust test suite is the only way to be sure that your queries execute properly, and continue executing properly.

Most people would consider a test suite using the in-memory provider to be sufficiently robust, and yet it demonstrably isn't in this case. The in-memory provider already throws by default when it encounters transactions, behaviour which can be disabled via ConfigureWarnings; I see no good reason why the same cannot be implemented for explicit ToLower/ToUpper calls, and calls containing StringComparison.

roji commented 3 years ago

Unnecessary. Simply write an analyzer that assumes that people are using a database that supports nothing and should therefore be warned about everything, make it an add-on NuGet package, and allow end-users to ignore the warnings that don't fit their DBMS and/or use-case using the mechanisms that exist in .NET today.

That sounds like a pretty bad user experience, and would also not work for applications using multiple providers (which have different capabilities). And once again, predicting at compile-time whether a query will execute successfully is simply not feasible; for the specific case of StringComparison it does seem easy to detect the specific invocation, but in many other cases translatibility depends on more complex factors (what exact parameter is being used, where in the query is the construct/function being used...). And since translatibility can't be successfully predicted in the general case, users have to write tests anyway to ensure their code works.

Stepping back, I understand the initial frustration of writing a query using StringComparison and getting an exception - but that's part of learning how an API works when first starting to use it. LINQ and queries are maybe a bit tougher, since they're an open space where query shapes are basically infinite. But if you try using StringComparison, we already include a very specific message pointing you to the doc page about collations - that seems reasonable enough. Not every issue in programming can be flagged at compile-time.

Most people would consider a test suite using the in-memory provider to be sufficiently robust, and yet it demonstrably isn't in this case.

We very explicitly discourage testing EF Core applications with InMemory, for precisely this kind of thing. InMemory cannot and will never behave like your real database; either create a repository layer above EF and mock that (in which case you provide results directly and don't need InMemory), or test against your production database system (in which case you get full fidelity between test and production).

The in-memory provider already throws by default when it encounters transactions, behaviour which can be disabled via ConfigureWarnings; I see no good reason why the same cannot be implemented for explicit ToLower/ToUpper calls, and calls containing StringComparison.

The difference is that the InMemory provider can support ToLower/ToUpper/StringComparison, whereas it cannot support transactions or raw SQL. And once again, ToLower/ToUpper/StringComparison may make sense in certain other production providers.

IanKemp commented 3 years ago

Not every issue in programming can be flagged at compile-time.

But this particular one can, and relatively easily as you yourself have already agreed. "We can't do it for every case" should not imply "we won't do it for a single simple case", in the same way that perfect should not be the enemy of good enough.

The in-memory provider already throws by default when it encounters transactions, behaviour which can be disabled via ConfigureWarnings; I see no good reason why the same cannot be implemented for explicit ToLower/ToUpper calls, and calls containing StringComparison.

The difference is that the InMemory provider can support ToLower/ToUpper/StringComparison, whereas it cannot support transactions or raw SQL. And once again, ToLower/ToUpper/StringComparison may make sense in certain other production providers.

Which is why you should be able to configure it. Default behaviour:

services.AddDbContext<MyDbContext>(o => o
            .UseInMemoryDatabase("blah")
            .ConfigureWarnings(w => 
                {
                    w.Ignore(InMemoryEventId.UseOfToLowerOrToUpper);
                    w.Ignore(InMemoryEventId.UseOfToStringComparison);
                });

Someone like me using MSSQL in prod:

services.AddDbContext<MyDbContext>(o => o
            .UseInMemoryDatabase("blah")
            .ConfigureWarnings(w => 
                {
                    w.Throw(InMemoryEventId.UseOfToLowerOrToUpper);
                    w.Throw(InMemoryEventId.UseOfToStringComparison);
                });

roji commented 3 years ago

Not every issue in programming can be flagged at compile-time.

But this particular one can, and relatively easily as you yourself have already agreed. "We can't do it for every case" should not imply "we won't do it for a single simple case", in the same way that perfect should not be the enemy of good enough.

I disagree - this would give the illusion that InMemory does approximate the SQL Server behavior; if it catches this specific SQL Server incompatibility, the expectation/assumption would be that it catches in general. It's better to say loud and clear that "this provider does not behave like SQL Server", then to make it match SQL Server on one very small bit. Once again, users must properly test their queries on their production database system, so it's better not to give any illusions to the contrary.

Which is why you should be able to configure it. Default behaviour:

This is not scalable, given the number of other little behavior differences between InMemory and any/all other provider. There would have to be hundreds/thousands of flags controlling every possible imaginable language construct and function.

It also does not help users, since it puts the onus on them to tweak their InMemory configuration to match the SQL Server behavior. A solution that would require every user to fully research which function is supported on their provider, and tweak InMemory accordingly doesn't make sense to me.

dahlbyk commented 1 year ago

Some folks here may have thoughts on https://github.com/dotnet/roslyn-analyzers/issues/6743

dotnet / efcore

Query: Consider translating String.Equals(String, StringComparison) for selected values of StringComparison #1222