dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.65k stars 3.15k forks source link

The behavior of Distinct in SqlServer and Cosmos is inconsistent #26547

Open Varorbc opened 2 years ago

Varorbc commented 2 years ago

https://docs.microsoft.com/en-us/dotnet/api/system.linq.queryable.distinct?view=net-6.0 https://docs.microsoft.com/en-us/ef/core/what-is-new/ef-core-6.0/whatsnew#distinct-queries

Include your code

[Table("Table_1")]
public partial class Table1
{
    [Key]
    public int Id { get; set; }

    public DateTime CreateTime { get; set; }
}

public partial class TestContext : DbContext
{
    public TestContext()
    {
    }

    public TestContext(DbContextOptions<TestContext> options)
        : base(options)
    {
    }

    public virtual DbSet<Table1> Table1 { get; set; }
}

var services = new ServiceCollection();
services.AddDbContext<TestContext>();
var serviceProvider = services.BuildServiceProvider();

var testContext = serviceProvider.GetRequiredService<TestContext>();
var sql = testContext.Table1.OrderBy(a => a.Id).Distinct().ToQueryString();
Console.WriteLine(sql);

Sql Server Behavior

SELECT DISTINCT [t].[Id], [t].[CreateTime]
FROM [Table_1] AS [t]

Cosmos Behavior

SELECT DISTINCT c
FROM root c
WHERE (c["Discriminator"] = "Table1")
ORDER BY c["Id"]

Include provider and version information

EF Core version:6.0.0-rc.2.21480.5 Database provider: Microsoft.EntityFrameworkCore.SqlServer/Microsoft.EntityFrameworkCore.Cosmos Target framework: .NET 6.0 Operating system:Win11 IDE: Visual Studio 2022 17.0

roji commented 2 years ago

@Varorbc providers aren't necessarily meant to behave in the same way, around this question or around others - it's perfectly OK to have different behaviors across providers when the database itself warrants that.

As discussed in #25696, the LINQ documentation for the Distinct operator clearly specifies that "the result sequence is unordered". And as discussed in that issue, one main issue in relational databases is that Distinct frequently causes queries to be pushed down into subqueries, but in SQL, subqueries return unordered resultsets; so if we can't guarantee ordering after Distinct in some/most cases, we generally shouldn't in any case (otherwise behavior is inconsistent depending on whether there's a SQL subquery or not).

I don't know if the same is true for Cosmos or not - it's a different database with different behavior; for one thing, uncorrelated subqueries aren't supported at all. So we'd have to investigate the Cosmos behavior more thoroughly to see what makes sense there.

Having said all that, I do agree this is something we should look into.

John0King commented 2 years ago

@roji

And as discussed in that issue, one main issue in relational databases is that Distinct frequently causes queries to be pushed down into subqueries, but in SQL, subqueries return unordered resultsets;

can you explain more or a sample ?

and follow sql work perfectly for sqlserver

  SELECT DISTINCT <tb_filed> FROM <table>  ORDER BY <tb_filed> DESC
roji commented 2 years ago

@John0King see #25696, this is discussed there (e.g. https://github.com/dotnet/efcore/issues/25696#issuecomment-912347569)

roji commented 2 years ago

SELECT DISTINCT <tb_filed> FROM <table> ORDER BY <tb_filed> DESC

@John0King note that I'm discussing subqueries

John0King commented 2 years ago

note that I'm discussing subqueries

😄 I am looking for a sample that do not work with discussing and subquerys, so I can understand you more clearly.

so far , it seems orderby first and then distinct is acceptable , and it even reasonable for C#

> var hashSet = new HashSet<int>();
. for (var i = 0; i < 10; i++)
. {
.     hashSet.Add(i);
. }
. 
. hashSet.Remove(2);
. hashSet.Add(1000);
. 
. hashSet.OrderBy(x=>x).Distinct().ToArray()
// int[10] { 0, 1, 3, 4, 5, 6, 7, 8, 9, 1000 }

> hashSet.OrderByDescending(x=>x).Distinct().ToArray()
// int[10] { 1000, 9, 8, 7, 6, 5, 4, 3, 1, 0 }
roji commented 2 years ago

@John0King please fully read #25696, this has already been discussed there. Even if in LINQ to Objects Distinct "happens" to return ordered items (at least in some cases), the documentation clearly specifies that this isn't in the contract, so it could change.

AndriySvyryd commented 2 years ago

Related info: https://docs.microsoft.com/en-us/azure/cosmos-db/sql/sql-query-keywords#distinct

The below queries are unsupported:
ORDER BY clause in the outer query
    SELECT VALUE COUNT(1) FROM (SELECT DISTINCT VALUE c.lastName FROM c) AS lastName ORDER BY lastName
roji commented 2 years ago

Note to self: check specifically the behavior of ordering in the inner query (which contains distinct) as well as the outer query.