dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.77k stars 3.18k forks source link

False positive "Skip/Take without OrderBy" when using SplitQuery #23803

Closed NatanBagrov closed 3 years ago

NatanBagrov commented 3 years ago

When using .UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery) we get a the following warning

The query uses a row limiting operator ('Skip'/'Take') without an 'OrderBy' operator. This may lead to unpredictable results.

even without really applying and row limiting operation.

Repro: Configuration:

p_serviceCollection.AddDbContext<T>(
    options =>
    {
        options.ConfigureWarnings(w =>
            w.Throw(CoreEventId.RowLimitingOperationWithoutOrderByWarning)); // Configured to throw instead of warn
        options.UseSqlServer(Configuration[$"Database:{p_dbName}:ConnectionString"],
            b =>
            {
                b.UseQuerySplittingBehavior(QuerySplittingBehavior.SplitQuery); // TOGGLE THIS BY COMMENTING
                b.MigrationsAssembly("Application");
            });
    });

Query:

var user = await m_context.AccountUsers
    .Include(u => u.ProfilePicture)
    .Include(u => u.EmailDetails)
    .Include(u => u.UserGroups)
    .ThenInclude(ug => ug.AccountGroup)
    .SingleOrDefaultAsync(u => u.Id == p_userId);

As far as I recall, SingleOrDefault{Async} is not a row limiting (FirstOrDefault{Async} is), and as you can see, there is no Skip/Take here.

EF Core version 5.0.1 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 5.0.1 Operating system: Win 10 x64

AndriySvyryd commented 3 years ago

Triage: improve exception message and mention Split query

Zero3 commented 3 years ago

Seems like this is a duplicate of https://github.com/dotnet/efcore/issues/22579.

smitpatel commented 3 years ago

Note to self: Cover this in #24706

changhuixu commented 3 years ago

@NatanBagrov May I ask, why did you say "SingleOrDefault" is not a row limiting query, but "FirstOrDefault" is? I thought opposite. Thank you.

smitpatel commented 3 years ago

FirstOrDefault gives first result from a collection of rows. Since it only returns 1, the order becomes important and it becomes row limiting operation. SingleOrDefault by definition gives the first result but also verifies that there is only 1 row at max in the result. Hence it is not row limiting operation.

changhuixu commented 3 years ago

@smitpatel thank you. Yes, I understand that FirstOrDefault gets the first one found while SingleOrDefault gets the first one but also verifies that's the only one.

From what you have described above, FirstOrDefault is different because the order becomes important. Then does it mean the order is NOT important for SingleOrDefault?

Usually, FirstOrDefault translates in SQL does TOP 1 record, and SingleOrDefault does TOP 2. If this is the case, then why one is row limiting operation while the other is not.

smitpatel commented 3 years ago

Order is not important for Single because the ideal expectation is that there is only 1 row so there is no issue of choosing correct result. FirstOrDefault translates to Top 1 so just get the first row, we don't care about rest. SingleOrDefault translates to Top 2 so that we can see if there is another row, in order to throw exception.

Row limiting operation determines restricting number of rows to a smaller set. SingleOrDefault doesn't require selection criteria to determine how to limit the rows so it doesn't throw the error.

changhuixu commented 3 years ago

In the first sentence, you said "there's no issue of choosing the correct result" for SingleOrDefault. Does it mean, ordering IS an issue for FirstOrDefault? If yes, what is it? I guess I don't fully understand the underlying SQL process in your mind. In my mind, ordering is important for both operations if we want the SQL execution to be most efficient.

Also, what I have learned long time ago is that FirstOrDefault is more efficient if we only need one result without worrying the rest. How come in EFCore5, it gives warning? Should we always do OrderBy beforehand? That seems like another operation may take time on the database side.

roji commented 3 years ago

The crucial point is that for SingleOrDefault, you get an exception for any set that contains more than one element (that's just how the operator works). So an ORDER BY in SQL isn't required: it's OK for the database to return rows in any order, since if we get more than one we throw anyway. In contrast, FirstOrDefault without ORDER BY is completely non-deterministic: if you don't specify ordering, you have no idea which row you get back out of the matching set.

Also, what I have learned long time ago is that FirstOrDefault is more efficient if we only need one result without worrying the rest. How come in EFCore5, it gives warning?

It's true that if you want only one row, FirstOrDefault allows you to fetch only rather than all of them, which is good for performance. However, you must still specify which row you want back, i.e. according to which ordering.

changhuixu commented 3 years ago

@roji Thank you. Is there a way to configure the warning? For example, in some code, it's totally ok for FirstOrDefault to return a non-deterministic record, because there are other logic in place.

roji commented 3 years ago

All of EF Core's warnings can be suppressed, see the docs. The relevant EventId here is CoreEventId.RowLimitingOperationWithoutOrderByWarning (see the original post above, where the warning is configured to throw).

changhuixu commented 3 years ago

@roji Thank you very much!

NatanBagrov commented 3 years ago

All of EF Core's warnings can be suppressed, see the docs. The relevant EventId here is CoreEventId.RowLimitingOperationWithoutOrderByWarning (see the original post above, where the warning is configured to throw).

Hi @roji, is it possible to configure these warning events per query? As @changhuixu asked, "in some code". I only see a global context configuration. Is there any option similar to AsSplitQuery, which is query-wise, but for configuring these warnings? Might be a nice feature (though not sure how many will use it)...

roji commented 3 years ago

No, there's no such way - we've not received requests for per-query warning suppression. If you assume responsibility for your queries, it's generally good enough to suppress the warning globally and then make sure on a query-by-query basis that things make sense.

NatanBagrov commented 3 years ago

No, there's no such way - we've not received requests for per-query warning suppression. If you assume responsibility for your queries, it's generally good enough to suppress the warning globally and then make sure on a query-by-query basis that things make sense.

Sure thing, I agree with you. Just asking... :)