dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.77k stars 3.18k forks source link

Query on owned entity produces overly complicated SQL #18299

Closed matteocontrini closed 4 years ago

matteocontrini commented 5 years ago

When querying an entity and filtering on an owned entity the SQL query that is produced includes a LEFT JOIN that could be avoided.

Steps to reproduce

Entites:

class Order
{
    public int Id { get; set; }
    public string Title { get; set; }
    public Address Address { get; set; }
}

class Address
{
    public string Street { get; set; }
    public string City { get; set; }
}

Model configuration:

modelBuilder.Entity<Order>().OwnsOne(x => x.Address);

The database table that is created looks like this:

immagine

A simple query like:

context.Orders.Where(x => x.Address == null).ToList();

Produces this SQL:

SELECT o."Id", o."Title", t."Id", t."Address_City", t."Address_Street"
      FROM "Orders" AS o
      LEFT JOIN (
          SELECT o0."Id", o0."Address_City", o0."Address_Street", o1."Id" AS "Id0"
          FROM "Orders" AS o0
          INNER JOIN "Orders" AS o1 ON o0."Id" = o1."Id"
          WHERE (o0."Address_Street" IS NOT NULL) OR (o0."Address_City" IS NOT NULL)
      ) AS t ON o."Id" = t."Id"
      WHERE (t."Id" IS NULL)

Which is overly complicated. The columns Address_City and Address_Street are available on the Orders table without any JOIN.

Same thing when querying a specific owned entity property:

context.Orders.Where(x => x.Address.City == "Rome").ToList();
SELECT o."Id", o."Title", t."Id", t."Address_City", t."Address_Street"
      FROM "Orders" AS o
      LEFT JOIN (
          SELECT o0."Id", o0."Address_City", o0."Address_Street", o1."Id" AS "Id0"
          FROM "Orders" AS o0
          INNER JOIN "Orders" AS o1 ON o0."Id" = o1."Id"
          WHERE (o0."Address_Street" IS NOT NULL) OR (o0."Address_City" IS NOT NULL)
      ) AS t ON o."Id" = t."Id"
      WHERE (t."Address_City" = 'Rome') AND (t."Address_City" IS NOT NULL)

Further technical details

Example project (PostgreSQL): EfCoreOwnedEntity.zip

EF Core version: 3.0.0 Database provider: Npgsql.EntityFrameworkCore.PostgreSQL 3.0.1 Target framework: .NET Core 3.0 Operating system: Windows 10 1903 IDE: e.g. Visual Studio 2019 16.3.2

AndriySvyryd commented 4 years ago

We've reclassified this as a bug, but as @smitpatel said it's still too risky to fix in a patch release.

davidames commented 4 years ago

@AndriySvyryd - what sort of timeframe are we looking at for resolution of this bug?

MorenoGentili commented 4 years ago

@AndriySvyryd We've reclassified this as a bug, but as @smitpatel said it's still too risky to fix in a patch release.

Thank you, I personally think it was a good decision. Please, also update the documentation and/or write a blog post to let more people know about this issue. Unfortunately, it looks like there are long months ahead of us before a fix is released, despite your best (and most appreciated) efforts. If you don't, some developers might not notice until they go to production and with enough data they'll watch their application slow down to a crawl. After so many comments, it seems clear now that people are wasting time because this issue was not addresses properly.

amrbadawy commented 4 years ago

I agree with @MorenoGentili that documentation must be updated and a post on devblog to notify all developers about this performance bug

ajcvickers commented 4 years ago

I'm also still looking at other potential workarounds that are not as drastic as writing raw queries. I have ideas, but it'll probably be the weekend before I get enough coding time to try them out.

matteocontrini commented 4 years ago

(@amrbadawy wrong mention 😊)

AndriySvyryd commented 4 years ago

@AndriySvyryd - what sort of timeframe are we looking at for resolution of this bug?

@davidames It will be fixed before November, but it's hard to say in which 5.0 preview it would be included.

julielerman commented 4 years ago

@ajcvickers given that you're doing it over the weekend, let me know if I can help...guinea pig or something. I'm still sick so not getting out much anyway.

julielerman commented 4 years ago

@MorenoGentili @amrbadawy are either of you able to either add an issue via the comments on the bottom of the doc's page or even make the change and submit a PR?

johanndev commented 4 years ago

My team also experienced a pretty drastic performance degradation after our DB size grew over a certain threshold. FWIW, we were able to mitigate the performance impact with the help of our DBA; we re-built the statistics on our table to help the query planner. We also added a maintenance task to regularly rebuild our statistics.

More info about statistics and its relationship with query plans in this article: https://www.red-gate.com/simple-talk/sql/performance/sql-server-statistics-basics/

MorenoGentili commented 4 years ago

@julielerman I added a comment as you suggested. However, I won't send a PR since I'm not sure what the proper wording would be in this case.

MikaelEliasson commented 4 years ago

Our team is also totally blocked from upgrading by this issue.

It looks like we will create our own fork where we rip out all the table splitting stuff that we don't use anyway and just fix the queries. Anyone done anything similar already?

smitpatel commented 4 years ago

Generated SQL after fix for query in OP

      SELECT [o].[Id], [o].[Title], [o].[Address_City], [o].[Address_Street]
      FROM [Order] AS [o]
      WHERE [o].[Address_City] = N'Rome'
grexican commented 4 years ago

I can appreciate that this will be in EFCore 5. But that leaves those of us stuck on a .NET Framework project stranded with this incredibly poor-performing bug. Is there really no way to include this in a patch release for v3? Or is there no workaround? I tried to work around it myself by using FromSql, hoping that I could tell it "relax... I'm writing the core SQL for the data" but it just wraps my SQL in the same horrible LEFT JOIN syntax.

I make heavy use of value objects and complex owned entities. Right now my query is generating 114 LEFT JOINS to itself and takes 3 minutes to run! There has to be a way around this in v3! Saying it's a v5-only fix just doesn't cut it :(

@smitpatel any thoughts? Could your latest commit be patched into v3? Even if not offically released... I'll build it myself and use a local nuget repo if I have to! I'm desperate.

julielerman commented 4 years ago

Can we make a little extension from the patch and toss it on nuget?

amrbadawy commented 4 years ago

please fix this issue in EF Core 3.x to allow us use it without a huge of changes

smitpatel commented 4 years ago

@smitpatel any thoughts? Could your latest commit be patched into v3? Even if not offically released... I'll build it myself and use a local nuget repo if I have to! I'm desperate.

I will let @ajcvickers respond to possibilities other than patching. For the patching, it is not possible to patch this. Even though there is 1 commit directly linked to this item, there has been several commits went in to lay down infrastructure to make this change. Some of them involved breaking changes. Further, it touches how EF Core materialize entity, which in itself is a core path for query and too risky for a patch to change.

julielerman commented 4 years ago

that makes sense, @smitpatel. I still vote for the temporary solution of insulating the relevant queries and either using raw sql or dapper for now. (with perf tests to compare those options if the diffs are significant enough to care in your solution.)

grexican commented 4 years ago

FWIW today I did a temporary workaround. Most of my 114 queries are value objects of Money, Percentage, Count, EmailAddress, ZipCode, etc. In other words, one-value value objects.

I was mapping them like like this:

           builder.OwnsOne(m => m.OwnerPersonalEmail).Property(e => e.Value).HasColumnName(nameof(Recruit.OwnerPersonalEmail));
            builder.OwnsOne(m => m.OwnerPreferredPhoneNumber).Property(e => e.Value).HasColumnName(nameof(Recruit.OwnerPreferredPhoneNumber));

Because they're all single-value value objects, that means they map to a single field in the DB, too. So I rewrote them using Value Converters like this:

            builder.Property(m => m.OwnerPersonalEmail).HasConversion(EmailAddressValueConverter.Instance);
            builder.Property(m => m.OwnerPreferredPhoneNumber).HasConversion(PhoneNumberValueConverter.Instance);

Property instead of OwnsOne. And HasConversion. The converters are pretty basic. Inline, they would look something like: HasConversion(phone => phone == null ? null : phone.Value, str => str == null ? null : new PhoneNumber(str))

Worth noting there's some risk in doing it this way that wasn't risky in the old way. That is, if my rules for PhoneNumber changes, or if I get some bad data in the DB, the old deserialization process from EF would still use the private setter to set the Value property, whereas now it's using the public constructor. If this presents a problem in the future, then I'll have to go the unfortunate route of using reflection to construct my object with my private parameterless constructor, and reflection to set the property value.

With that said, I'm REALLY REALLY hoping we get a patch for V3. 3 Minutes to run a query and 114 self joins... that's gotta be a patch for EF3. This saved me this time (I still have EIGHT self joins). But some objects won't be so easy for me to do.

If this doesn't get patched with V3, I might be looking at going to NHibernate (really don't want to have to rewrite all my configurations to NH!). This project isn't fully CQRS, so I don't feel like going Dapper for reads and slow, complex EF loads for writes is going to be a good solution here, nor is eager loading the entire object graph with Dapper. I need a DDD-ready, lazyloading ORM.

Hope this post helps others in similar situations.

julielerman commented 4 years ago

Hi @grexican, interesting. Thanks. There's an interesting discussion of value objects and value converters in this thread https://github.com/dotnet/efcore/issues/13947 if you haven't seen it yet. Also, maybe your model specifies this but I'm curious about 'eager loading a graph' to get VO properties. Are you storing them in separate tables? And re "lazy loading", are you lazy loading those properties? Not challenging you, just curious. There are a lot of variables to consider. :)

ajcvickers commented 4 years ago

Here's an update on where we are with this at the release level. Unfortunately:

So we're now getting this fixed in the master branch as soon as possible. This will mean:

As soon as the fix is in the daily builds, we will post here and would love people to start testing it. We need to make sure that we understand:

So then if we get this in a daily build/preview with enough testing and validation from customer cases, we will revisit the risk/overhead associated with getting this out as a fully supported release verses having people use 5.0 previews. (We have been brainstorming ways to release more than a preview, and will continue to do so, but I'm not promising at this time that this will happen.)

A note on daily builds. EF Core 3.0 was not normal in terms of the internal churn and state of the daily/preview builds. In EF Core 5.0, we are back to a normal dev flow where we keep all the tests passing on each build. This means that we expect the daily builds to be very stable. Of course, there will be bugs because we're writing a lot of new code. But the good thing about the daily builds is that we often have the ability to fix blocking bugs and get them out in a matter of days.

A note on 5.0 previews. We're planning on shipping these frequently. However, this is coordinated across .NET, and preview 2 requires more coordination than usual. This means the window for preview 2 changes has now closed, and the fix for this will likely make it into preview 3. This is another reason to try the daily builds and not wait for the preview.

absilver808 commented 4 years ago

@smitpatel @ajcvickers Sorry if this is a silly question but I noticed that this item is mentioned in #19932 which was closed and set in the 3.1.3 patch as a closed issue? Does this mean it should be fixed in 3.1.3? Thanks

ajcvickers commented 4 years ago

@absilver808 We fixed some cases, but not the majority. Read through the discussion here to understand more.

julielerman commented 4 years ago

@ajcvickers reading Smit's description in #19932 I'm trying to discern which cases are fixed. "When owned navigation is defined on base type " : does this mean when there is an explicit navigation mapping? I only tested ONE scenario that I already have in place....a simple class with one property that is an owned entity. THere is only the ownsone mapping, no navigation mappings. And the query doesnt change with 3.1.3. When I added a navigation mapping (following examples in the breaking changes doc) there was still no change. Is there an easy list of what scenarios might be covered with this little fix? (I've read the discussion and can't tell, so I"m sorry if I missed it...) Thanks

AndriySvyryd commented 4 years ago

@absilver808 @julielerman #19932 removes the INNER JOIN for owned types defined on the base entity type when the query selects the base type and all the derived types.

julielerman commented 4 years ago

@AndriySvyryd ahh when inheritance is involved... got it. Glad you were at least able to find some places where you could fix it pre-EFCore 5.

ajcvickers commented 4 years ago

We believe this issue is fixed (except maybe some edge cases) in EF Core 5.0 preview 3, which is now live on NuGet. See the announcement post for full details.

Please, please try this preview and let us know if you're still running into issues so that we can address any remaining issues in future 5.0 previews.

Note that .NET 5 is not required to use EF 5.0. You can try it with your .NET Core 3.1 applications!

cbordeman commented 4 years ago

This is fixed in the latest .net 5.0 preview.

Vake93 commented 4 years ago

Is there any chance the fix in EF 5 to be back ported into EF core 3.1? With these joins Owned entity types isn’t usable at all with EF core 3.1. And since .NET core 3.1 is a LTS version we can’t really think of upgrade until the next LTS version becomes available.

zornwal commented 4 years ago

I know this is late, but maybe consider getting this listed as a breaking change on the EF Core 3 docs?

We're having roughly the same setup as in https://github.com/dotnet/efcore/issues/18299#issuecomment-580196697, and simply calling _dataContext.Users.Find(userId); where Users own another entity and userId is a primary key, will throw an Invalid operation exception stating that the sequence contains multiple elements (which is obviously nonsense as that can't happen for primary keys, and also isn't the case in the database). Going from Find(key) finding the key to throwing an exception for a key that only exists once in the database seems like a textbook definition of a breaking change.

smitpatel commented 4 years ago

@zornwal - If you are hitting an exception then your issue is not related to this one. There is no error thrown if you are hitting this issue.

smitpatel commented 4 years ago

@Vake93 - This cannot be backported to 3.1 release.

zornwal commented 4 years ago

@smitpatel Alright we managed to find out what was going on, and it seems at least tangentially related to the issue. We had a base class with a collection of owned entities, from which two other classes inherited. At some point, the collection was moved down into one of the inheriting entities instead, without removing the entries referencing the now collection-less other class. This led to querying the dbset of the base class failing with the mentioned exception when the entity in question was the collection-less one, but had entries remaining in the database. Don't know if this is intended, but it seems pretty obscure anyway. Might be worth checking if issue persists on 5.0 of it isn't intended.

smitpatel commented 4 years ago

@zornwal - Restating, if you are running into this issue then you will not get any exception. If you are seeing an exception then your issue is not same as this. Please file a new issue with detailed repro.

MoazAlkharfan commented 4 years ago

FullText search is broken because it's being run on the joined data and it's throwing an exception Cannot use a CONTAINS or FREETEXT predicate on column ......

[t0].[Property_14] and [t0].[Property_13] should be [j].[Property_14] and [j].[Property_13] for the problem to be fixed.

SELECT [j].[Id]
FROM [TABLE_A] AS [j]
LEFT JOIN (
    SELECT [j0].[Id], [j0].[Property_1], [j0].[Property_2], [j0].[Property_3], [j0].[Property_4], [j0].[Property_5]
    FROM [TABLE_A] AS [j0]
    WHERE [j0].[Property_1] IS NOT NULL
) AS [t] ON [j].[Id] = [t].[Id]
LEFT JOIN (
    SELECT [j1].[Id], [j1].[Property_6], [j1].[Property_7], [j1].[Property_8], [j1].[Property_9], [j1].[Property_10], [j1].[Property_11], [j1].[Property_12], [j1].[Property_13], [j1].[Property_14], [j1].[Property_15], [j1].[Property_16]
    FROM [TABLE_A] AS [j1]
    WHERE [j1].[Property_11] IS NOT NULL
) AS [t0] ON [j].[Id] = [t0].[Id]
WHERE (([j].[Property_22] <> CAST(1 AS bit)) AND (DATEADD(day, CAST(10.0E0 AS int), [t].[Property_1]) > CONVERT(date, GETDATE()))) AND (((([j].[Property_22] <> CAST(1 AS bit)) AND ([j].[State] = 3)) AND (FREETEXT([t0].[Property_14], 'testsearch') OR FREETEXT([t0].[Property_13], 'testsearch'))) AND (DATEADD(day, CAST(10.0E0 AS int), [t].[Property_1]) > CONVERT(date, GETDATE())))
AndriySvyryd commented 4 years ago

@MoazAlkharfan Please open a new issue with a small repro project

darkflame0 commented 4 years ago

@smitpatel @ajcvickers When I use FromSqlRaw, it still generates complex and inefficient sql

db.LiveHistory.FromSqlRaw(@"select * from ""LiveHistory""").Select(a => a.Data.RoomId).FirstOrDefault();
      SELECT t."RoomId"
      FROM (
          select * from "LiveHistory"
      ) AS l
      LEFT JOIN (
          SELECT l0."Id", l0."RoomId"
          FROM "LiveHistory" AS l0
          INNER JOIN "LiveHistory" AS l1 ON l0."Id" = l1."Id"
      ) AS t ON l."Id" = t."Id"
      LIMIT 1
db.LiveHistory.Select(a => a.Data.RoomId).FirstOrDefault();
      SELECT l."RoomId"
      FROM "LiveHistory" AS l
      LIMIT 1
smitpatel commented 4 years ago

@darkflame0 - When you use FromSql* API on an entity type which has owned entities, we cannot simplify the SQL easily. EF Core does not parse the SQL to look into it what you are selecting from. The owner entity can be coming from a stored procedure or a view or a custom projection. In neither of those case we can assume that owned entity will also come from there only. Even with an example like yours, instead of select * if you specified all list of columns and did not select columns for owned entities, it wouldn't work without a join. So if we tried to pull owned entity columns, it will be invalid SQL error. Hence on the safer side we generate a join. We are tracking in #21888 to improve experience around this.

taspeotis commented 3 years ago

EF Core does not parse the SQL to look into it what you are selecting from. The owner entity can be coming from a stored procedure ... Hence on the safer side we generate a join

This just ... breaks FromSqlRaw though? You can't put EXECUTE MyStoredProcedure into a subquery.

SELECT * FROM ( EXECUTE ... ) LEFT JOIN ...

This is a syntax error in SQL Server. There are similar limitations on other relational databases (for example FromSqlRaw("UPDATE Table SET Counter = Counter + 1 RETURNING *") in Postgres works just fine until you have owned entity).

The documentation is full of examples of using FromSqlRaw to call stored procedures.

The documentation is also wrong:

SQL Server doesn't allow composing over stored procedure calls, so any attempt to apply additional query operators to such a call will result in invalid SQL. Use AsEnumerable or AsAsyncEnumerable method right after FromSqlRaw or FromSqlInterpolated methods to make sure that EF Core doesn't try to compose over a stored procedure.

You can't do anything to prevent EF Core from composing its LEFT JOINs for owned entities, it just goes straight to breaking the query.

The owner entity can be coming from a stored procedure or a view or a custom projection. In neither of those case we can assume that owned entity will also come from there only.

The documentation already passes the obligation on to whoever is calling FromSqlRaw to get all the data:

There are a few limitations to be aware of when using raw SQL queries:

  • The SQL query must return data for all properties of the entity type.
pantonis commented 3 years ago

I am using EF Core 5.0.5 with Sql Server and I can still see this issue. Please check my code below:

public class User
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public long Id { get; set; }

    public string Name { get; private set; }

    private readonly List<RefreshToken> refreshTokens = new List<RefreshToken>();
    public virtual IReadOnlyCollection<RefreshToken> RefreshTokens => refreshTokens;

    public User()
    {

    }

    private User(string name)
    {
        Name = name ?? throw new ArgumentNullException(nameof(name));
    }

    public static User CreateUser(string name)
    {
        return new User(name);
    }
}

public class RefreshToken : ValueObject
{
    public string Token { get; }

    public DateTime ExpirationTime { get; }

    private RefreshToken(string token, DateTime expirationTime)
    {
        Token = token;
        ExpirationTime = expirationTime;
    }

    public static Result<RefreshToken> Create(string token, int expirationMinutes)
    {
        token = (token ?? string.Empty).Trim();

        if (string.IsNullOrEmpty(token))
            return Result.Failure<RefreshToken>("Refresh Token should not be empty.");

        if (expirationMinutes <= 0)
            return Result.Failure<RefreshToken>("Invalid Expiration in minutes value.");

        DateTime expirationDate = DateTime.UtcNow.AddMinutes(expirationMinutes);

        return Result.Success(new RefreshToken(token, expirationDate));
    }

    protected override IEnumerable<object> GetEqualityComponents()
    {
        yield return Token;
        yield return ExpirationTime;
    }
}

public class MyDbContext : DbContext
{
    public MyDbContext(DbContextOptions options)
        : base(options)
    {

    }

    public virtual DbSet<User> User { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        base.OnModelCreating(modelBuilder);

        modelBuilder.Entity<User>(entity =>
        {
            entity.OwnsMany(p => p.RefreshTokens, a =>
            {
                a.ToTable("RefreshToken");
                a.WithOwner().HasForeignKey("UserId").Metadata.PrincipalToDependent.SetIsEagerLoaded(false);
                a.Property<long>("Id");
                a.HasKey("Id");

                a.Property<string>("Token").IsRequired();
                a.Property<DateTime>("ExpirationTime").IsRequired();
            });
        });
    }
}

public class UserService : IUserService
{
    private readonly MyDbContext dbContext;

    public UserService(MyDbContext dbContext)
    {
        this.dbContext = dbContext;
    }

    public void GetUsers()
    {
        var user = (from dbUser in dbContext.User
                             .Include(x => x.RefreshTokens.Where(token => token.ExpirationTime < DateTime.UtcNow))
                    where dbUser.Id == 1
                    select dbUser).FirstOrDefault();
    }
}

and the generated SQL query is the following:

SELECT [t].[Id], [t].[Name], [t0].[Id], [t0].[ExpirationTime], [t0].[Token], [t0].[UserId]
FROM (
  SELECT TOP(1) [u].[Id], [u].[Name]
  FROM [User] AS [u]
  WHERE [u].[Id] = CAST(1 AS bigint)
) AS [t]
LEFT JOIN (
  SELECT [r].[Id], [r].[ExpirationTime], [r].[Token], [r].[UserId]
  FROM [RefreshToken] AS [r]
  WHERE [r].[ExpirationTime] < GETUTCDATE()
) AS [t0] ON [t].[Id] = [t0].[UserId]
ORDER BY [t].[Id], [t0].[Id]
mikke49 commented 3 years ago

We're using FromSqlRaw to access temporal tables, finding older version of entity. The entity also has owned entities. The raw SQL selects * and uses the FOR SYSTEM_TIME (something), which by itself works fine. The problem is the JOIN to fetch the owned entities, resulting in these properties having current values while all other properties are correct older version.

pantonis commented 3 years ago

@mikke49 I am not using FromSqlRaw.