dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.72k stars 3.17k forks source link

Sub-optimal query generation against filtered base-typed DbSets in TPC schemas #30774

Open John-Paul-R opened 1 year ago

John-Paul-R commented 1 year ago

Edit: Since originally writing this, it has come to my attention that many (most?) database implementations will optimize away the "sub-optimal" EF-generated query called out in this issue, such that there is zero effective performance degredation. The databases I tested with, along with the results are laid out in the comments below. TL;DR: This is a complete non-issue for Postgres and MySQL, but appears to negatively impact the query plan on SQLite.


Suppose one has the following schema:

public abstract class AbstractSkill
{
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public Guid Id { get; set; }

    public string Name { get; set; } = null!;
}

public class MartialSkill : AbstractSkill
{
    public bool HasStrike { get; set; }
}

public class MagicSkill : AbstractSkill
{
    public string RunicName { get; set; } = null!;
}

public class DivineSkill : AbstractSkill
{
    public int RequiredDivinity { get; set; }
}

and some related dbsets:

public DbSet<AbstractSkill> Skills { get; set; }
// ...potentially other dbsets...

If one wanted to get all "martial" or "magic" skills, they might write the following query:

var martialOrMagicSkills = db.Skills
    .Where(s => s is MartialSkill || s is MagicSkill)
    .ToList();

and that would work!

The generated query, however, is sub-optimal. It looks like this:

 SELECT "t"."Id", "t"."Name", "t"."RequiredDivinity", "t"."RunicName", "t"."HasStrike", "t"."Discriminator"
 FROM (
     SELECT "d"."Id", "d"."Name", "d"."RequiredDivinity", NULL AS "RunicName", NULL AS "HasStrike", 'DivineSkill' AS "Discriminator"
     FROM "DivineSkills" AS "d"
     UNION ALL
     SELECT "m"."Id", "m"."Name", NULL AS "RequiredDivinity", "m"."RunicName", NULL AS "HasStrike", 'MagicSkill' AS "Discriminator"
     FROM "MagicSkills" AS "m"
     UNION ALL
     SELECT "m0"."Id", "m0"."Name", NULL AS "RequiredDivinity", NULL AS "RunicName", "m0"."HasStrike", 'MartialSkill' AS "Discriminator"
     FROM "MartialSkills" AS "m0"
 ) AS "t"
 WHERE "t"."Discriminator" IN ('MartialSkill', 'MagicSkill')

Notably, the generated SQL UNION ALLs the results from all available derived tables, even though it is known at query-time that no results can be found in the DivineSkills table, based on the query filter.

The solution here is to not include the query against the DivineSkills table in the generated SQL, since it's corresponding model type has been effectively filtered out.


Potentially worth noting that when filtering on a single derived type (instead of 2, as in this example), the generated SQL correctly targets only the specified table. For example, this query:

var magicSkills = db.Skills
    .Where(s => s is MagicSkill)
    .ToList();

correctly yields:

SELECT "m"."Id", "m"."Name", NULL AS "RequiredDivinity", "m"."RunicName", NULL AS "HasStrike", 'MagicSkill' AS "Discriminator"
FROM "MagicSkills" AS "m"

Resources

John-Paul-R commented 1 year ago

Oh dear. You'd think I'd check this first.

Running some tests with Postgres 14, I've found that the query planner is able to avoid targeting the DivineSkill table entirely, presumably due to the "Discriminator" filter, so this may entirely be a non-issue. Will check with another db and close this if so.


The results I've found for Postgres **Schema (PostgreSQL v14)** CREATE TABLE martial_skill ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT NOT NULL, has_strike BOOLEAN NOT NULL ); CREATE TABLE magic_skill ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT NOT NULL, runic_name TEXT NOT NULL ); CREATE TABLE divine_skill ( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT NOT NULL, required_divinity INT NOT NULL ); INSERT INTO martial_skill (name, has_strike) VALUES ('Karate', true); INSERT INTO martial_skill (name, has_strike) VALUES ('Boxing', true); INSERT INTO magic_skill (name, runic_name) VALUES ('Wizardry', 'Zarathustra'); INSERT INTO magic_skill (name, runic_name) VALUES ('Necromancy', 'Erebos'); INSERT INTO divine_skill (name, required_divinity) VALUES ('Blessing', 100); INSERT INTO divine_skill (name, required_divinity) VALUES ('Healing', 50); --- **Query #1** EXPLAIN ANALYZE SELECT t.id, t.name, t.required_divinity, t.runic_name, t.has_strike, t.discriminator FROM ( SELECT d.id, d.name, d.required_divinity, NULL::text AS runic_name, NULL::boolean AS has_strike, 'DivineSkill' AS discriminator FROM divine_skill AS d UNION ALL SELECT m.id, m.name, NULL AS required_divinity, m.runic_name, NULL::boolean AS has_strike, 'MagicSkill' AS discriminator FROM magic_skill AS m UNION ALL SELECT m0.id, m0.name, NULL AS required_divinity, NULL::text AS runic_name, m0.has_strike, 'MartialSkill' AS discriminator FROM martial_skill AS m0 ) AS t WHERE t.discriminator IN ('MartialSkill', 'MagicSkill'); | QUERY PLAN | | -------------------------------------------------------------------------------------------------------------------- | | Append (cost=0.00..47.15 rows=1810 width=117) (actual time=0.006..0.010 rows=4 loops=1) | | -> Seq Scan on magic_skill m (cost=0.00..17.50 rows=750 width=117) (actual time=0.005..0.005 rows=2 loops=1) | | -> Seq Scan on martial_skill m0 (cost=0.00..20.60 rows=1060 width=117) (actual time=0.002..0.002 rows=2 loops=1) | | Planning Time: 0.566 ms | | Execution Time: 0.038 ms | --- [View on DB Fiddle](https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/8382)
John-Paul-R commented 1 year ago

Less familiar with MySQL, but the fact that this is called out as an Impossible WHERE (with 0 cost) makes me think the query optimizer is more than capable of doing what it needs to here.

-> Table scan on t (cost=0.64..2.55 rows=4) (actual time=0.055..0.055 rows=4 loops=1) -> Union materialize (cost=1.94..3.85 rows=4) (actual time=0.053..0.053 rows=4 loops=1) -> Zero rows (Impossible WHERE) (cost=0.00..0.00 rows=0) (actual time=0.000..0.000 rows=0 loops=1) -> Table scan on m (cost=0.45 rows=2) (actual time=0.013..0.017 rows=2 loops=1) -> Table scan on m0 (cost=0.45 rows=2) (actual time=0.013..0.014 rows=2 loops=1)

Given that, closing this issue as, well, a complete non-issue!

John-Paul-R commented 1 year ago

I'll call out that SqLite does not appear to be able to optimize this away... if I'm reading this right:

ddl ```sql CREATE TABLE martial_skill ( id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL, has_strike INTEGER NOT NULL ); CREATE TABLE magic_skill ( id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL, runic_name TEXT NOT NULL ); CREATE TABLE divine_skill ( id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL, required_divinity INTEGER NOT NULL ); INSERT INTO martial_skill (name, has_strike) VALUES ('Karate', 1); INSERT INTO martial_skill (name, has_strike) VALUES ('Boxing', 1); INSERT INTO magic_skill (name, runic_name) VALUES ('Wizardry', 'Zarathustra'); INSERT INTO magic_skill (name, runic_name) VALUES ('Necromancy', 'Erebos'); INSERT INTO divine_skill (name, required_divinity) VALUES ('Blessing', 100); INSERT INTO divine_skill (name, required_divinity) VALUES ('Healing', 50); ```
EXPLAIN QUERY PLAN
SELECT t.id, t.name, t.required_divinity, t.runic_name, t.has_strike, t.discriminator
FROM (
    SELECT d.id, d.name, d.required_divinity, NULL AS runic_name, NULL AS has_strike, 'DivineSkill' AS discriminator
    FROM divine_skill AS d
    UNION ALL
    SELECT m.id, m.name, NULL AS required_divinity, m.runic_name, NULL AS has_strike, 'MagicSkill' AS discriminator
    FROM magic_skill AS m
    UNION ALL
    SELECT m0.id, m0.name, NULL AS required_divinity, NULL AS runic_name, m0.has_strike, 'MartialSkill' AS discriminator
    FROM martial_skill AS m0
) AS t
WHERE t.discriminator IN ('MartialSkill', 'MagicSkill');
id      parent  notused detail
"2" "0" "0" "CO-ROUTINE t"
"3" "2" "0" "COMPOUND QUERY"
"4" "3" "0" "LEFT-MOST SUBQUERY"
"6" "4" "0" "SCAN d"
"16"    "3" "0" "UNION ALL"
"18"    "16"    "0" "SCAN m"
"28"    "3" "0" "UNION ALL"
"30"    "28"    "0" "SCAN m0"
"41"    "0" "0" "SCAN t"

Because of this, I'll reopen, because maybe this is something that ought be handled at the EF Core level?

If a maintainer wants to close, no gripes from me.

roji commented 1 year ago

Just to point out that it's possible to rewrite this query by perform the UNION ALL manually against the desired DbSets. So instead of:

var martialOrMagicSkills = db.Skills
    .Where(s => s is MartialSkill || s is MagicSkill)
    .ToList();

... one can write:

var martialOrMagicSkills = db.MartielSkills
    .Concat(db.MagicSkils)
    .ToList();

I do agree EF could do better here, though it's not likely to be very high-value.

John-Paul-R commented 1 year ago

@roji Hm, I may be doing something wrong, but for me, the following:

var martialOrMagicSkills2 = db.MartialSkills
    .Concat(db.MagicSkills)
    .ToList();

yields the following compiler error:

efcore-repros/ExampleProject/Program.cs(63,41): error CS1929: 'DbSet' does not contain a definition for 'Concat' and the best extension method overload 'ParallelEnumerable.Concat(ParallelQuery, IEnumerable)' requires a receiver of type 'System.Linq.ParallelQuery' [efcore-repros/ExampleProject/ExampleProject.csproj]

Specifying the base type as the generic type argument in Concat allows the code to compile:

var martialOrMagicSkills2 = db.MartialSkills
    .Concat<AbstractSkill>(db.MagicSkills)
    .ToList();

however this leads to a runtime exception when translating the query:

System.ArgumentException: Expression of type 'System.Linq.IQueryable`1[ExampleProject.MartialSkill]' cannot be used for parameter of type 'System.Linq.IQueryable`1[ExampleProject.MagicSkill]' of method 'System.Linq.IQueryable`1[ExampleProject.MagicSkill] Concat[MagicSkill](System.Linq.IQueryable`1[ExampleProject.MagicSkill], System.Collections.Generic.IEnumerable`1[ExampleProject.MagicSkill])' (Parameter 'arg0')
   at System.Dynamic.Utils.ExpressionUtils.ValidateOneArgument(MethodBase method, ExpressionType nodeKind, Expression arguments, ParameterInfo pi, String methodParamName, String argumentParamName, Int32 index)
   at System.Linq.Expressions.Expression.Call(MethodInfo method, Expression arg0, Expression arg1)
   at Microsoft.EntityFrameworkCore.Query.Internal.NavigationExpandingExpressionVisitor.ProcessSetOperation(NavigationExpansionExpression outerSource, MethodInfo genericMethod, NavigationExpansionExpression innerSource)
   at Microsoft.EntityFrameworkCore.Query.Internal.NavigationExpandingExpressionVisitor.VisitMethodCall(MethodCallExpression methodCallExpression)
   at Microsoft.EntityFrameworkCore.Query.Internal.NavigationExpandingExpressionVisitor.Expand(Expression query)
   at Microsoft.EntityFrameworkCore.Query.QueryTranslationPreprocessor.Process(Expression query)
   at Microsoft.EntityFrameworkCore.Query.QueryCompilationContext.CreateQueryExecutor[TResult](Expression query)
   at Microsoft.EntityFrameworkCore.Storage.Database.CompileQuery[TResult](Expression query, Boolean async)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.CompileQueryCore[TResult](IDatabase database, Expression query, IModel model, Boolean async)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.<>c__DisplayClass9_0`1.<Execute>b__0()
   at Microsoft.EntityFrameworkCore.Query.Internal.CompiledQueryCache.GetOrAddQuery[TResult](Object cacheKey, Func`1 compiler)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.Execute[TResult](Expression query)
   at Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryProvider.Execute[TResult](Expression expression)
   at Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryable`1.GetEnumerator()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at ExampleProject.Program.Main(String[] args) in efcore-repros/ExampleProject/Program.cs:line 63

Edit: I'll note, filtering multiple base-typed dbsets by derived type with the is operator, then Concat-ing those together does work:

var martialOrMagicSkills3 = db.Skills.Where(s => s is MartialSkill)
    .Concat(db.Skills.Where(s => s is MagicSkill))
    .ToList();
roji commented 1 year ago

I may be doing something wrong, but for me, the following yields the following compiler error [...]

You're referencing the wrong Concat (ParallelEnumerable.Concat instead of Queryable.Concat)

John-Paul-R commented 1 year ago

Note, I believe that compiler error is specifying an arbitrary (perhaps "best-found") extension method candidate, since there is no Queryable.Concat<T1, T2>, just a Queryable.Concat<T>.

To demonstrate, explicitly invoking Queryable.Concat yields another compiler error:

var martialOrMagicSkills2 = Queryable.Concat(
        db.MartialSkills,
        db.MagicSkills)
    .ToList();

error:

efcore-repros/ExampleProject/Program.cs(75,51): error CS0411: The type arguments for method 'Queryable.Concat(IQueryable, IEnumerable)' cannot be inferred from the usage. Try specifying the type arguments explicitly. [efcore-repros/ExampleProject/ExampleProject.csproj]


And, for this form as well, specifying the AbstractSkill base type as the single generic type argument (i.e. Queryable.Concat<AbstractSkill>(q1, q2)) yields that query translation error.