Add extension methods to turn relational nulls on/of for an individual LINQ query

GertArnold commented 9 months ago

In EF6, using database null semantics could easily be switched on or off per context instance. In EF-core, using database semantics has become part of the model configuration (UseRelationalNulls()). If we want to turn this behavior on/off we have to add code to the context class, for instance adding a boolean in its constructor and calling UseRelationalNulls(booleanMember) in OnConfiguring. It's quite a hassle to do this esp. when using dependency injection.

I would prefer having extension methods AsRelationalNulls / AsNoRelationalNulls (in the line of AsTracking, AsNoTracking, AsSplitQuery, etc. in EntityFrameworkQueryableExtensions) that can be used to add this behavior to individual LINQ queries.

Just as AsTracking, AsNoTracking, it would depend on the initial context configuration which of the two methods is a no-op.

roji commented 9 months ago

In EF6, using database null semantics could easily be switched on or off per context instance.

FWIW the same is true for EF Core, as you yourself wrote above (i.e. the setting can be configured in OnConfiguring).

Apart from that, we've seem very little users actually using relational null semantics, so I don't think we'd be prioritizing something like this any time soon. But I agree that a per-query operator could be helpful in some situations.

GertArnold commented 9 months ago

FWIW the same is true for EF Core

Sure, but it's not easy to use this in combination with dependency injection because the dependency container can only return pre-configured instances using one-time configuration in application startup. Once the instance is created we can no longer change the behavior as we could in EF6.

We solved it for now by registering a context subclass, configured for db null semantics, and injecting that where necessary. It works, but yeah, a hassle for just three queries where we need these semantics.

roji commented 9 months ago

@GertArnold right.

Out of curiosity, can you share the type problem that's making you use relational semantics?

GertArnold commented 9 months ago

Here's a repro (to be run in Linqpad 8 with EFC 8) that illustrates the reason for preferring database null semantics.

void Main()
{
    newDb();
    using (var db = getContext(false))
    {
        db.Set<Test>().Where(t => t.ParentTest.TypeId == t.TypeId).Load();
    }
    using (var db = getContext(true))
    {
        db.Set<Test>().Where(t => t.ParentTest.TypeId == t.TypeId).Load();
    }

}

void newDb()
{
    using var db = getContext(false);
    db.Database.EnsureDeleted();
    db.Database.EnsureCreated();
}

MyContext getContext(bool useRelationalNulls)
{
    var connectionString = @$"Server=(localDB)\MSSQLLocalDB;database=testDbNulls;Integrated Security=true;MultipleActiveResultSets=true;Encrypt=true;TrustServerCertificate=true;Application Name=Linqpad";
    return new MyContext(connectionString, useRelationalNulls);
}

class Test
{
    public int ID { get; set; }
    public string Name { get; set; }
    public DateTime Date { get; set; }
    public int? TypeId { get; set; }

    public int? ParentTestId { get; set; }
    public Test ParentTest { get; set; }
}

class MyContext : DbContext
{
    private readonly string _connString;
    private readonly bool _useRelationalNulls;

    public MyContext(string connectionString, bool useRelationalNulls) : base()
    {
        _connString = connectionString;
        _useRelationalNulls = useRelationalNulls;
    }

    protected override void OnModelCreating(ModelBuilder mb)
    {
        mb.Entity<Test>()
            .HasOne(e => e.ParentTest)
            .WithMany()
            .HasForeignKey(e => e.ParentTestId)
            .OnDelete(DeleteBehavior.ClientSetNull);
    }

    protected override void ConfigureConventions(ModelConfigurationBuilder configurationBuilder)
    {
        // https://learn.microsoft.com/en-us/ef/core/modeling/bulk-configuration#pre-convention-configuration
    }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseSqlServer(_connString, o => o.UseRelationalNulls(_useRelationalNulls))
            .LogTo(s => s.Dump(), LogLevel.Information);
    }
}

The generated queries:

Without relational nulls:

      SELECT [t].[ID], [t].[Date], [t].[Name], [t].[ParentTestId], [t].[TypeId]
      FROM [Test] AS [t]
      LEFT JOIN [Test] AS [t0] ON [t].[ParentTestId] = [t0].[ID]
      WHERE [t0].[TypeId] = [t].[TypeId] OR ([t0].[TypeId] IS NULL AND [t].[TypeId] IS NULL)

With relational nulls:

      SELECT [t].[ID], [t].[Date], [t].[Name], [t].[ParentTestId], [t].[TypeId]
      FROM [Test] AS [t]
      LEFT JOIN [Test] AS [t0] ON [t].[ParentTestId] = [t0].[ID]
      WHERE [t0].[TypeId] = [t].[TypeId]

The latter query doesn't return data when one or both of the TypeIds are null (because = null is undetermined in SQL). We have a case where this is intended and using db null semantics yields a simple query plan. Of course we could add null checks in the LINQ query, but that clutters the SQL query (and the plan) just because of technicalities.

roji commented 9 months ago

Thanks, makes sense.

And just to be sure, both TypeId columns actually can contain null - you just don't want to return rows where both are null, right? Am just asking because sometimes users model their databases incorrectly (e.g. by making a column nullable when it shouldn't be) and then try to compensate in queries etc.

GertArnold commented 9 months ago

@roji

both TypeId columns actually can contain null

Correct!

you just don't want to return rows where both are null, right?

Also correct!

ajcvickers commented 9 months ago

@GertArnold What happens if you write that query in LINQ? That is, the query that explicitly doesn't return a value when both are null? Do we translate it?

roji commented 9 months ago

@ajcvickers we do, but that adds extra SQL to compensate (i.e. WHERE a IS NOT NULL AND b IS NOT NULL), which I think @GertArnold wants to avoid (see https://github.com/dotnet/efcore/issues/32453#issuecomment-1832082181).

At least in theory, this may be something we could improve... In other words, when the user writes Where(b => b.X == b.Y && b.X != null && b.Y != null, we could simply generate WHERE x = y in SQL.

ajcvickers commented 9 months ago

Exactly!

roji commented 9 months ago

Yeah, that's a good point - I didn't think of it until you asked... Opened #32464 to track.

GertArnold commented 9 months ago

For the record, without relational nulls, I (and others) would add only one null check, because that would suffice. The proposed improvement should keep that in mind as well.

But, interestingly,

db.Set<Test>().Where(t => t.TypeId.HasValue && t.ParentTest.TypeId == t.TypeId).Load();

...translates into:

      SELECT [t].[ID], [t].[Date], [t].[Name], [t].[ParentTestId], [t].[TypeId]
      FROM [Test] AS [t]
      LEFT JOIN [Test] AS [t0] ON [t].[ParentTestId] = [t0].[ID]
      WHERE [t].[TypeId] IS NOT NULL AND [t0].[TypeId] = [t].[TypeId]

Here, despite relational null not being configured, the redundant (and in my case undesired) part OR ([t0].[TypeId] IS NULL AND [t].[TypeId] IS NULL) is already gone. Apparently EF notices that the manually added null check makes this part contradictory. So this already alleviates the issue a bit: now only the part WHERE [t].[TypeId] IS NOT NULL is redundant as far as SQL is concerned.

roji commented 9 months ago

@GertArnold yeah, we already have some optimization smarts in this area; they're definitely not all-encompassing, and notably also depend on the ordering (e.g. if you put the HasValue check after the equality, you'll probably see some considerably uglier SQL.

dotnet / efcore

Add extension methods to turn relational nulls on/of for an individual LINQ query #32453