dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.75k stars 3.18k forks source link

TPC does not work "out of the box" with Sqlite #31752

Closed sjb-sjb closed 1 year ago

sjb-sjb commented 1 year ago

As discussed on the inheritance page, store-generated integer primary keys cannot be used with Table Per Concrete Type in Sqlite due to the fact that Sqlite lacks named sequences.

On the other hand, if one uses Guids then the problem described in #13575 occurs. Briefly, EF will sometimes track an entity in the Added state even if the primary key was already set. This happens, for example, when setting a navigation to a stored principle entity from a tracked dependent entity; even though the principle entity already has a primary key defined, it is unconditionally tracked in the Added state. This results in errors during SaveChanges. Unfortunately the workaround described in #13575 does not work for Guids; there does not seem to be a direct workaround.

While each of these two points is understandable on its own, the net effect is that TPC is very difficult to use with Sqlite. I would respectfully submit that something be done to make TPC more useable with Sqlite.

ajcvickers commented 1 year ago

@sjb-sjb What, specifically, are you suggesting?

sjb-sjb commented 1 year ago

Well... I think there are several possible approaches.

  1. One approach outlined in #13575 would be to change the way guid keys are assigned in Attach, so that it would either be in a shadow property or otherwise not assigned to the CLR property at the time Tracked is fired. This would re-enable the workaround described in #13575. On the other hand it would also be a breaking change and conceptually it is a bit weird perhaps to shadow something whose value is not going to change when it comes out of shadow and into the CLR key field.

  2. Another approach also referenced in #13575 would be to use Sqlite-generated (store generated) guid keys. While this may be possible to do today, it does not seem to be something that is ready "out of the box". So this approach would mean some code provided in EF to enable it more easily. The workaround in #13575 would then work with these server-generated keys.

  3. A third approach would be to solve the original problem of EF sometimes putting entities into the Added state even though they already have primary keys attached. I have not thought through the details but presumably this is what the AddOrAttach method proposed in #13575 is meant to accomplish. Hopefully this would mean the workaround is no longer needed.

  4. Another approach would be creating code in EF to address the problem of Sqlite not providing named autoincrement sequences that can be used to set the primary key. One could have a table containing the latest sequence number and increment this during ADD operations. Efficiency would have to be looked at. Given that sqlite is a single-user system (i.e. the file is locked) it might be possible for EF to keep the latest sequence number in memory, increment it client-side and add it to the entities just before sending them to the database. The updated sequence number would also have to be written but at least this way one would avoid alternating reads and writes on the sequence table. Instead there would just be a lot of writes to the table. Technically then these would be client-generated keys, but they would be integers that look to the EF user exactly as if they were store-generated / autoincremented keys. This workaround could be applied by EF whenever the model configuration calls for store-generated keys with integral type and the database provider is Sqlite.

On balance, and taking it with a grain of salt given that I am not familiar with the internals of either EF or Sqlite, my suggestion would be to investigate (4) and if that doesn't work then look at (2). If (4) works, it would address the core underlying problem and would be the most natural from a user perspective -- i.e. users would effectively have server-generated keys on the Sqlite platform as they do on other platforms.

Approach (1) would require a conceptual change that would also be breaking, although perhaps not a big/bad breaking change. I think (2) is reasonable although the counterargument would be that the point of guids is that the client can assign them, so it feels a bit funny to generate them on the server. I'm all for (3) but it would require users to understand the nuances of #13575 and adopt the new AddOrAttach user interface. Personally I'm a bit fuzzy on how it would address the Added state when setting a principle entity relationship -- since the user is not invoking Attach in that example they do not have a way to switch their call to AddOrAttach.

Whatever approach is used, the end goal would be for people to be able to switch their sqlite project from TPH to TPC with only minimal effort.

roji commented 1 year ago

@sjb-sjb just noting that a lot of the above really is orthogonal to TPC and/or Sqlite. That is, the main issue you're describing seems to be about how to deal with change tracking and non-temporary keys being generated client-side. Client-generated keys don't seem to be a problem for most users - you don't have to depend on the key to know the state of things - but in any case this has nothing really to do with TPC on Sqlite. For example, many people use client-side GUID generation on SQL Server (without TPC), things work in the same way in that scenario and that doesn't seem to be blocking for the vast majority of them.

In other words, I'd separate between the general question of "how do I work with change tracking and client-generated keys" (something that most users don't seem to have a problem with), and other questions specifically about managing keys in SQLite (e.g. making server-generated GUIDs easier to configure, option (2) above).

Option (4) seems to amount to basically EF creating a custom named sequence mechanism ("One could have a table containing the latest sequence number and increment this during ADD operations"); we generally don't build this kind of workaround where a database lacks support for a feature.

sjb-sjb commented 1 year ago

Hi @roji, well, I guess you're saying that on the one hand it's not EF's mandate to fix or workaround database deficiencies, and on the other hand #13575 did not gain traction with a lot of people, relative to other issues. Both understandable points. All I can say is that as a user I was pretty surprised that something like primary keys and TPC did not work out of the box, it was a frustrating experience. I ended up resolving the absence of this feature by working to defeat other features -- doing my own TrackGraph instead of using Attach, and also throwing an exception if there is an attempt to assign a reference to an entity that is not already attached to the same context. So it felt like I was having to defeat the original design intentions of EF.

ajcvickers commented 1 year ago

@sjb-sjb Would you have been prevented from going down this path if you knew that SQLite doesn't support the features you need? How did you choose to target SQLite?

roji commented 1 year ago

@sjb-sjb don't get me wrong, I understand the pain. Our general recommendation for SQLite+TPC is "use GUIDs", which most people seem to be really OK with - it's very easy to set up and just works. This is a reason why we haven't even considered doing something like a "named sequence workaround" for SQLite - people seem to be happy enough with client-generated GUIDs, just like they're happy doing that on SQL Server for various other reasons.

It may make sense for SQLitePCLRaw (which Microsoft.Data.Sqlite uses) to bundle sqlite3-uuid (or similar) - in fact maybe it even already does so (@bricelam?). That would probably make all this easier.

sjb-sjb commented 1 year ago

@roji, @ajcvickers I guess I missed the memo :-). No, when I read the inheritance page it was easy to conclude that I needed to use guids, but I didn’t understand the client side vs server side implications. I didn’t realize that Attach would assign the guid’s for me and that this has different implications for the Tracked event compared to server-side key generation.

Look at the switch from autogenerated integers to guids: you switch the type of key in your entity, in both cases you pass the entity into Attach without a key, you set your entity properties and call SaveChanges… the key takes care of itself. Naively it seems to be pretty much the same. When the workaround for 13575 broke, though, that’s when the hours of digging and figuring-out came in.

The underlying piece that seems to be missing is a systematic way to determine whether or not an entity has been saved in the database. For store-generated keys obviously we can just look in the clr entity to see if the key is set. For client-side keys, if the user is managing the keys then I don’t think there’s much EF can do. But there is a programming philosophy (which I happen to subscribe to) that says users should work with navigation properties only and let EF and the database manage the keys.

If EF is managing the client-side keys, I think it would be fairly easy to do it in a way that supports a determination of whether or not the entity has been stored in the database. One could maintain the invariant that (a) the CLR key is set iff the entity is either Tracked or has been saved to the database and (b) for entities that are tracked, EF stores a flag saying whether or not it has been saved to the db (this could be accomplished by a shadow property). An implication would be that when entities stop being tracked, the key would be unset again if the entity was not saved. This would also solve # 13575.

In this approach, EF would offer a deal to users: if you let EF manage the keys then, regardless of the key type: (1) EF will correctly decide between Add and Unchanged at the time of tracking, and (2) DbContext will tell you whether or not a given tracked entity has been stored in the database, and (3) for an untracked entity the key will be set if and only if the entity has been stored to the database.

roji commented 1 year ago

I understand the above, and I think there's merit there. At the same time, it could be argued that it isn't EF's job to tell you whether an entity being tracked - after all you can easily do this yourself, e.g. by checking the Guid before calling Attach or via your custom application logic. This maybe does require a bit of adaptation from you (since you have to check before Attach rather than after), but it seems reasonable enough that I'm not sure a big effort is warranted on the EF side. I suspect this is also why not many other users are asking for this.

What do you think @ajcvickers?

sjb-sjb commented 1 year ago

@ajcvickers to answer your question, the reason for using sqlite is it produces a local file and requires no separate setup/configuration. As far as I know it is the only choice for this kind of desktop use, so, I just have to live with the lumps. I’m using the db as a local file produced by the app. I do want to keep an eye in the possibility of using a central db server such as SQL server either for this app in the future or for a different app. From that point of view I’m trying to make my code base as db-agnostic as possible.

sjb-sjb commented 1 year ago

@roji I think you meant, whether or not an entity has been saved to the database — not whether or not it is being tracked.

Sure the user can do this themselves, as they could anything. Given that it simplifies the EF user’s life and solves related problems such as 13575, I would say it can reasonably be considered to be in EF scope. Whether to actually do it or not I suppose is a question of priority relative to other issues.

roji commented 1 year ago

whether or not an entity has been saved to the database — not whether or not it is being tracked.

Yeah, sorry.

ajcvickers commented 1 year ago

@sjb-sjb If you don't want keys to show up in the actual instances until SaveChanges, then use a temporary generator, but mark the values as not-temporary just before saving. For example:

using (var context = new SomeDbContext())
{
    await context.Database.EnsureDeletedAsync();
    await context.Database.EnsureCreatedAsync();

    context.AddRange(new Blog { Posts = { new(), new(), new() }},new Blog { Posts = { new(), new(), new() }});

    await context.SaveChangesAsync();
}

using (var context = new SomeDbContext())
{
    var blogs = context.Blogs.Include(e => e.Posts).ToList();
    Console.WriteLine(context.ChangeTracker.DebugView.LongView);
}

public class SomeDbContext : DbContext
{
    public SomeDbContext()
    {
        SavingChanges += (c, _) =>
        {
            foreach (var entry in ((DbContext)c!).ChangeTracker.Entries())
            {
                entry.Property("Id").IsTemporary = false;
            }
        };
    }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlite("Data Source=test.db")
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();

    public DbSet<Blog> Blogs => Set<Blog>();

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Blog>().Property(e => e.Id).HasValueGenerator<TemporaryGuidValueGenerator>();
        modelBuilder.Entity<Post>().Property(e => e.Id).HasValueGenerator<TemporaryGuidValueGenerator>();
    }
}

public class Blog
{
    public Guid Id { get; set; }
    public string? Name { get; set; }
    public List<Post> Posts { get; } = new();
}

public class Post
{
    public Guid Id { get; set; }
    public string? Title { get; set; }
    public Guid? BlogId { get; set; }
    public Blog Blog { get; set; } = null!;
}
sjb-sjb commented 1 year ago

@acjvickers that sounds like a really cool idea. You haven’t said what TemporaryGuidValueGenerator looks like but I’m guessing it somehow sets the guid into the ChangeTracker’s copy of the key, but leaves the entity’s copy unset (default value) and sets IsTemporary=true ?? Would be interested to see that.

I am not clear on why you would have to set IsTemporary back to false explicitly before SaveChanges. Won’t EF see that the value in CurrentValue is Modified relative to OriginalValue, send it to the db and then set IsTemporary to false after the SaveChanges? Or is that logic somehow defeated by IsTemporary?

ajcvickers commented 1 year ago

You haven’t said what TemporaryGuidValueGenerator looks like

You could just look at the code...

Won’t EF see that the value in CurrentValue is Modified relative to OriginalValue, send it to the db and then set IsTemporary to false after the SaveChanges?

Nope. Have a read up on how temporary value work sometime.

sjb-sjb commented 1 year ago

@ajcvickers thanks. I did read that page and the linked page on temporary values and still had the question. And I guess the reason I missed the source code was that I looked for it using Google instead of GitHub. So… all in good faith.

ajcvickers commented 1 year ago

We discussed this in triage and concluded that we are not going to implement features in EF Core to make up for limitations in the SQLite database. TPC with non-GUID generated keys is not something that SQLite is capable of out-of-the-box.