dotnet / efcore

EF Core is a modern object-database mapper for .NET. It supports LINQ queries, change tracking, updates, and schema migrations.
https://docs.microsoft.com/ef/
MIT License
13.6k stars 3.14k forks source link

Throw a better exception message when concurrency token is an array #18505

Open MihaMarkic opened 4 years ago

MihaMarkic commented 4 years ago

Concurrency check doesn't work when using RowVersion (SQL Server's timestamp) and one creates a new instance of byte array, instead of updating existing. Happens when using SQL Server store, but not with InMemory. See code below.

Steps to reproduce

// entity
public class Party
    {
        public int Id { get; set; }
        public string FirstName { get; set; }
        public byte[] RowVersion { get; set; }
    }

// mapping excerpt
modelBuilder.Entity<Party>(entity =>
{
    entity.ToTable("party");
    entity.Property(e => e.Id).HasColumnName("id");
    entity.Property(e => e.FirstName)
    .HasColumnName("first_name")
    .HasMaxLength(255);
    entity.Property(e => e.RowVersion)   // maps to timestamp SQL server type
    .IsRequired()
    .HasColumnName("row_version")
    .IsRowVersion();
}

// this code doesn't raise an exception at all
var party = db.Party.First();
party.RowVersion = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7 };
db.Update(party);
db.SaveChanges();  // <- error, no exception is thrown

// this code correctly raises a concurrency exception
var party = db.Party.First();
party.RowVersion[0] = 99;
db.Update(party);
db.SaveChanges();  // throws dbconcurrencyexception

Further technical details

EF Core version: 3.0.0 Database provider: Microsoft.EntityFrameworkCore.SqlServer (works correctly with InMemory) Target framework: .NET Core 3.0 Operating system: Windows 10

ajcvickers commented 4 years ago

Notes for triage:

So, the question for triage is, beyond fixing the bug, what guidance should we have for:

/cc @AndriySvyryd

ajcvickers commented 4 years ago

Notes from triage:

ThomasBergholdWieser commented 4 years ago

Can we have a documentation for this in either the RowVersion or IsConcurrencyToken documentation?

DerAlbertCom commented 4 years ago

Problem with the current behaviour that it does not work with detached or mapped Entities over an API.

extremly simplified

put(ClientModel model) {
  var entity = getFromEntityDbContext();
  map(model, entity);
  SaveChanges()
}

ClientModel and Entity has a [TimeStamp]. But if we update the TimeStampe (RowVersion) of Entity with the value from the client it is ignored. So the concurrency does not work.

So we are forced to do a manual concurrency check on the RowVersion, instead of letting EF Core to do the Work. This is not the behaviour we expected.

MihaMarkic commented 4 years ago

Yeah, dunno why was type-bug label removed.

DerAlbertCom commented 4 years ago

With @MihaMarkic Infromation RowVersion[0]=1 a workaround can be created.

Something like this can be used in AutoMapper.

            CreateMap<UpdateFieldSpecificationModel, FieldSpecification>()
                .ForMember(d => d.RowVersion, c => c.Ignore())
                .AfterMap((source, destination) =>
                {
                    for (int i = 0; i < destination.RowVersion.Length; i++)
                    {
                        destination.RowVersion[i] = source.RowVersion[i];
                    }
                });

This works.

MihaMarkic commented 4 years ago

@DerAlbertCom Yep, I ended up with doing something like that as well. You could use

Buffer.BlockCopy(source.RowVersion, 0, 
    destination.RowVersion, 0, source.RowVersion.Length);

instead of for loop to make it a bit faster.

MihaMarkic commented 4 years ago

I also added few more checks, here is more complete code of mine

public static IMappingExpression<TDestination, TSource> WithRowVersioning<TDestination, TSource>(this IMappingExpression<TDestination, TSource> expression)
    where TDestination : EditModel
    where TSource : IRowVersioning
{
    return expression
        .ForMember(destEntity => destEntity.RowVersion, opt => opt.Ignore())
        .AfterMap((source, target) =>
        {
            if (source.RowVersion?.Length > 0)
            {
                if ((target.RowVersion?.Length ?? 0) == 0)
                {
                    throw new Exception($"Mapping existing model of {typeof(TDestination).Name} on new entity of {typeof(TSource).Name}. Entity should be an existing one.");
                }
                Buffer.BlockCopy(source.RowVersion, 0, target.RowVersion, 0, source.RowVersion.Length);
            }
            else
            {
                if ((target.RowVersion?.Length ?? 0) != 0)
                {
                    throw new Exception($"Mapping new model of {typeof(TDestination).Name} on existing entity of {typeof(TSource).Name}. Entity should be a new one.");
                }
            }
        });
}
ajcvickers commented 4 years ago

@DerAlbertCom @MihaMarkic Would it be possible for one of you to put together a small, runnable project or complete code listing that shows why these workarounds are needed. I've read through the posts above several times and I'm not understanding what is going wrong here that would require mutating the array like this.

ThomasBergholdWieser commented 4 years ago

I can only speak for myself, and i am using it in Unit Tests. I am not testing database specifics, but how the rest of the infrastructure reacts on the occasion of such an exception. There are workarounds for this, or other ways to test, so its not an issue for us. That being said, I would also be interested to see in what circumstances, outside of testing purposes, such manipulations would be useful.

DerAlbertCom commented 4 years ago

I will create a small sample in a unit tests.

But, this is usefull in Application where it is possible that several People are working on the same Database Record. And in a szenario where the Data is serialized. Like an ASP.NET Core Application. Serialized in the

or over Json in an API. So also the [TimeStamp] RowVersion MUST be serialized, because this is the way how EF Core together with SQL Server supports a simple optimistic Concurrency Check.

And if i have to serialize the RowVersion of the given DataRecord, i must be able to set the RowVersion on the entity. So that EF Core can fill the right parameter with the Value of the RowVersion at the for the SQL Update Query.

User A:
var version1 = getRecord('key');
write version in <form/>  (with RowVersion), user A sees the data
----
User B:
var version1 = getRecord('key');
write version in <form/> (with RowVersion) to user B sees the data
user B changes the data
user B creates save, makes an HTTP post to the server
Server gets current version of the Entity from the db
Server fill the data from the post in the in the entity (including RowVersion)
Server saves the data (Version 2 is stored)
----
user A changes the data
user A creates save, makes an HTTP post to the server
Server gets current version (now Version2) of the Entity from the db
Server fill the data from the post in the in the entity (including RowVersion)
EF Core should throw in Concurrency Exception, but does not. 
And this is the Problem. It simply overwrite the changes of User A.

Only with Updating the RowVersion byte by byte. EF Core works as expected. Or if we manually check the RowVersion.

Current EF Core only work in Szenarios where the Entity are kept long, like in WinForms or WPF Applications.

Any Open Questions?

ThomasBergholdWieser commented 4 years ago

Just for pointing out a possible workaround for everyone else wondering what to do: you could SELECT the entity using a WHERE clause and the clients timestamp, modify the retrieved entity and perform an UPDATE. That way you can be sure that your scenario will still be detected (no entity will be SELECTed if timestamp has changed BEFORE, and UPDATE will throw DbConcurrencyException if Timestamp changed between SELECT and UPDATE).

Hope this makes sense.

DerAlbertCom commented 4 years ago

Sorry, I'm lost. Why are we talking about possible workarounds? They are plenty of Workarounds.

With your Workaround we have to select the entity twice if there was a change before the first select.

Why should we Workaround?

ThomasBergholdWieser commented 4 years ago

You wrote, that Entity Framework is not working as expected, and i'm not sure about that. As i see it, you are not expected to modify RowVersion by hand, and that needs to be documented better.

I don't understand what you mean with "you have to select the entity twice", the amounts of SELECTs based on your scenario stays exactly the same. I will use your scenario to demonstrate what i mean.

Everytime you do Server gets current version of the Entity from the db in your scenario add WHERE Timestamp == TimestampFromUser to that query. That way you only get the entity if it hasn't been changed yet. If you get an entity, you now no longer need to set the Timestamp at is now exactly the same (or your query wouldn't return anything).

Next step Server fill the data from the post in the in the entity but you dont need to include the RowVersion. We assume it is the same. Now when you save it, and the Timestamp has changed in the Db, you will get a DbUpdateException.

MihaMarkic commented 4 years ago

@ThomasBergholdWieser This is a bug IMO. What is the difference or replacing (timestamp) array values with replacing array from developer perspective? It should be handled the same by EF engine. IMO - it has an array and it should compare it to original value regardless how it was set. If it doesn't work like that, I guarantee you millions of devs bumping into the same pitfall.

DerAlbertCom commented 4 years ago

@ThomasBergholdWieser If the entity was changed and i select with ID and the Old TimeStamp then NO entity is return. In that case i know the entity is deleted or changed. But i don't have an entity which i can update. So i have to select the Entity again by Id only. If a entity is returned then it was not deleted and i can update the entity. This is the reason I have to selected twice.

DerAlbertCom commented 4 years ago

@MihaMarkic yes, most developers will fall in the pitfall. they write even blogpost https://dejanstojanovic.net/aspnet/2018/november/handling-data-concurrency-in-ef-core-and-aspnet-core-webapi this is clearly not tested.

ThomasBergholdWieser commented 4 years ago

@MihaMarkic I disagree. RowVersion (under different names) has been around for a few years, dating back to EF Classic, i don't see millions of users. But to the point, RowVersion is not a "regular" property, as its value is supposed to be generated by the database and not be modified by the User.

@DerAlbertCom I don't get your problem. If you need the entity, then dont include the Timestamp and compare in code for the same effect. Problem solved with no increase in SELECTs.

MihaMarkic commented 4 years ago

@ThomasBergholdWieser Of course it has to be modified by the user. How else would you deal with models otherwise, where you don't keep entities around, just models? I have no idea how it worked before.

ThomasBergholdWieser commented 4 years ago

@MihaMarkic I mentioned a way in my reply to the mentioned scenario, please have a look.

MihaMarkic commented 4 years ago

@ThomasBergholdWieser You are right, that scenario works and is a valid one. It would fail if one caches such entities though. However, I still think that the way EF behaves is not a correct one, or at least highly counter intuitive.

DerAlbertCom commented 4 years ago

@ThomasBergholdWieser i know how to work around, the manual TimeStamp validation is that what we do. You suggested to select the entity including the Timestamp. And this is not a good workaround because of the case I described here https://github.com/aspnet/EntityFrameworkCore/issues/18505#issuecomment-561624397.

MihaMarkic commented 4 years ago

@DerAlbertCom I thought of that, too. Perhaps you would just load entity by id and then manually compare timestamp array to the model's one. That way you have both info with a single fetch.

mjamro commented 4 years ago

@MihaMarkic I disagree. RowVersion (under different names) has been around for a few years, dating back to EF Classic, i don't see millions of users. But to the point, RowVersion is not a "regular" property, as its value is supposed to be generated by the database and not be modified by the User.

Precisely! You never need to update RowVersion so it's a perfect fit for concurrency check - the modified value of RowVersion property should be used only for concurrency check and not update the column in the database.

Pseudocode:

foo.Value = "bar";
foo.RowVersion = newValue;
db.SaveChanges(); // -> UPDATE foo SET Value = 'bar' WHERE Id = <id> AND RowVersion = <newValue>
DerAlbertCom commented 4 years ago

@mariuszjamro exactly, MUST not be updated. But used for the check. I never expected a forced updated of a server generated timestamp.

poke commented 4 years ago

I think the arguments why this is supposed to be a bug ignore the use case when the concurrency token is not a row version. Let’s take ASP.NET Core Identity’s UserStore as an example: That one use a string-based concurrency token that’s just a Guid that gets updated on every write. So it basically looks like this (simplified):

user.Property = newValue;
user.ConcurrencyStamp = Guid.NewGuid().ToString();
await db.SaveChangesAsync();

Here, the generated query should be:

UPDATE user
SET Property = '<newValue>',
    ConcurrencyStamp = '<newGuid>'
WHERE Id = <userId>
  AND ConcurrencyStamp = '<oldGuid>'

So obviously, this mechanism needs a way to set the concurency token while also comparing against the old one.

That’s why examples like the following that was mentioned simply don’t work:

put(ClientModel model) {
    var entity = GetFromEntityDbContext();
    Map(model, entity);
    SaveChanges();
}

By mapping the client model on top of the current state of the database, you cannot compare against the memorized value of the concurrency token if you also want a mechanism to update the token.

So the solution has to be something like this:

put(ClientModel model) {
    var entity = GetFromEntityDbContext();

    // restore memorized concurrency token
    db.Entry(entity).Property(nameof(Entity.ConcurrencyToken)).OriginalValue = entity.ConcurrencyToken;

    // map the updatable properties, and reset the concurrency token
    MapEverythingExceptConcurrencyToken(model, entity);
    entity.ConcurrencyToken = GenerateNewToken();

    SaveChanges();
}

The bug mentioned in OP’s code is that party.RowVersion references the same array object as the original value. So mutating that array also mutates the original value of the concurrency token which means that a different value is being used in the WHERE to detect the conflict.

But the underlying issue is that you simply cannot utilize the concurrency check if you work with a fresh entity without manually resetting its original concurrency token value back to what you want to validate against.

DerAlbertCom commented 4 years ago

When the Use Case of the [Timestamp] is not a Concurrency Token then it should be documented otherwise, and EF Core should not throw an DbUpdateConcurrencyException if is changed. Yes, Workarounds are possible. Why is the RowVersion generated on the Client side?

jannikbuschke commented 4 years ago

Following the discussion I am not 100% what the best solution to this is. However what I think this is an undocumented breaking behavioral change in EF Core 3.0. Previously it was not possible to modify a RowVersion clientside. This would have thrown an DbConcurrentUpdateException. Now no exception is thrown and instead a given clientside rowversion is accepted as a new value (mapped from a disconnected entity).

So implementations that rely on the old behavior are now broken (implementations like this might not be optimal regarding above thread but I am not sure about this).

To me this seems as a big issue (despite the relative low anticipation this thread has), as it is relatively hard to detect. If there are no tests that explicitly check for concurrent updates, developers will assume their original code still work. Concurrent scenarios are especially difficult to test for single developers. So this behavioral change has the potential to stay undetected for users and developers alike. This also does not crash the app, instead there is "silent data loss", as concurrent updates just silently override each other.

rcollette commented 3 years ago

Given a postgresql database and using xid as the RowVersion property

        [Timestamp]
        [DatabaseGenerated(DatabaseGeneratedOption.Computed)]
        // ReSharper disable once StringLiteralTypo
        [Column("xmin", TypeName = "xid")]
        public uint RowVersion { get; set; }

If the entity is fetched as tracked, we Automap a DTO/model whose RowVersion is outdated to the entity, and then save changes. Concurrency detection does not work.

Same scenario but fetching the entity with AsNoTracking(), concurrency detection does work.

I would expect it to work in both scenarios. In the tracked mode, Automapper does update the RowVersion property, but the update statement that gets executed has an xid predicate whose value is the one that was fetched from the database, not what was mapped.

I don't see why this behavior would not be the same in both scenarios. I get that a database generated field wouldn't expect to be changed, and therefore the tracked mode probably ignores that field when generating the SQL, but then the behavior should be consistent for both tracked and untracked entities. I think the tracked mode is broken.

Meigyoku-Thmn commented 3 years ago

Ok, today I bumped into this exact problem. I will summarise this: If you set the RowVersion property in your Entity class as a concurrency token. Then the following code will not work:

party.RowVersion = form.RowVersion;
party.... = form...;
db.SaveChanges();

Many of us nowadays use form, then map from form to entity, or to entities, with complex mapping rules and validations. But the way EF and EF Core handles concurrency conflict is, if I guess right: it assumes that the form and the entity have to be one, that is, you are supposed to use the same class for form and entity:

var party = form;
db.Update(party);
db.SaveChanges();

Or, you are supposed to create your own entity object with primary key, then pass it to db.Update, without any additional queries:

var party = MakeEntityFromForm(form);
db.Update(party);
db.SaveChanges();

But nowadays the way we do is like this (I will not judge that if this is right or wrong):

var party = db.Party.Find(id);
ValidateAndMap(form, party);
db.SaveChanges();

EF Core will use the RowNumber of the first data state that it tracks. You cannot normally change it.

Solution: this is the shortest way to force EF to check for concurrency conflict using the RowNumber from the form of the user:

var party = db.Party.Find(id);
ValidateAndMap(form, party);
db.Entry(party).OriginalValues[nameof(Party.RowVersion)] = form.RowVersion;
db.SaveChanges();

I wonder why EF Core's APIs are so old school with unexpected pitfal like this?

UncleFirefox commented 3 years ago

I stumbled on the same error, thanks @Meigyoku-Thmn! Does anybody know if this is still an issue? Or is it fixed in future versions of EF?

Meigyoku-Thmn commented 3 years ago

@UncleFirefox This is not a bug, it's a feature, an API design problem. I doubt that MS will change the API anytime soon. Maybe in EF Core 6? Let's see. There should be a dedicated API for concurrency conflict checking alone.

kjkrum commented 2 years ago

This violates user expectations in a way that is very likely to lead to silent data loss. IMO, this is a lock the doors, order pizza, nobody goes home until it's fixed level issue.

@Meigyoku-Thmn 's workaround produces the expected behavior:

var party = db.Party.Find(id);
ValidateAndMap(form, party);
db.Entry(party).OriginalValues[nameof(Party.RowVersion)] = form.RowVersion; // the workaround
db.SaveChanges();

But this is a maintenance nightmare because the workaround needs to appear in every update method of every service that operates on row versioned entities. You can't easily search for the absence of a line of code.

This workaround may be more maintainable. EF entities with a row version extend a base class:

public abstract class RowVersionedEntity
{
    [Timestamp]
    public byte[] RowVersion { get; set; }
}

You can easily audit that no entity directly contains a [Timestamp] property. Then in your DbContext, override whichever SaveChanges methods you're using:

public override Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
{
    foreach (var entry in ChangeTracker.Entries<RowVersionedEntity>())
    {
        var prop = entry.Property(nameof(RowVersionedEntity.RowVersion));
        prop.OriginalValue = prop.CurrentValue;
    }
    return base.SaveChangesAsync(cancellationToken);
}
kjkrum commented 6 months ago

but the update statement that gets executed has an xid predicate whose value is the one that was fetched from the database, not what was mapped

Exactly. That is the bug.