OrchardCMS / OrchardCore

Orchard Core is an open-source modular and multi-tenant application framework built with ASP.NET Core, and a content management system (CMS) built on top of that framework.
https://orchardcore.net
BSD 3-Clause "New" or "Revised" License
7.23k stars 2.34k forks source link

Unexpected error changing roles to users in a concurrent scenario #9639

Open jersiovic opened 3 years ago

jersiovic commented 3 years ago

Describe the bug

When two concurrent requests arrive to orchard server changing each request the role of a different user, sometimes it fails with this exception: Unexpected error while creating a new user. Microsoft.Data.SqlClient.SqlException (0x80131904): The INSERT statement conflicted with the FOREIGN KEY constraint 'FK_UserByRoleNameIndex_Document_Id'. The conflict occurred in database 'ktptests', table 'dbo.UserByRoleNameIndex', column 'Id'. The reason seems to be a problem with YesSql Reduce indexes. For this scenario for which I'm reporting the problem it happens with UserByRoleNameIndex reduce index. But I think it could happen whith other reduce indexes like those used at openid module.

UserByRoleNameIndex table stores an index id, the role name and the count of users with that role UserByRoleNameIndex_Document has two foreing keys one pointing to the index id at UserByRoleNameIndex and the other to the related User Document to that role. So, it has as many records at UserByRoleNameIndex_Document as users contain the role. And their foreign key pointing to UserByRoleNameIndex index.

When there is not any user that contains a role, the record at UserByRoleNameIndex is removed and UserByRoleNameIndex_Document records pointing to that index record also are removed or updated to point to the new index.

If you have a role A with only one user with that role. Problem comes when you have two requests, the first one removing the role A from the user or changing it to another role B. And the second concurrent request adding the role A to another user.

Problem happens cause first one removes the UserByRoleNameIndex record for the role A, however second request happened at the same time and it readed UserByRoleNameIndex record for the role A before first request commited its remove. This causes second request at some point inserts or update a record at UserByRoleNameIndex_Document with a foreign key pointing to an UserByRoleNameIndex that has just been removed. and that causes de error.

To Reproduce

Steps to reproduce the behavior:

  1. Assing to a user A a role A that no other user has.
  2. Execute this two requests at the same time:
    • Remove role A from the user A
    • Add a role A to another user B
  3. Repeat till it fails (Easier programing a test that repeats it till fail. I don't provide it cause I experience it in a more complex and not shareable test set)

Expected behavior

Not sure. One posible solution is that yessql for ay request that captures the exception because of a FOREIGN KEY constraint violation for an XXXXX_Document table retry all the reduce operation taking into account the UserByRoleNameIndex. In this way it will take into account the record that was removed, and we avoid lock. The other solution I see is to add locks for reading reduce indexes within the yessql recude method. The point is we would need to read and update index tables always in the same order to avoid interlock problems.

Another more quick solution that I will apply to our Orchard fork while a definitive fix is provided for yessql is to not to use reduce indexes at orchard. I've seen the only reduce indexes used is this one and a buch of them at openid module. If you want I can provide a PR when I finish it if you are interested on fixing it in that way while it is fixed at YesSql

Screenshots

Just in case it helps, this is the place where we should fix sth, because it is there where it adds a foreign key to a record that doesn't exists https://github.com/sebastienros/yessql/blob/main/src/YesSql.Core/Session.cs#L1189

I provide a log showing the sqls related with the problem that were executed just before the exception is received

These are the users that appear in this log and the proccessId that executes the operations related to each user

[ProcessId][UserId][NormalizedUserName]
[18] 4frytmr97we1v7qqjtwadsca51 CONSUMER.FIRSTCONSUMERCANSEARCHANOTHERCONSUMER@PCCOM.ES
[20] 47jd0ppdm6nzkw8w2kg2b16xwb CONSUMER.SECONDCONSUMERCANTSEARCHANOTHERPROVIDER@PCCOM.ES

This is the log with the sqls and the value of its params within the query. Take into account I've removed sqls not related to this problem to make it easier to see the problem:

I 2021-06-08 00:32:36.0356|[51]|Default|00-73db2b9ee3712649b660f6b6d680a108-820e7883b746214f-00||KeTePongoServer.Extensions.DbAccess.DapperConnectionWithSQLLoggerFactory|INFO|Dapper query:
"select * from [UserByRoleNameIndex] where [RoleName] = 'PENDING CONSUMER USER'"

2021-06-08 00:32:36.0773|[18]|Default|00-22e4690f2bad0248a094c04f807682ce-7334ec6da2c79a42-00||KeTePongoServer.Extensions.DbAccess.DapperConnectionWithSQLLoggerFactory|INFO|Dapper query:
                                                                                "delete from [UserIndex] where [DocumentId] = @Id_1;delete from [UserByLoginInfoIndex] where [DocumentId] = @Id_2;delete from [UserByClaimIndex] where [DocumentId] = @Id_3;delete from [KeTePongoUserIndex] where [DocumentId] = @Id_4;delete from [KeTePongoUserWithPhoneIndex] where [DocumentId] = @Id_5;
delete from [UserByRoleNameIndex_Document] where [UserByRoleNameIndexId] = 14;
delete from [UserByRoleNameIndex] where [Id] = 14;                              insert into [UserIndex] ([UserId], [NormalizedUserName], [NormalizedEmail], [IsEnabled], [DocumentId]) values (@UserId7, @NormalizedUserName7, @NormalizedEmail7, @IsEnabled7, @DocumentId7) ; select SCOPE_IDENTITY() [Id];insert into [KeTePongoUserIndex] ([ConsumerOID], [PhoneNumber], [ProviderOID], [DocumentId]) values (@ConsumerOID8, @PhoneNumber8, @ProviderOID8, @DocumentId8) ; select SCOPE_IDENTITY() [Id];update [Document] set [Content] = @Content_9, [Version] = @Version_9 where [Id] = @Id_9;
update [UserByRoleNameIndex] set [RoleName] = 'NO CONSUMER USER', [Count] = 1 where [Id] = 15;
delete from [UserByRoleNameIndex_Document] where [DocumentId] = 281 and [UserByRoleNameIndexId] = 15;
update [UserByRoleNameIndex] set [RoleName] = 'CONSUMER ADMIN USER', [Count] = 11 where [Id] = 9;
insert into [UserByRoleNameIndex_Document] ([UserByRoleNameIndexId], [DocumentId]) values (9, 281);"
Parameters: "[Id_1, 281], [Id_2, 281], [Id_3, 281], [Id_4, 281], [Id_5, 281], [Id_6, 14], [DocumentId7, 281], [IsEnabled7, True], [NormalizedEmail7, CONSUMER.FIRSTCONSUMERCANSEARCHANOTHERCONSUMER@PCCOM.ES], [NormalizedUserName7, CONSUMER.FIRSTCONSUMERCANSEARCHANOTHERCONSUMER], [UserId7, 4frytmr97we1v7qqjtwadsca51], [DocumentId8, 281], [ProviderOID8, 0], [PhoneNumber8, <null>], [ConsumerOID8, 201], [Version_9, 1], [Content_9, {"Id":281,"UserId":"4frytmr97we1v7qqjtwadsca51","UserName":"consumer.firstConsumerCanSearchAnotherConsumer","NormalizedUserName":"CONSUMER.FIRSTCONSUMERCANSEARCHANOTHERCONSUMER","Email":"consumer.firstConsumerCanSearchAnotherConsumer@pccom.es","NormalizedEmail":"CONSUMER.FIRSTCONSUMERCANSEARCHANOTHERCONSUMER@PCCOM.ES","PasswordHash":"AQAAAAEAACcQAAAAEPdp52TfhcdzOn9oZhGqL9B529uigpVqXna5I25Gz81x3Ett5KpaNqqhTsMLpPgPlw==","SecurityStamp":"KLPYFO62MHNMCU7VFHO4ZETXJ2RKA7I3","EmailConfirmed":true,"IsEnabled":true,"RoleNames":["CONSUMER ADMIN USER"],"UserClaims":[],"LoginInfos":[],"UserTokens":[],"Properties":{"UserProfile":{"UserTypes":[0],"Name":"consumer.firstConsumerCanSearchAnotherConsumer","NewEmailRequested":null,"IsPhoneConfirmed":false,"PhoneNumber":null,"NewPhoneNumberRequested":null,"ConsumerOID":201,"ProviderOID":0}}}], [Id_9, 281], [RemovedId_10_0, 281], [Id_10, 15], [Id10, 15], [Count10, 1], [RoleName10, NO CONSUMER USER], [AddedId_11_0, 281], [Id_11, 9], [Id11, 9], [Count11, 11], [RoleName11, CONSUMER ADMIN USER]", elapsed: 0 ms, context: "Empty", connection: "Empty" 

2021-06-08 00:32:36.0931|[20]|Default|00-73db2b9ee3712649b660f6b6d680a108-820e7883b746214f-00||KeTePongoServer.Extensions.DbAccess.DapperConnectionWithSQLLoggerFactory|INFO|Dapper query:
                                                                                "insert into [Document] ([Id], [Type], [Content], [Version]) values (270, OrchardCore.Users.Models.User, @Content_1, 1);insert into [UserIndex] ([UserId], [NormalizedUserName], [NormalizedEmail], [IsEnabled], [DocumentId]) values (@UserId2, @NormalizedUserName2, @NormalizedEmail2, @IsEnabled2, @DocumentId2) ; select SCOPE_IDENTITY() [Id];insert into [KeTePongoUserIndex] ([ConsumerOID], [PhoneNumber], [ProviderOID], [DocumentId]) values (@ConsumerOID3, @PhoneNumber3, @ProviderOID3, @DocumentId3) ; select SCOPE_IDENTITY() [Id];
update [UserByRoleNameIndex] set [RoleName] = 'PENDING CONSUMER USER', [Count] = 2 where [Id] = 14;
insert into [UserByRoleNameIndex_Document] ([UserByRoleNameIndexId], [DocumentId]) values (14, 270);"
Parameters: "[Version_1, 1], [Content_1, {"Id":270,"UserId":"47jd0ppdm6nzkw8w2kg2b16xwb","UserName":"consumer.secondConsumerCantSearchAnotherProvider","NormalizedUserName":"CONSUMER.SECONDCONSUMERCANTSEARCHANOTHERPROVIDER","Email":"consumer.secondConsumerCantSearchAnotherProvider@pccom.es","NormalizedEmail":"CONSUMER.SECONDCONSUMERCANTSEARCHANOTHERPROVIDER@PCCOM.ES","PasswordHash":"AQAAAAEAACcQAAAAEMZli6H3mdOxPfOa2Hy0q3C6ierJSGEqFZFGGJ1IODq9dmeFSevaRSbgB79dqZJyHw==","SecurityStamp":"RTKKE772DEISS6AHUBGDWVOYJEFKFFN5","EmailConfirmed":false,"IsEnabled":true,"RoleNames":{"$type":"System.String[], System.Private.CoreLib","$values":["PENDING CONSUMER USER"]},"UserClaims":[],"LoginInfos":[],"UserTokens":[],"Properties":{}}], [Type_1, OrchardCore.Users.Models.User, OrchardCore.Users.Core], [Id_1, 270], [DocumentId2, 270], [IsEnabled2, True], [NormalizedEmail2, CONSUMER.SECONDCONSUMERCANTSEARCHANOTHERPROVIDER@PCCOM.ES], [NormalizedUserName2, CONSUMER.SECONDCONSUMERCANTSEARCHANOTHERPROVIDER], [UserId2, 47jd0ppdm6nzkw8w2kg2b16xwb], [DocumentId3, 270], [ProviderOID3, 0], [PhoneNumber3, <null>], [ConsumerOID3, 0], [AddedId_4_0, 270], [Id_4, 14], [Id4, 14], [Count4, 2], [RoleName4, PENDING CONSUMER USER]", elapsed: 0 ms, context: "Empty", connection: "Empty" 

2021-06-08 00:32:36.1268|[20]|Default|00-73db2b9ee3712649b660f6b6d680a108-820e7883b746214f-00||OrchardCore.Users.Services.UserStore|ERROR|Unexpected error while creating a new user. Microsoft.Data.SqlClient.SqlException (0x80131904): The INSERT statement conflicted with the FOREIGN KEY constraint 'FK_UserByRoleNameIndex_Document_Id'. The conflict occurred in database 'ktptests', table 'dbo.UserByRoleNameIndex', column 'Id'.
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
   at Microsoft.Data.SqlClient.SqlDataReader.TryHasMoreResults(Boolean& moreResults)
   at Microsoft.Data.SqlClient.SqlDataReader.TryNextResult(Boolean& more)
   at Microsoft.Data.SqlClient.SqlDataReader.NextResult()
   at YesSql.Commands.CreateIndexCommand.<AddToBatch>b__6_0(DbDataReader dr)
   at YesSql.Commands.BatchCommand.ExecuteAsync(DbConnection connection, DbTransaction transaction, ISqlDialect dialect, ILogger logger)
   at YesSql.Session.FlushAsync()
   at YesSql.Session.FlushAsync()
   at YesSql.Session.SaveChangesAsync()
   at YesSql.Session.SaveChangesAsync()
   at OrchardCore.Users.Services.UserStore.CreateAsync(IUser user, CancellationToken cancellationToken)
ClientConnectionId:37c05065-a8a8-46fc-998f-5c8e6a861f34
Error Number:547,State:0,Class:16    at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlDataReader.TryHasMoreRows(Boolean& moreRows)
   at Microsoft.Data.SqlClient.SqlDataReader.TryHasMoreResults(Boolean& moreResults)
   at Microsoft.Data.SqlClient.SqlDataReader.TryNextResult(Boolean& more)
   at Microsoft.Data.SqlClient.SqlDataReader.NextResult()
   at YesSql.Commands.CreateIndexCommand.<AddToBatch>b__6_0(DbDataReader dr)
   at YesSql.Commands.BatchCommand.ExecuteAsync(DbConnection connection, DbTransaction transaction, ISqlDialect dialect, ILogger logger)
   at YesSql.Session.FlushAsync()
   at YesSql.Session.FlushAsync()
   at YesSql.Session.SaveChangesAsync()
   at YesSql.Session.SaveChangesAsync()
   at OrchardCore.Users.Services.UserStore.CreateAsync(IUser user, CancellationToken cancellationToken)
jersiovic commented 3 years ago

Definitively it seems all the reduce indexes in orchard source can be transformed on map indexes. Reason is all of them are only useful to have a Count property which is not used anywhere, so no problem if that property is removed from those indexes and the index is transformed in a map index.

sebastienros commented 3 years ago

Converting to a map index is not acceptable, it will be a broken change, while the issue should be fixed. A temporary mitigation would be to add a lock in the roles management.

In YesSql as you suggested it should use both a lock and optimistic concurrency (detect when another thread updated the records).

jersiovic commented 3 years ago

Locks at users and opendid module for me it is not enough cause in production it runs on more than one instance, So I would wait for a solution on yessql side based on optimistic concurrency or db locks.

When you say it is a broken change, you mean others on third party maybe are using that Count property? isn't it? It looks unlikely but possible. So, I discard my pr

sebastienros commented 3 years ago

@jersiovic would a custom exception (like the concurrency check one) be sufficient for your needs when a conflict is found, so you can either decide to retry the load/update/save operation or display an error message to the user stating it failed? I am worried to implement some retry logic directly in yessql when there is a conflict, because where should the retry stop if issues keep occurring?

jersiovic commented 3 years ago

@sebastienros problems I see for that solution:

jersiovic commented 3 years ago

@sebastienros thinking it twice I see that an acceptable soluction could be that yessql does a retry mechanism which can be configured at the store configuration how many retries should be done. And after that number of retries a custom exception is thrown. This will save us to code retry mechanism for any entity invovled in a reduce index at orchard code. And in the case the number of retries fail I see it ok to handle it at orchard as an error, reporting to the user that the operation failed cause system is under heavy load or sth like that.