akka / akka-persistence-jdbc

Asynchronously writes journal and snapshot entries to configured JDBC databases so that Akka Actors can recover state
https://doc.akka.io/docs/akka-persistence-jdbc/

SQL Server deadlocks during recovery/restart #650

Closed JustinPihony closed 1 year ago

JustinPihony commented 2 years ago

Versions used

Akka version: 2.6.13
Akka Persistence JDBC: 5.0.4
mssql-jdbc: 9.5.0.jre8-preview

Remember Entities is also being used

Expected Behavior

The application should recover and restart any entities without exceptions

Actual Behavior

When the application restarts (normally or through recovery) and an old entity is restarted, a deadlock is encountered.

Relevant logs

RecoveryFailed(com.microsoft.sqlserver.jdbc.SQLServerException: Transaction (Process ID 538) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.)

[ERROR] [...Entity] [] [...actor.default-dispatcher-20] - Supervisor RestartSupervisor saw failure: Exception during recovery. Last known sequence number [0]. PersistenceId [...], due to: Exception during recovery. Last known sequence number [0]. PersistenceId [...], due to: Transaction (Process ID 209) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction. akka.persistence.typed.internal.JournalFailureException: Exception during recovery. Last known sequence number [0]. PersistenceId [...], due to: Exception during recovery. Last known sequence number [0]. PersistenceId [...], due to: Transaction (Process ID 209) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

An example deadlocked query set:

(@P0 nvarchar(4000))select max(x2.x3) from (select top (1) "persistence_id" as x4, "sequence_number" as x3 from "event_journal" where "persistence_id" = @P0 order by "sequence_number" desc) x2

AND

(@P0 bit,@P1 nvarchar(4000),@P2 bigint,@P3 nvarchar(4000),@P4 bigint,@P5 nvarchar(4000),@P6 varbinary(max),@P7 int,@P8 nvarchar(4000),@P9 varbinary(max),@P10 int,@P11 nvarchar(4000))insert into "event_journal" ("deleted","persistence_id","sequence_number","writer","write_timestamp","adapter_manifest","event_payload","event_ser_id","event_ser_manifest","meta_payload","meta_ser_id","meta_ser_manifest") values (@P0,@P1,@P2,@P3,@P4,@P5,@P6,@P7,@P8,@P9,@P10,@P11) select SCOPE_IDENTITY() AS GENERATED_KEYS

leviramsey commented 2 years ago

These deadlocks appear to be the result of the schema having VARCHAR columns (8-bit characters) while the JDBC driver by default (because Java strings are Unicode) transmits strings as NVARCHAR values (Unicode). This results in SQL Server having to do a data conversion which requires bypassing the indexing and doing a full-table scan (see https://www.sqlshack.com/query-performance-issues-on-varchar-data-type-using-an-n-prefix/).
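A minimal T-SQL sketch of the mismatch (the table and value names here are hypothetical; the real schema is the akka-persistence-jdbc event_journal):

```sql
-- Assume persistence_id is declared VARCHAR and indexed.
-- An N'...' (NVARCHAR/Unicode) parameter, which the JDBC driver sends by
-- default, forces SQL Server to implicitly convert the VARCHAR column,
-- defeating the index and scanning the table:
SELECT MAX(sequence_number)
FROM event_journal
WHERE persistence_id = N'some-persistence-id';  -- CONVERT_IMPLICIT -> scan

-- A VARCHAR parameter of the same value can use an index seek:
SELECT MAX(sequence_number)
FROM event_journal
WHERE persistence_id = 'some-persistence-id';   -- index seek
```

Comparing the two execution plans (e.g. in SSMS) should show CONVERT_IMPLICIT and an index/table scan in the first case and a seek in the second.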

In the process of doing that scan, row locks seem to build up (pure speculation on my part: locks are acquired synchronously and released asynchronously, so a fast scan acquires locks much faster than it releases them?) which eventually puts SQL Server into a mode where row locks escalate into page locks and then table locks (see https://docs.microsoft.com/en-us/troubleshoot/sql/performance/resolve-blocking-problems-caused-lock-escalation).

The JDBC driver can be configured to send strings as VARCHAR values; the easiest way in the context of akka-persistence-jdbc is to append ;sendStringParametersAsUnicode=false to the configured JDBC connection string. This lets SQL Server use the index on persistence_id, avoiding the table scan and the resulting deadlocks.
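A configuration sketch of where the flag goes, assuming a typical akka-persistence-jdbc Slick setup (host, database name, and config paths are placeholders; adapt to your own application.conf):

```hocon
# application.conf (sketch) -- the key point is the
# sendStringParametersAsUnicode=false suffix on the JDBC URL
slick {
  profile = "slick.jdbc.SQLServerProfile$"
  db {
    url = "jdbc:sqlserver://localhost:1433;databaseName=journal;sendStringParametersAsUnicode=false"
    user = "journal_user"
    password = "secret"
    driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  }
}
```

Note this only helps if the schema's string columns really are VARCHAR; with an NVARCHAR schema the default driver behavior is already correct.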