dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.

TransactionAbortedException when performing queries in parallel inside a transaction scope #1675

Status: Closed (joostmeijles closed this 9 months ago)

joostmeijles commented 2 years ago

Describe the bug

When performing multiple queries in parallel, where each pair of queries runs inside its own transaction scope, an unexpected transaction error is thrown (see below).

Exception message: System.Transactions.TransactionAbortedException: The transaction has aborted.
Stack trace:
Unhandled exception. System.Transactions.TransactionAbortedException: The transaction has aborted.
 ---> System.Transactions.TransactionPromotionException: Failure while attempting to promote transaction.
 ---> System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote()
   --- End of inner exception stack trace ---
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote()
   at System.Transactions.TransactionStatePSPEOperation.PSPEPromote(InternalTransaction tx)
   at System.Transactions.TransactionStateDelegatedBase.EnterState(InternalTransaction tx)
   --- End of inner exception stack trace ---
   at System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction tx)
   at System.Transactions.EnlistableStates.Promote(InternalTransaction tx)
   at System.Transactions.Transaction.Promote()
   at System.Transactions.TransactionInterop.ConvertToDistributedTransaction(Transaction transaction)
   at System.Transactions.TransactionInterop.GetExportCookie(Transaction transaction, Byte[] whereabouts)
   at Microsoft.Data.SqlClient.SqlInternalConnection.GetTransactionCookie(Transaction transaction, Byte[] whereAbouts)
   at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull(Transaction tx)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.ActivateConnection(Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection(DbConnection owningObject, DbConnectionInternal obj, Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.InternalOpenAsync(CancellationToken cancellationToken)
--- End of stack trace from previous location ---

To reproduce

The code below reproduces the error. It starts 100,000 tasks in parallel; each task runs two queries inside a single transaction scope. Note that the database connection is disposed (and thus closed) after each query. This is important for triggering the error. When the code is changed to open a single connection once and use it for both queries, the error does not occur (see also the code comments below).

using System.Transactions;
using Microsoft.Data.SqlClient;

static async Task PerformTransactionWithQuery(int num)
{
    try
    {
        using (new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            // Adding "max pool size=1000;" to the connection string seems to trigger the problem less often
            string connStr = @"Server=.\SQLEXPRESS;Database=master;Trusted_Connection=True;Encrypt=False";

            var query = "SELECT COUNT(*) FROM sys.dm_tran_active_transactions";

            await using (var dbConn = new SqlConnection(connStr))
            {
                await dbConn.OpenAsync();

                await using (var command1 = new SqlCommand(query, dbConn))
                {
                    await command1.ExecuteScalarAsync();
                }
            } // Connection is disposed (and thus closed)

            await using (var dbConn = new SqlConnection(connStr)) 
            {
                // Reopening the connection triggers the following error:
                // System.Transactions.TransactionAbortedException: The transaction has aborted.
                //
                // NB. Using a single connection and opening it once does NOT trigger the error
                await dbConn.OpenAsync();

                await using (var command2 = new SqlCommand(query, dbConn))
                {
                    await command2.ExecuteScalarAsync();
                }
            } // Connection is disposed (and thus closed)

            //Do not complete transaction
        }
    }
    catch (Exception) // exception variable omitted: it was unused and the error is rethrown
    {
        Console.WriteLine($"Failed {num}");
        throw;
    }
} 

var tasks = Enumerable.Range(0, 100000).ToList().Select(PerformTransactionWithQuery);
await Task.WhenAll(tasks);
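
For comparison, the single-connection variant mentioned in the code comments above might look like the sketch below. It reuses the connection string and query from the repro; the class name `SingleConnectionVariant` is only illustrative, and the sketch shows the shape of the workaround described in this report rather than a confirmed fix.

```csharp
using System.Threading.Tasks;
using System.Transactions;
using Microsoft.Data.SqlClient;

// Illustrative wrapper class; the name is not from the original report.
static class SingleConnectionVariant
{
    // Connection string and query copied from the repro above.
    const string ConnStr = @"Server=.\SQLEXPRESS;Database=master;Trusted_Connection=True;Encrypt=False";
    const string Query = "SELECT COUNT(*) FROM sys.dm_tran_active_transactions";

    // The int parameter is unused here; it is kept so the method can replace
    // the repro's method in Enumerable.Range(...).Select(...).
    public static async Task PerformTransactionWithQuery(int num)
    {
        using (new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            await using (var dbConn = new SqlConnection(ConnStr))
            {
                // Opened exactly once, so the connection enlists in the
                // ambient transaction a single time.
                await dbConn.OpenAsync();

                await using (var command1 = new SqlCommand(Query, dbConn))
                {
                    await command1.ExecuteScalarAsync();
                }

                // Second query reuses the same open connection.
                await using (var command2 = new SqlCommand(Query, dbConn))
                {
                    await command2.ExecuteScalarAsync();
                }
            } // Connection is disposed once, after both queries

            // As in the repro, the scope is not completed, so the
            // transaction rolls back when the scope is disposed.
        }
    }
}
```

Because the connection is opened only once per scope, this variant never asks the pool for a second connection while the transaction is still active, which is the step where the repro fails.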

Expected behavior

Being able to use multiple connections (with the same connection string) in sequence within the same transaction without errors.

Further technical details

Microsoft.Data.SqlClient version: 4.1.0
.NET target: .NET 6
SQL Server version: SQL Server 2019
Operating system: Windows 11

DavoudEshtehari commented 9 months ago

@sdrapkin thanks for the prompt reply. This issue doesn't apply to S.D.SqlClient because it hasn't received another fix through PR #543. Generally, S.D.SqlClient is currently under support and only addresses significant bugs and security issues.

sdrapkin commented 9 months ago

Generally, S.D.SqlClient is currently under support and only addresses significant bugs and security issues.

IMHO this is a significant bug, and Microsoft should fix it in S.D.SqlClient as well (which a vast number of enterprise software still depends on).

DavoudEshtehari commented 9 months ago

I'll bring up your concern in the next bug triage meeting and will update here if there's a different outcome.

joostmeijles commented 9 months ago

This package from the PR #2301 can be used for verification. Please, give it a try as this issue varies across different machines.

Verified on my machine:
.NET target: .NET 6
SQL Server version: SQL Server 2019
Operating system: Windows 11

Works like a charm! Fantastic work @DavoudEshtehari 🥇

razvalex commented 8 months ago

First, thank you very much for prioritizing this issue appropriately. I just wanted to share what we've encountered so far in our tests.

Some time ago we migrated from System.Data.SqlClient to Microsoft.Data.SqlClient 3.1.1, which worked as expected for us (we use transaction scope quite intensively and are trying to migrate away from it).

Recently, due to https://msrc.microsoft.com/update-guide/vulnerability/CVE-2024-0056, we tried to upgrade from 3.1.1 to 3.1.5.

We started to encounter two different issues from time to time (~20 times per day out of ~2M requests and tens of thousands of background tasks):

  1. System.InvalidOperationException: The operation is not valid for the current state of the enlistment.
    [{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.EnlistmentState.Committed","level":0,"line":39,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/EnlistmentState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.SinglePhaseEnlistment.Committed","level":1,"line":55,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/SinglePhaseEnlistment.cs"},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlDelegatedTransaction.SinglePhaseCommit","level":2,"line":0},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.DurableEnlistmentCommitting.EnterState","level":3,"line":160,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/DurableEnlistmentState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.CommittableTransaction.Commit","level":4,"line":82,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/CommittableTransaction.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionScope.InternalDispose","level":5,"line":800,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionScope.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, 
PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionScope.Dispose","level":6,"line":738,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionScope.cs"}]
  2. System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.
    [{"severityLevel":"Error","outerId":"0","message":"System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation.","type":"System.Reflection.TargetInvocationException","id":"64648675","parsedStack":[{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Reflection.MethodBaseInvoker.InvokeWithNoArgs","level":0,"line":61,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Reflection/MethodBaseInvoker.cs"},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull","level":1,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection","level":2,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection","level":3,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection","level":4,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal","level":5,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlConnection.InternalOpenAsync","level":6,"line":0},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, 
PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw","level":7,"line":53,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/ExceptionServices/ExceptionDispatchInfo.cs"},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess","level":8,"line":154,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs"},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification","level":9,"line":118,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs"},...,"method":"System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess","level":12,"line":154,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Private.CoreLib/src/System/Runtime/CompilerServices/TaskAwaiter.cs"},...,{"severityLevel":"Error","outerId":"64648675","message":"The transaction has aborted.","type":"System.Transactions.TransactionAbortedException","id":"53313168","parsedStack":[{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionStateAborted.CheckForFinishedTransaction","level":0,"line":1493,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, 
PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.EnlistableStates.PromotedToken","level":1,"line":662,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.Transaction.GetPromotedToken","level":2,"line":462,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/Transaction.cs"},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"InvokeStub_Transaction.GetPromotedToken","level":3,"line":0},{"assembly":"System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e","method":"System.Reflection.MethodBaseInvoker.InvokeWithNoArgs","level":4,"line":0}]},{"severityLevel":"Error","outerId":"53313168","message":"Failure while attempting to promote transaction.","type":"System.Transactions.TransactionPromotionException","id":"58585774","parsedStack":[{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote","level":0,"line":0},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionStatePSPEOperation.PSPEPromote","level":1,"line":4423,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"},{"assembly":"System.Transactions.Local, Version=8.0.0.0, Culture=neutral, 
PublicKeyToken=cc7b13ffcd2ddd51","method":"System.Transactions.TransactionStateDelegatedNonMSDTC.EnterState","level":2,"line":4264,"fileName":"/_/src/runtime/artifacts/source-build/self/src/src/libraries/System.Transactions.Local/src/System/Transactions/TransactionState.cs"}]},{"severityLevel":"Error","outerId":"58585774","message":"The requested operation cannot be completed because the connection has been broken.","type":"System.InvalidOperationException","id":"40702182","parsedStack":[{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction","level":0,"line":0},{"assembly":"Microsoft.Data.SqlClient, Version=3.0.0.0, Culture=neutral, PublicKeyToken=23ec7fc2d6eaa4a5","method":"Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote","level":1,"line":0}]}]

We also tried to migrate to 5.1.x/5.2.x, but I think these issues were much more common there (~30 times per day). Of course, all these metrics might be influenced by actual usage (we tried the migration in various environments that are more or less similar in terms of infrastructure, etc.).

Based on my understanding, PR https://github.com/dotnet/SqlClient/pull/1801 (3.1.x: https://github.com/dotnet/SqlClient/pull/1912) introduced a race condition between connection.CleanupConnectionOnTransactionCompletion and enlistment.Committed, which was moved outside of the connection lock. After the initial cleanup, the connection is returned to the default pool, where other threads can pick it up; the subsequent cleanup then finds another ongoing transaction on it.
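
The interleaving described above can be illustrated with a small toy model. This is not SqlClient code: `ToyConnection`, `ToyPool` and `RaceSketch` are made-up names, the three "thread" steps are run sequentially to make the ordering deterministic, and what the real driver does when the second cleanup finds a foreign transaction is not modelled (the toy just sets a flag).

```csharp
using System;
using System.Collections.Generic;

Console.WriteLine(RaceSketch.Run()
    ? "stale cleanup for tx1 ran against a connection now owned by tx2"
    : "no interference");

// Toy stand-ins; none of these types exist in Microsoft.Data.SqlClient.
sealed class ToyConnection
{
    public string? EnlistedTransaction;  // transaction currently using the connection
    public bool HitByStaleCleanup;       // set when a cleanup finds a different transaction
}

sealed class ToyPool
{
    private readonly Queue<ToyConnection> _idle = new();
    public void Return(ToyConnection c) => _idle.Enqueue(c);
    public ToyConnection Rent() => _idle.Dequeue();
}

static class RaceSketch
{
    // Cleanup for a completed transaction: detach it and return the
    // connection to the pool, or flag the connection if some other
    // transaction has taken it over in the meantime.
    static void Cleanup(ToyPool pool, ToyConnection c, string completedTx)
    {
        if (c.EnlistedTransaction == completedTx)
        {
            c.EnlistedTransaction = null;
            pool.Return(c);
        }
        else if (c.EnlistedTransaction != null)
        {
            c.HitByStaleCleanup = true;
        }
    }

    public static bool Run()
    {
        var pool = new ToyPool();
        var conn = new ToyConnection { EnlistedTransaction = "tx1" };

        // "Thread A", inside the connection lock: first cleanup for tx1.
        Cleanup(pool, conn, "tx1");

        // "Thread B": rents the same connection for a new transaction.
        var rented = pool.Rent();
        rented.EnlistedTransaction = "tx2";

        // "Thread A", outside the lock: signalling commit triggers a
        // second cleanup that still refers to tx1.
        Cleanup(pool, conn, "tx1");

        return rented.HitByStaleCleanup;
    }
}
```

In this toy, the window between the two cleanups is exactly where a lock that covered both steps would have kept "Thread B" from renting the connection.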

I saw that 5.2.0-preview5.24024.3 is available, and we will give it a try in the coming days (early next week at the latest).

I wanted to ask if there are plans to backport https://github.com/dotnet/SqlClient/pull/2301 to older versions like 3.1.x. @DavoudEshtehari Do you know anything about this yet, or is it too early?

Setup:
EF 6 / SqlClient 3.1.5 or SqlClient 5.1.x / 5.2.x
.NET 8
Not using distributed transaction
Not using MARS
Using connection pooling
Timeout 30 seconds
Azure SQL Database

DavoudEshtehari commented 8 months ago

Hi @razvalex, this fix is slated to be considered for backporting to all the supported servicing versions, with the initial release expected to be 5.1.5. However, there is no specific timeline set for these releases as of now.

razvalex commented 8 months ago

@DavoudEshtehari, I wanted to inform you that we evaluated the performance of https://github.com/dotnet/SqlClient/releases/tag/v5.2.0-preview5 in one of our environments, which handles approximately 2 million daily requests and tens of thousands of background tasks. The results were very positive. Thanks again!

For us it would be helpful to also have https://github.com/dotnet/SqlClient/pull/2301 backported to 3.1.x, but I think we can also manage to update everything to 5.1.5 if that doesn't happen.

ErikEJ commented 8 months ago

@razvalex 5.1.5 has been released

razvalex commented 8 months ago

@ErikEJ Thank you. This week, we're aiming to upgrade the majority of our projects to version 5.1.5. I'll provide you with an update on the outcomes soon (mid-February). Backporting https://github.com/dotnet/SqlClient/pull/2301 to version 3.1.x would help us (for example) to avoid extra handling for Encrypt (https://github.com/dotnet/SqlClient/pull/1210) in certain older projects.
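
As a rough illustration of the "extra handling for Encrypt" mentioned above: starting with Microsoft.Data.SqlClient 4.0, Encrypt defaults to true, so older projects that relied on the previous default may need to set it explicitly when upgrading. The helper below is a hypothetical sketch (`LegacyConnectionStrings` and `WithExplicitEncrypt` are not part of SqlClient) and assumes the project can rewrite its connection strings at startup.

```csharp
using Microsoft.Data.SqlClient;

// Hypothetical helper for illustration; not part of Microsoft.Data.SqlClient.
static class LegacyConnectionStrings
{
    // Returns the connection string with Encrypt set to the given value,
    // unless the original string already specifies Encrypt explicitly.
    public static string WithExplicitEncrypt(string connectionString, bool encrypt)
    {
        var builder = new SqlConnectionStringBuilder(connectionString);

        // ShouldSerialize reports whether the keyword was actually set,
        // as opposed to merely being a keyword the builder knows about.
        if (!builder.ShouldSerialize("Encrypt"))
        {
            // On 4.x the property is a bool; on 5.x a bool is assumed to
            // convert implicitly to SqlConnectionEncryptOption.
            builder.Encrypt = encrypt;
        }

        return builder.ConnectionString;
    }
}
```

A call such as `LegacyConnectionStrings.WithExplicitEncrypt(connStr, false)` would keep the pre-4.0 behaviour for a legacy project, at the cost of leaving the connection unencrypted, so it is better treated as a transitional measure.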

JRahnama commented 8 months ago

@razvalex eventually the fix will be backported to 3.1, date yet to be determined.

DavoudEshtehari commented 8 months ago

@razvalex Thanks for sharing the result. It sounds promising. I understand the challenges of upgrading with breaking changes, and I encourage you to consider moving on to the latest stable version sooner rather than later to gain the benefits of the improvements.

razvalex commented 7 months ago

We were able to upgrade the majority of our projects from 3.1.5 to version 5.1.5 with great success. This week we are planning to upgrade from 5.1.5 to 5.2.0, mainly due to increased performance in some key areas (e.g. https://github.com/dotnet/SqlClient/pull/1544).

I was wondering if there are any plans for backporting https://github.com/dotnet/SqlClient/pull/2301 to 3.1.x, as some of our older projects might benefit from having this fix in place. Also, upgrading them to a newer version means dealing with breaking changes (e.g. Encrypt) and might be riskier.