dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License
857 stars 286 forks source link

`NullReferenceException` from `SqlInternalTransaction.GetServerTransactionLevel()` #2879

Closed MaceWindu closed 3 weeks ago

MaceWindu commented 1 month ago

Describe the bug

We had an incident with database which resulted in several errors from server. Unfortunatelly I don't have details on exact reason for it, could be low memory conditions as I see RADAR_PRE_LEAK_64 error in event logs for one of our apps. But one of errors was NRE from SqlClient code with following stack traces:

Exception message: Object reference not set to an instance of an object.
Stack trace:
   at Microsoft.Data.SqlClient.SqlInternalTransaction.GetServerTransactionLevel()
   at Microsoft.Data.SqlClient.SqlInternalTransaction.CheckTransactionLevelAndZombie()
   at Microsoft.Data.SqlClient.SqlInternalTransaction.Commit()
   at Microsoft.Data.SqlClient.SqlTransaction.Commit()
   at System.Data.Common.DbTransaction.CommitAsync(CancellationToken cancellationToken)
Exception message: Object reference not set to an instance of an object.
Stack trace:
   at Microsoft.Data.SqlClient.SqlInternalTransaction.GetServerTransactionLevel()
   at Microsoft.Data.SqlClient.SqlInternalTransaction.CheckTransactionLevelAndZombie()
   at Microsoft.Data.SqlClient.SqlInternalTransaction.Rollback()

To reproduce

No idea, but other errors we seen were:

SqlException: The server failed to resume the transaction. Desc: xxx. The transaction active in this session has been committed or aborted by another session.

and

SqlException: New request is not allowed to start because it should come with valid transaction descriptor.

last one is actually incorrect according to documentation as we don't use distributed transaction. Maybe related to https://github.com/dotnet/SqlClient/issues/2683 but not sure as we don't use encryption.

Expected behavior

Dunno, server error handling in a way it doesn't throw NRE.

Further technical details

Microsoft.Data.SqlClient version: 5.1.4 (a bit old, but I don't see any functional changes to SqlInternalTransaction code in newer versions) .NET target: net8.0 SQL Server version/Operating system: Microsoft SQL Server 2019 (RTM-CU28-GDR) (KB5042749) - 15.0.4390.2 (X64) Aug 12 2024 13:08:42 Copyright (C) 2019 Microsoft Corporation Web Edition (64-bit) on Windows Server 2019 Datacenter 10.0 <X64> (Build 17763: ) (Hypervisor)

Additional context

There was report with same error before https://github.com/dotnet/SqlClient/issues/1121

arellegue commented 1 month ago

Do you have a repro you can share? This would help us reproduce the issue and investigate the root cause.

Could you please try to use MDS 5.2.2 to see if the issue still persists.

MaceWindu commented 1 month ago

Currently I don't have an idea how to reproduce it. Looking at code I suspect it could be a race condition

between CheckTransactionLevelAndZombie() call from failed transaction operation (commit/rollback/save)

https://github.com/dotnet/SqlClient/blob/main/src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlInternalTransaction.cs#L168

and CloseFromConnection() call from TdsParser

https://github.com/dotnet/SqlClient/blob/main/src/Microsoft.Data.SqlClient/src/Microsoft/Data/SqlClient/SqlInternalTransaction.cs#L213

// 
// thread 1: !IsZombied checks that _innerConnection is set
// thread 2: Zombie cleans _innerConnection
// thread 1: access null _innerConnection in GetServerTransactionLevel() call

Probably when one of those two other errors I mentioned in initial report occur - it could trigger race condition, but I have no idea how to force such errors from server to test this. Tried to kill sessions on server, but looks like it is not enough. I see both CheckTransactionLevelAndZombie and CloseFromConnection called, but wasn't able to trigger race conditions.

I will see if I can come up with more ideas how to reproduce it

MaceWindu commented 3 weeks ago

Closing for now, No idea how to reproduce it. Will try to investigate again if it will happen again