dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License
847 stars 282 forks source link

Connection gets stuck in SNITCPHandle.TryConnectParallel #2192

Open sksk571 opened 1 year ago

sksk571 commented 1 year ago

Describe the bug

SqlConnection.Open gets stuck in SNITCPHandle.TryConnectParallel and times out when there is no available threads in thread pool.

Exception message:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server: Could not open a connection to SQL Server)

Stack trace:

   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, SqlConnectionString connectionOptions, Boolean withFailover)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean ignoreSniOpenTimeout, TimeoutTimer timeout, Boolean withFailover)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString connectionOptions, SqlCredential credential, TimeoutTimer timeout)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(TimeoutTimer timeout, SqlConnectionString connectionOptions, SqlCredential credential, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, Boolean applyTransientFaultHandling, String accessToken, DbConnectionPool pool)
   at Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open()

We have an ASP.NET HTTP application running in Alpine Linux in K8S. The application makes synchronous SQL requests to the database. During the peak load we sometimes experience a cascade of connection failures.

Dump files collected during the incident contain a number of threads waiting in SNITCPHandle.TryConnectParallel. This method uses sync-over-async to connect to multiple IP addresses in parallel. This, coupled with the thread pool exhaustion caused by a big amount of incoming requests, may be the reason for timeouts.

Stack trace from the dump file:

System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken)
System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken)
System.Threading.Tasks.Task.Wait(Int32, System.Threading.CancellationToken)
System.Threading.Tasks.Task.Wait(System.TimeSpan)
Microsoft.Data.SqlClient.SNI.SNITCPHandle.TryConnectParallel(System.String, Int32, System.TimeSpan, Boolean, Boolean ByRef, System.String, Microsoft.Data.SqlClient.SQLDNSInfo ByRef)
Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(System.String, Int32, Int64, Boolean, Microsoft.Data.SqlClient.SqlConnectionIPAddressPreference, System.String, Microsoft.Data.SqlClient.SQLDNSInfo ByRef)
Microsoft.Data.SqlClient.SNI.SNIProxy.CreateTcpHandle(Microsoft.Data.SqlClient.SNI.DataSource, Int64, Boolean, Microsoft.Data.SqlClient.SqlConnectionIPAddressPreference, System.String, Microsoft.Data.SqlClient.SQLDNSInfo ByRef)
Microsoft.Data.SqlClient.SNI.SNIProxy.CreateConnectionHandle(System.String, Boolean, Int64, Byte[] ByRef, Byte[][] ByRef, Boolean, Boolean, Boolean, Boolean, Microsoft.Data.SqlClient.SqlConnectionIPAddressPreference, System.String, Microsoft.Data.SqlClient.SQLDNSInfo ByRef)
Microsoft.Data.SqlClient.SNI.TdsParserStateObjectManaged.CreatePhysicalSNIHandle(System.String, Boolean, Int64, Byte[] ByRef, Byte[][] ByRef, Boolean, Boolean, Boolean, Microsoft.Data.SqlClient.SqlConnectionIPAddressPreference, System.String, Microsoft.Data.SqlClient.SQLDNSInfo ByRef, Boolean)
Microsoft.Data.SqlClient.TdsParser.Connect(Microsoft.Data.SqlClient.ServerInfo, Microsoft.Data.SqlClient.SqlInternalConnectionTds, Boolean, Int64, Boolean, Boolean, Boolean, Boolean, Microsoft.Data.SqlClient.SqlAuthenticationMethod)
Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(Microsoft.Data.SqlClient.ServerInfo, System.String, System.Security.SecureString, Boolean, Microsoft.Data.ProviderBase.TimeoutTimer, Boolean)
Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(Microsoft.Data.SqlClient.ServerInfo, System.String, System.Security.SecureString, Boolean, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SqlCredential, Microsoft.Data.ProviderBase.TimeoutTimer)
Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(Microsoft.Data.ProviderBase.TimeoutTimer, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SqlCredential, System.String, System.Security.SecureString, Boolean)
Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(Microsoft.Data.ProviderBase.DbConnectionPoolIdentity, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SqlCredential, System.Object, System.String, System.Security.SecureString, Boolean, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SessionData, Boolean, System.String, Microsoft.Data.ProviderBase.DbConnectionPool)
Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(Microsoft.Data.Common.DbConnectionOptions, Microsoft.Data.Common.DbConnectionPoolKey, System.Object, Microsoft.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnection, Microsoft.Data.Common.DbConnectionOptions)
Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(Microsoft.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnection, Microsoft.Data.Common.DbConnectionOptions, Microsoft.Data.Common.DbConnectionPoolKey, Microsoft.Data.Common.DbConnectionOptions)

To reproduce

The following code reproduces the issue in Ubuntu 22.04 running in WSL. MultiSubnetFailover=True switches SNITCPHandle to use TryConnectParallel and triggers the bug.

using Microsoft.Data.SqlClient;

// Create thread pool exhaustion condition
List<Task> tasks = new();
for (int i = 0; i < 10000; ++i)
    tasks.Add(Waiter());

Console.WriteLine("Press any key to execute query, press q to quit...");
// Run synchronous query
while (Console.ReadKey().KeyChar != 'q')
    Console.WriteLine(Version());

async Task Waiter()
{
    await Task.Delay(100);
    Task.Delay(Timeout.Infinite).Wait();
}

string Version()
{
    using var conn = new SqlConnection("Server=tcp:127.0.0.1,1433;User ID=sa;Password=Password1;Trust server certificate=True;MultiSubnetFailover=True");
    conn.Open();
    var cmd = conn.CreateCommand();
    cmd.CommandText = "SELECT @@VERSION AS Version";
    var r = cmd.ExecuteReader();
    if (r.Read())
    {
        return (string)r["Version"];
    }
    return string.Empty;
}

Expected behavior

Synchronous SqlConnection.Open should be able to connect to SQL server regardless of the current ThreadPool usage.

Further technical details

Microsoft.Data.SqlClient version: 5.1.1 .NET target: .NET6 SQL Server version: SQL Server 2022 Operating system: Alpine 3.18 in a Docker container

Additional context SQL server for the repro case was installed in Docker using the following command

docker run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=Password1" -p 1433:1433 -d mcr.microsoft.com/mssql/server:2022-latest
JRahnama commented 1 year ago

@sksk571 this seems similar to issue #422 on the root cause. I can suggest to use MaxPoolsize of 20 or limiting available threads on the application.

Correction: Try increasing the minimum available threads.

sksk571 commented 7 months ago

What I find strange in this scenario is why SNITCPHandle uses TryConnectParallel even when server hostname resolves into just one IP. Can synchronous Connect be used in this case?

David-Engel commented 7 months ago

What I find strange in this scenario is why SNITCPHandle uses TryConnectParallel even when server hostname resolves into just one IP. Can synchronous Connect be used in this case?

Your connection string is specifying MultiSubnetFailover=True yet also specifying a single IP address (Server=tcp:127.0.0.1,1433). Since there aren't multiple IPs to try, MultiSubnetFailover isn't needed. Turn it off so that the TryConnectParallel path isn't used.

sksk571 commented 6 months ago

This is just an example to reproduce the bug. In production we use multi subnet failover as a part of our DR strategy and the server hostname resolves into multiple IPs there.

@JRahnama increasing MinThreads didn't work in a linked issue why do you suggest to try it here? As I understand, increasing MinThreads has a negative performance impact because threads are created more often.

adc-cjewett commented 6 months ago

I believe we're running into an issue similar to this and our usage of MultiSubnetFailover with a server hostname that resolves to multiple IPs matches what @sksk571 does. Seems to only happen on Linux as well.

Are there other options to resolve this issue?

Message: Unobserved task exception
Exception: System.AggregateException: A Task's exception(s) were not observed either by Waiting on the Task or accessing its Exception property. As a result, the unobserved exception was rethrown by the finalizer thread. (Connection timed out)  ---> System.Net.Sockets.SocketException (110): Connection timed out
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Threading.Tasks.ValueTask.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)
--- End of stack trace from previous location ---
   at System.Data.SqlClient.SNI.SNITCPHandle.ParallelConnectHelper(Socket socket, Task connectTask, TaskCompletionSource`1 tcs, StrongBox`1 pendingCompleteCount, StrongBox`1 lastError, List`1 sockets)
   --- End of inner exception stack trace ---