dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License

SQLClient - OpenAsync() API crashes on Azure web application #1927

Open praveensri opened 1 year ago

praveensri commented 1 year ago

Describe the bug

The OpenAsync() API crashes the process when the code runs on Azure Websites.

Exception message:
Stack trace:
ExitCodeString STATUS_STACK_BUFFER_OVERRUN
DefaultHostName XXXXXXXXXXXXXXXX
CallStack - Crashing Thread
========================================================
     InlinedCallFrame
     ILStubClass.IL_STUB_PInvoke(Boolean, IntPtr)
     InlinedCallFrame
     Microsoft.Data.SqlClient.SNILoadHandle..ctor()
     Microsoft.Data.SqlClient.SNILoadHandle..cctor()
     HelperMethodFrame
     Microsoft.Data.SqlClient.TdsParserStateObjectFactory.get_EncryptionOptions()
     Microsoft.Data.SqlClient.TdsParser..cctor()
     HelperMethodFrame
     Microsoft.Data.SqlClient.TdsParser..ctor(Boolean, Boolean)
     Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(Microsoft.Data.SqlClient.ServerInfo, System.String, System.Security.SecureString, Boolean, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SqlCredential, Microsoft.Data.ProviderBase.TimeoutTimer)
     Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(Microsoft.Data.ProviderBase.TimeoutTimer, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SqlCredential, System.String, System.Security.SecureString, Boolean)
     Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(Microsoft.Data.ProviderBase.DbConnectionPoolIdentity, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SqlCredential, System.Object, System.String, System.Security.SecureString, Boolean, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SessionData, Boolean, System.String, Microsoft.Data.ProviderBase.DbConnectionPool)
     Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(Microsoft.Data.Common.DbConnectionOptions, Microsoft.Data.Common.DbConnectionPoolKey, System.Object, Microsoft.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnection, Microsoft.Data.Common.DbConnectionOptions)
     Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(Microsoft.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnection, Microsoft.Data.Common.DbConnectionOptions, Microsoft.Data.Common.DbConnectionPoolKey, Microsoft.Data.Common.DbConnectionOptions)
     Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(System.Data.Common.DbConnection, Microsoft.Data.Common.DbConnectionOptions, Microsoft.Data.ProviderBase.DbConnectionInternal)
     Microsoft.Data.ProviderBase.DbConnectionPool.UserCreateRequest(System.Data.Common.DbConnection, Microsoft.Data.Common.DbConnectionOptions, Microsoft.Data.ProviderBase.DbConnectionInternal)
     Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(System.Data.Common.DbConnection, UInt32, Boolean, Boolean, Microsoft.Data.Common.DbConnectionOptions, Microsoft.Data.ProviderBase.DbConnectionInternal ByRef)
     Microsoft.Data.ProviderBase.DbConnectionPool.WaitForPendingOpen()
     System.Threading.Thread+StartHelper.Callback(System.Object)
     System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
     System.Threading.Thread.StartCallback()
     DebuggerU2MCatchHandlerFrame

To reproduce



    try
    {
        await connection.OpenAsync(cancellationToken: cancellationToken).ConfigureAwait(continueOnCapturedContext: false);
    }
    catch (Exception ex) when (!ex.IsFatal())
    {
        this.LogConnectionOpenFailure(ex);
        connection.Dispose();
        throw;
    }
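
For anyone trying to reproduce this, the fragment above could be wrapped in a self-contained console program along the following lines. This is only a sketch under assumptions: the `REPRO_CONNECTION_STRING` environment variable, the 5-second timeout standing in for the service's shutdown signal, and the `OpenAsyncRepro` class name are all hypothetical and do not come from the report. The `IsFatal()` filter and `LogConnectionOpenFailure` helper from the original are not shown in the thread, so they are replaced here with plain console logging.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class OpenAsyncRepro
{
    public static async Task Main()
    {
        // Hypothetical placeholder: the thread never shows the real connection string.
        string connectionString =
            Environment.GetEnvironmentVariable("REPRO_CONNECTION_STRING")
            ?? throw new InvalidOperationException("Set REPRO_CONNECTION_STRING first.");

        // Assumed timeout that stands in for the backend service's shutdown signal.
        using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(5));

        using var connection = new SqlConnection(connectionString);
        try
        {
            await connection.OpenAsync(cts.Token).ConfigureAwait(false);
            Console.WriteLine($"Opened; state = {connection.State}");
        }
        catch (Exception ex)
        {
            // Managed failures land here; a native crash such as
            // STATUS_STACK_BUFFER_OVERRUN would terminate the process instead.
            Console.WriteLine($"OpenAsync failed: {ex.GetType().Name}: {ex.Message}");
            throw;
        }
    }
}
```

Because the reported crash is intermittent, a single run of a program like this may well succeed; it mainly gives maintainers a common starting point to loop or vary.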

Expected behavior

A clear and concise description of what you expected to happen.

Further technical details

Microsoft.Data.SqlClient version: 4.1.0 (<PackageReference Include="Microsoft.Data.SqlClient" Version="4.1.0" />)
.NET target: .NET Core
SQL Server version: (not specified)
Operating system: Windows


lcheunglci commented 1 year ago

Hi @praveensri, thanks for bringing this issue to our attention. What connection string are you using? Also, does it crash on a synchronous call to connection.Open()? It would be helpful if you could provide us with a sample repro project. Thanks!

praveensri commented 1 year ago

@lcheunglci, we use an Azure SQL connection string. It doesn't crash during Open(). The crash happens when the backend service tries to exit the thread gracefully, but OpenAsync(cancellationToken) is non-responsive or sometimes crashes. I do not have a repro because the crash is intermittent. Is there anything that we can figure out from the crash stack trace?

ErikEJ commented 1 year ago

> Is there anything that we can figure out from the crash dump stack trace

Not unless you share it!

praveensri commented 1 year ago

> > Is there anything that we can figure out from the crash dump stack trace
>
> Not unless you share it!

I already shared the stack trace.

lcheunglci commented 1 year ago

What do you mean by the following? Are you calling Thread.Abort(), or cancelling through the cancellation token?

> when the backend service tries to exit the thread gracefully but OpenAsync(cancellationToken) is non-responsive or crashes sometime

You can also enable EventSource tracing as described in the SqlClient documentation. A core dump could be useful too if you have one. However, it can contain sensitive information, so we recommend capturing it in a dev environment and/or sharing it in a way that lets you delete it later.
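
As a rough illustration of the tracing suggestion, an in-process listener can subscribe to the driver's event source. The sketch below assumes the event source is named `Microsoft.Data.SqlClient.EventSource`, as in the SqlClient tracing documentation; the `SqlClientListener` class name and the choice to print every event to the console are illustrative assumptions, and a real service would more likely write to its own log.

```csharp
using System;
using System.Diagnostics.Tracing;

// Illustrative listener; create one instance at startup and keep it alive.
public sealed class SqlClientListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // Only subscribe to the SqlClient event source (name assumed from the docs).
        if (eventSource.Name.Equals("Microsoft.Data.SqlClient.EventSource"))
        {
            // All keywords at Verbose level is noisy; narrow this once
            // the interesting area is known.
            EnableEvents(eventSource, EventLevel.Verbose, EventKeywords.All);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        // Payload may be null or empty for some events, so guard before indexing.
        if (eventData.Payload != null && eventData.Payload.Count > 0)
        {
            Console.WriteLine($"{eventData.EventName}: {eventData.Payload[0]}");
        }
    }
}
```

The listener has to be constructed before the first connection is opened (for example `using var listener = new SqlClientListener();` at the top of `Main`), otherwise the events leading up to the crash may be missed.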

praveensri commented 1 year ago

This looks like a case of STATUS_STACK_BUFFER_OVERRUN while creating the pooled connection, as shown in the stack trace. If we replace Microsoft.Data.SqlClient with System.Data.SqlClient, it doesn't crash during OpenAsync(cancellationToken).

lcheunglci commented 1 year ago

Hi @praveensri, that's an interesting observation. Following up on our offline discussion earlier, you can try the AppContext switch that enables the managed SNI on Windows and see if it makes any difference.

The STATUS_STACK_BUFFER_OVERRUN does seem troubling. From the stack trace you shared, it seems to happen in the TdsParser constructor, when it tries to create the TdsParserStateObject, which branches between the native SNI (C++) and the managed SNI (C#).

Separately, there is a known issue in .NET Core with MDS for async reads that can cause thread starvation and lead to the timeout 258 exception. It can be mitigated by raising the minimum thread count via ThreadPool.SetMinThreads.

The STATUS_STACK_BUFFER_OVERRUN, however, might be caused by the byte buffers involved in the TdsParserStateObject being written outside their bounds. As you mentioned, this happens when the backend service tries to exit the thread gracefully, which could mean that the cancellation token is not working as expected, or that there's an issue with disposing a connection from the connection pool, resulting in an invalid partial read/write that leads to the STATUS_STACK_BUFFER_OVERRUN.

Could you also provide the connection string?
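
The two mitigations mentioned above could be applied at process startup roughly as follows. This is a sketch, not a confirmed fix for the crash: the switch name `Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows` is the one documented for Microsoft.Data.SqlClient and should be checked against the version in use, while the `SqlClientStartup` class, the `Configure` method, and the thread count passed to it are hypothetical.

```csharp
using System;
using System.Threading;

public static class SqlClientStartup
{
    // Switch name as documented for Microsoft.Data.SqlClient; verify for your version.
    private const string ManagedSniSwitch =
        "Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows";

    // Call once at process start, before any SqlConnection is created.
    public static void Configure(int minWorkerThreads)
    {
        // Ask SqlClient to use the managed (C#) SNI instead of the native one on Windows.
        AppContext.SetSwitch(ManagedSniSwitch, true);

        // Raise the worker-thread floor, leaving the I/O completion floor unchanged.
        ThreadPool.GetMinThreads(out int worker, out int io);
        if (minWorkerThreads > worker && !ThreadPool.SetMinThreads(minWorkerThreads, io))
        {
            Console.Error.WriteLine("ThreadPool.SetMinThreads rejected the requested value.");
        }
    }
}
```

A call such as `SqlClientStartup.Configure(64);` would go at the very top of `Main`. The switch is read when the driver first initializes, so setting it after a connection has already been opened is unlikely to have any effect, and the value 64 is only a placeholder to tune against the actual workload.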

NiklasEMS commented 1 year ago

Could it be an AES variable-length issue? We had a similar intermittent SQL connectivity issue with the Windows 2008 oledbsql driver when AES encryption was used for SQL connections. The code assumed a fixed key length (MD5? other?), but very rarely the variable AES length exceeded this fixed buffer size. This was fixed in a private fix for Windows 2008 and publicly in Windows 2008 R2. Could it be something similar here? Is there any tracing that could help find out more?