dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.

Errors after upgrading to 5.2.0 from 5.1.5 on Linux #2378

Open alex-jitbit opened 4 months ago

alex-jitbit commented 4 months ago

After upgrading from 5.1.5 to 5.2.0 on Linux (Ubuntu) with .NET 8, I'm getting thousands of errors:

Exception message:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 35 - An internal exception was caught)
 ---> System.TimeoutException: The socket couldn't connect during the expected 14965 remaining time.

Stack trace:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 35 - An internal exception was caught)
 ---> System.TimeoutException: The socket couldn't connect during the expected 14965 remaining time.
   at Microsoft.Data.SqlClient.SNI.SNITCPHandle.Connect(String serverName, Int32 port, TimeoutTimer timeout, SqlConnectionIPAddressPreference ipPreference, String cachedFQDN, SQLDNSInfo& pendingDNSInfo)
   at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, TimeoutTimer timeout, Boolean parallel, SqlConnectionIPAddressPreference ipPreference, String cachedFQDN, SQLDNSInfo& pendingDNSInfo, Boolean tlsFirst, String hostNameInCertificate, String serverCertificateFilename)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open()

Some requests work fine, but about 50% throw this error. Reverting back to 5.1.5 solves the problem.

Further technical details

Microsoft.Data.SqlClient version: 5.2
.NET target: NET 8
SQL Server version: SQL 2017 on Linux
Operating system: Ubuntu 22

JRahnama commented 4 months ago

@alex-jitbit Is this happening on a regular connection, i.e. with no AAD involved? Can you provide a sample repro, please?

JRahnama commented 4 months ago

Linux uses managed SNI, and I think the improvements were made mostly on the native side, which is Windows-only. Which change did you mean?

alex-jitbit commented 4 months ago

No, no AAD. My connection string uses an explicit username/password combo:

Data Source=172.0.0.123,1433;Initial Catalog=database;user id=user;pwd=PaSsWoRd;Max Pool Size=250;Encrypt=false

A simple repro would be:

using Microsoft.Data.SqlClient;
// connectionString holds the connection string shown above
using var cn = new SqlConnection(connectionString);
cn.Open();

Compile on .NET 8, run on Ubuntu 22.04 (AWS) connecting to external SQL Server (also Ubuntu 22 on AWS).

Reverting to 5.1.5 fixed the problem immediately.

P.S. I can't repro this on WSL Ubuntu connecting to a Windows-hosted MS SQL Server. I assume the issue either is specific to connecting to a Linux-hosted SQL Server or happens under heavy load only.

JRahnama commented 4 months ago

@alex-jitbit I will test it today and will update you after.

guiestimoneon commented 4 months ago

I had the same problem. I downgraded to 5.1.5 and it worked again.

SQL 2019 - Windows Server - .NET 8

JRahnama commented 4 months ago

I was not able to repro the issue on Ubuntu 22.04 as a local server, but I will test it with a remote server. If there is any issue it should be related to https://github.com/dotnet/SqlClient/pull/1029

Update: I tested with an Azure SQL server in East US (adding more latency), but was not able to repro the issue.

JRahnama commented 4 months ago

Here is my test setup:

csproj:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="Microsoft.Data.SqlClient" Version="5.2.0" />
  </ItemGroup>
</Project>

Program.cs:

using Microsoft.Data.SqlClient;

SqlConnectionStringBuilder builder = new()
{
    DataSource = "*******.database.windows.net",
    UserID = "******",
    Password = "*****",
    InitialCatalog = "Northwind",
    MaxPoolSize = 250
};
using SqlConnection conn = new(builder.ConnectionString);
conn.Open();
Console.WriteLine(conn.State);

I will test with a remote on premises server later today.

ctrlaltdan commented 4 months ago

Also seeing the same. Downgrading to 5.2.0-preview5.24024.3 resolved the issue for me.

Additional debug info if it's helpful.

Framework: .NET 8.0.2 
Runtime: linux-musl-x64
Image: Alpine Linux v3.19
Using: Microsoft.EntityFrameworkCore.SqlServer:8.0.2
Connected using an Azure SQL Failover group, via Entity Framework.

David-Engel commented 4 months ago

If you are on Linux/macOS and specify both port and instance name in the connection string (like server,12345\instance), that might be the source of the issue. There appears to have been a regression in 5.2.0 on non-Windows platforms where the driver isn't ignoring the instance name when both it and the port are specified.
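
For anyone who wants to rule this out while staying on 5.2.0, one possible stopgap is to drop the instance name from the data source whenever an explicit port is also present. The following is only a sketch: `DataSourceHelper` and `DropInstanceWhenPortPresent` are hypothetical names (not part of SqlClient), and it assumes the data source is written as either `server,port\instance` or `server\instance,port`.

```csharp
using System;

public static class DataSourceHelper
{
    // Hypothetical helper, not part of Microsoft.Data.SqlClient.
    // If the data source contains both a port (",12345") and an instance
    // name ("\instance"), return "host,port"; otherwise return it unchanged.
    public static string DropInstanceWhenPortPresent(string dataSource)
    {
        int comma = dataSource.IndexOf(',');
        int slash = dataSource.IndexOf('\\');
        if (comma < 0 || slash < 0)
        {
            return dataSource; // port and instance are not both specified
        }

        // The host is everything before whichever separator comes first.
        string host = dataSource.Substring(0, Math.Min(comma, slash));

        // The port runs from the comma to the next backslash or end of string.
        int portEnd = dataSource.IndexOf('\\', comma);
        string port = portEnd < 0
            ? dataSource.Substring(comma + 1)
            : dataSource.Substring(comma + 1, portEnd - comma - 1);

        return host + "," + port.Trim();
    }
}
```

The result would be assigned to the `Data Source` keyword (or `SqlConnectionStringBuilder.DataSource`) before opening the connection. If the errors persist with the instance name removed, this regression is probably not the cause in that environment.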

sturledahl commented 3 months ago

Some observations from me, hoping they help the investigation:

We started getting this problem on Alpine in a test that starts multiple threads connecting to the same database in parallel. Other tests work fine. Our connection strings do not specify an instance or port. We tried adding some delays inside the different tasks to affect timing, and then we got a different error instead of the one mentioned in this ticket:

System.InvalidOperationException: Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.
12:25:02  at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)

Here's how threads are created in the test:

var tasks = schedulers.Select(s => new TaskFactory().StartNew(s.Start)).ToList();
foreach (var t in tasks) t.Wait();

Reverting back to v5.1.5 fixed this, so we are not updating to v5.2.0 until we know more.

alex-jitbit commented 3 months ago

I can confirm that we too experience this error under a heavy load with multiple threads (not sure if this is the culprit)
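
A load test along these lines could be sketched as below. This is not the actual production code or an official repro: `ConnectionLoadTest` and `OpenMany` are hypothetical names, and the open action is passed in as a delegate so the harness itself has no SqlClient dependency. Thread-pool scheduling means the real degree of parallelism will vary from run to run.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConnectionLoadTest
{
    // Hypothetical harness: runs `open` `count` times on the thread pool,
    // waits for all calls to finish, and reports how many succeeded or threw.
    public static (int Succeeded, int Failed) OpenMany(int count, Action<int> open)
    {
        int succeeded = 0;
        int failed = 0;

        Task[] tasks = Enumerable.Range(0, count)
            .Select(i => Task.Run(() =>
            {
                try
                {
                    open(i);
                    Interlocked.Increment(ref succeeded);
                }
                catch (Exception ex)
                {
                    Interlocked.Increment(ref failed);
                    Console.Error.WriteLine($"#{i}: {ex.GetType().Name}: {ex.Message}");
                }
            }))
            .ToArray();

        Task.WaitAll(tasks);
        return (succeeded, failed);
    }
}

// Usage against a real server (requires the Microsoft.Data.SqlClient package):
//
// var result = ConnectionLoadTest.OpenMany(500, _ =>
// {
//     using var cn = new SqlConnection(connectionString);
//     cn.Open();
// });
// Console.WriteLine($"ok={result.Succeeded} failed={result.Failed}");
```

Running the same harness against 5.1.5 and 5.2.0 with an identical connection string would show whether the failure rate differs between versions under comparable load.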

JRahnama commented 3 months ago

Can you test with this package and see if the issue is resolved? Just change the extension to .nupkg and it should be good for testing: Microsoft.Data.SqlClient.6.0.0-pull.106802.zip

sturledahl commented 3 months ago

@JRahnama Any chance you could add it to nuget.org so our build/test system can find it?

JRahnama commented 3 months ago

@sturledahl this package is not officially signed and is not suitable for production use. I just wanted to confirm that the fix has resolved the issue for users before proceeding with a hotfix release.

PaulVrugt commented 3 months ago

any update on this?

JRahnama commented 3 months ago

> any update on this?

Were you able to test with the sample package?

> can you guys test with this package and see if the issue is resolved? just change the extension to nupkg and should be good for testing. Microsoft.Data.SqlClient.6.0.0-pull.106802.zip

PaulVrugt commented 3 months ago

@JRahnama well no. We haven't updated to 5.2 yet because of this issue. We are currently using version 5.1.5 and running into https://github.com/dotnet/SqlClient/issues/449, but it is only happening in our production environment (with a lot of traffic) and even there only once every few weeks. We have no controlled environment to test this. Maybe @alex-jitbit has a way to reproduce it and see if the 6.0.0 version resolves it

mosesnnewman commented 3 months ago

Same issue on Windows!

alex-jitbit commented 2 months ago

> Maybe @alex-jitbit has a way to reproduce it and see if the 6.0.0 version resolves it

Unfortunately, this bug is reproducible in production only (under high load), and frankly I'm too afraid to try beta fixes on my prod.

JRahnama commented 2 months ago

@alex-jitbit Is it possible to test with the 5.2.0-preview2 and 5.2.0-preview5 versions to identify which change caused the issue?

ABAG603 commented 2 months ago

We have the same issue as the people above after upgrading to 5.2.0. The issue is reproducible from both Alpine containers and a VM running Amazon Linux 2023. SQL Server is running on Windows Server, and the connection string contains a named instance and port.

@JRahnama I've tested with different versions and here's the outcome:

Microsoft.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct
and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 26 - Error Locating Server/Instance Specified)
     System.Net.Sockets.SocketException: Success
       at int Microsoft.Data.SqlClient.SNI.SSRP.GetPortByInstanceName(string browserHostName, string instanceName, TimeoutTimer timeout, bool allIPsInParallel, SqlConnectionIPAddressPreference ipPreference)
       at SNITCPHandle Microsoft.Data.SqlClient.SNI.SNIProxy.CreateTcpHandle(DataSource details, TimeoutTimer timeout, bool parallel, SqlConnectionIPAddressPreference ipPreference, string cachedFQDN, ref SQLDNSInfo pendingDNSInfo,
          bool tlsFirst, string hostNameInCertificate, string serverCertificateFilename)
  at void Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, bool breakConnection, Action<Action> wrapCloseInAction)
  at void Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, SqlCommand command, bool callerHasConnectionLock, bool asyncClose)
  at void Microsoft.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, TimeoutTimer timeout, SqlConnectionString connectionOptions, bool withFailover)
  at void Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, string newPassword, SecureString newSecurePassword, TimeoutTimer timeout, bool withFailover)
  at void Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(ServerInfo serverInfo, string newPassword, SecureString newSecurePassword, bool redirectedUserInstance, SqlConnectionString connectionOptions,
     SqlCredential credential, TimeoutTimer timeout)
  at void Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(TimeoutTimer timeout, SqlConnectionString connectionOptions, SqlCredential credential, string newPassword, SecureString newSecurePassword, bool
     redirectedUserInstance)
  at Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, object providerInfo, string newPassword, SecureString
     newSecurePassword, bool redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, bool applyTransientFaultHandling, string accessToken, DbConnectionPool pool,
     Func<SqlAuthenticationParameters, CancellationToken, Task<SqlAuthenticationToken>> accessTokenCallback)
  at DbConnectionInternal Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection,
     DbConnectionOptions userOptions)
  at DbConnectionInternal Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions
     userOptions)
  at DbConnectionInternal Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
  at DbConnectionInternal Microsoft.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
  at bool Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, uint waitForMultipleObjectsTimeout, bool allowCreate, bool onlyOneCheckConnection, DbConnectionOptions userOptions, out
     DbConnectionInternal connection)
  at bool Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource<DbConnectionInternal> retry, DbConnectionOptions userOptions, out DbConnectionInternal connection)
  at bool Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource<DbConnectionInternal> retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, out
     DbConnectionInternal connection)
  at bool Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource<DbConnectionInternal> retry, DbConnectionOptions
     userOptions)
  at bool Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource<DbConnectionInternal> retry, DbConnectionOptions userOptions)
  at bool Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource<DbConnectionInternal> retry, SqlConnectionOverrides overrides)
  at void Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
  at void Microsoft.Data.SqlClient.SqlConnection.Open()
  at void Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerConnection.OpenDbConnection(bool errorsExpected)
  at void Microsoft.EntityFrameworkCore.Storage.RelationalConnection.OpenInternal(bool errorsExpected)
  at bool Microsoft.EntityFrameworkCore.Storage.RelationalConnection.Open(bool errorsExpected)
  at bool Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.<>c.<OpenConnection>b__22_0(DatabaseFacade database)
  at TResult Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.<>c__DisplayClass12_0`2.<Execute>b__0(DbContext _, TState s)
  at TResult Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.Execute<TState,TResult>(TState state, Func<DbContext, TState, TResult> operation, Func<DbContext, TState, ExecutionResult<TResult>>
     verifySucceeded)
  at TResult Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.Execute<TState,TResult>(IExecutionStrategy strategy, TState state, Func<TState, TResult> operation, Func<TState, ExecutionResult<TResult>> verifySucceeded)
  at void Microsoft.EntityFrameworkCore.RelationalDatabaseFacadeExtensions.OpenConnection(DatabaseFacade databaseFacade)
  at int TestConnectionCommand.Execute(CommandContext context)

JRahnama commented 2 months ago

@ABAG603 The fix is merged into the main branch by PR #2395. A hotfix release, v5.2.1, is planned, but the date is still TBD.

Closing the issue as the fix is merged and will be available in the next hotfix release.

hssamany commented 2 months ago

Unfortunately, I couldn't wait for the release, but I found out that another cause of this error can be the compatibility level of your SQL Server database. For example, LINQ queries with a collection filter such as query.Where(p => new List<string> { "XX", "YY" }.Contains(p.MyCode)) resulted in SQL like:


SELECT [t].[Id]
WHERE [t].[MyCode] IN (
    SELECT [c].[value]
    FROM OPENJSON(@__codes_0) WITH ([value] nvarchar(50) '$') AS [c]
)
ORDER BY [t].[Id]

Then I had to raise my Database Compatibility level to "150" in order to run the "OPENJSON"-function. But I assume it should also work under level "130".

PaulVrugt commented 1 month ago

@JRahnama any update on a release date for 5.2.1 yet? It's been a couple of weeks now and still no hotfix

alex-jitbit commented 1 month ago

NOT FIXED attn @JRahnama

The issue is not fixed in 5.2.1

I'm getting the same error under high load:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 35 - An internal exception was caught)
 ---> System.TimeoutException: The socket couldn't connect during the expected 14999 remaining time.
   at Microsoft.Data.SqlClient.SNI.SNITCPHandle.Connect(String serverName, Int32 port, TimeoutTimer timeout, SqlConnectionIPAddressPreference ipPreference, String cachedFQDN, SQLDNSInfo& pendingDNSInfo)
   at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, TimeoutTimer timeout, Boolean parallel, SqlConnectionIPAddressPreference ipPreference, String cachedFQDN, SQLDNSInfo& pendingDNSInfo, Boolean tlsFirst, String hostNameInCertificate, String serverCertificateFilename)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)

Reverting to 5.1.5 fixes the issue

alex-jitbit commented 1 month ago

Please reopen the issue

JRahnama commented 3 weeks ago

@alex-jitbit Can you confirm that the issue is not happening with 5.2.0-preview5.24024.3? I saw that some other users have confirmed it, and that was regarding a regression we addressed in 5.2.1. It seems your case is a bit different.

JRahnama commented 3 weeks ago

Also we are going to need a simple repro application for further investigation.

David-Engel commented 3 weeks ago

@alex-jitbit Does increasing the connect timeout in your connection string solve the issue? The reason I ask is that #2098 in 5.2-preview improved respecting of connection timeout during connect on the exact path your exception is occurring and the message indicates the default timeout of 15 seconds has elapsed when the exception is thrown. I'm wondering if 5.1 was simply taking longer than the connect timeout to connect under load but succeeding anyway.
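
A minimal sketch of that experiment, assuming the `Microsoft.Data.SqlClient` package is referenced. `TimeoutExperiment` and `WithConnectTimeout` are hypothetical names, and the 60-second value used in the note below is arbitrary; `SqlConnectionStringBuilder.ConnectTimeout` is the builder property behind the `Connect Timeout` keyword and is expressed in seconds.

```csharp
using Microsoft.Data.SqlClient;

public static class TimeoutExperiment
{
    // Hypothetical helper: returns a copy of the connection string with
    // Connect Timeout (in seconds) overridden. The default is 15 seconds.
    public static string WithConnectTimeout(string connectionString, int seconds)
    {
        var builder = new SqlConnectionStringBuilder(connectionString)
        {
            ConnectTimeout = seconds
        };
        return builder.ConnectionString;
    }
}
```

Opening connections with `new SqlConnection(TimeoutExperiment.WithConnectTimeout(connectionString, 60))` under the same load would then indicate whether the failures are slow connects that now hit the enforced timeout, or something else.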

alex-jitbit commented 3 weeks ago

@JRahnama Repro here: https://gist.github.com/alex-jitbit/1eca9a1f014e036691bdc35cd852c726. The bug is even reproducible when running on Windows under WSL2; however, the error is slightly different in that case (see the repro description).

apxltd commented 3 weeks ago

I can confirm that this is indeed broken in 5.2.1 as well, and actually worse than in 5.2.0. This is trivial to reproduce: basically do exactly what @alex-jitbit did, i.e. just open a bunch of connections to simulate what a real application experiencing traffic would do.

In 5.2.0, SqlClient crashes immediately with that same error. When we use 5.2.1, there seems to be some kind of timeout mechanism that makes the crashes take forever and effectively freezes up the whole application.

Stack trace seems to be the same:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 35 - An internal exception was caught)
---> System.TimeoutException: The socket couldn't connect during the expected 14990 remaining time.
at Microsoft.Data.SqlClient.SNI.SNITCPHandle.Connect(String serverName, Int32 port, TimeoutTimer timeout, SqlConnectionIPAddressPreference ipPreference, String cachedFQDN, SQLDNSInfo& pendingDNSInfo)
at Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(String serverName, Int32 port, TimeoutTimer timeout, Boolean parallel, SqlConnectionIPAddressPreference ipPreference, String cachedFQDN, SQLDNSInfo& pendingDNSInfo, Boolean tlsFirst, String hostNameInCertificate, String serverCertificateFilename)
at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
at Microsoft.Data.SqlClient.SqlConnection.InternalOpenAsync(CancellationToken cancellationToken)