dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License
846 stars 281 forks source link

Concurrent connection usage and MARS #1234

Closed roji closed 3 years ago

roji commented 3 years ago

As part of running some compatibility checks between EF Core 6.0 and SqlClient 3.0.0 and 4.0.0-preview1.21237.2 (https://github.com/dotnet/efcore/issues/25766), I came across what looks like a considerable perf regression. While the EF test suite completed in around 4m30s with SqlClient 3.0.0, it took around 12 minutes with 4.0.0-preview1.21237.2.

Some digging seems to point at test QueryBugsTest.Thread_safety_in_relational_command_cache; with 3.0.0 the total run completes in 20 seconds (including all startup), whereas with 4.0.0-preview1.21237.2 it takes around 5m15s. This test runs 100 parallel queries - no big data is being transferred, and purely sync I/O is used.

To run this, you can check out the EF Core code base, run build.cmd to get the latest dotnet core, go inside test/EFCore.SqlServer.FunctionalTests and run dotnet test --filter Thread_safety_in_relational_command_cache

Wraith2 commented 3 years ago

Windows or Linux will be an important factor.

roji commented 3 years ago

Good point, this happened on my machine running on Ubuntu 21.04.

ajcvickers commented 3 years ago

I will do a test run on my Windows laptop.

ajcvickers commented 3 years ago

I can't get any of the SQL Server tests to run on my machine using 4.0.0-preview1.21237.2. I get:

System.ComponentModel.Win32Exception : The certificate chain was issued by an authority that is not trusted.

Is there something else I need to configure on my machine to make these tests run?

Full output:

Test run for C:\dotnet\efcore\artifacts\bin\EFCore.SqlServer.FunctionalTests\Debug\net6.0\Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (.NETCoreApp,Version=v6.0)
Microsoft (R) Test Execution Command Line Tool Version 17.0.0-preview-20210817-02
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.
[xUnit.net 00:00:09.14]     Microsoft.EntityFrameworkCore.Query.QueryBugsTest.Thread_safety_in_relational_command_cache [FAIL]
  Failed Microsoft.EntityFrameworkCore.Query.QueryBugsTest.Thread_safety_in_relational_command_cache [346 ms]
  Error Message:
   Microsoft.Data.SqlClient.SqlException : A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 - The certificate chain was issued by an authority that is not trusted.)
---- System.ComponentModel.Win32Exception : The certificate chain was issued by an authority that is not trusted.
  Stack Trace:
     at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParserStateObject.ThrowExceptionAndWarning(Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParserStateObject.SNIWritePacket(PacketHandle packet, UInt32& sniError, Boolean canAccumulate, Boolean callerHasConnectionLock)
   at Microsoft.Data.SqlClient.TdsParserStateObject.WriteSni(Boolean canAccumulate)
   at Microsoft.Data.SqlClient.TdsParserStateObject.WritePacket(Byte flushMode, Boolean canAccumulate)
   at Microsoft.Data.SqlClient.TdsParser.TdsLogin(SqlLogin rec, FeatureExtension requestedFeatures, SessionData recoverySessionData, FederatedAuthenticationFeatureExtensionData fedAuthFeatureExtensionData)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.Login(ServerInfo server, TimeoutTimer timeout, String newPassword, SecureString newSecurePassword)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean ignoreSniOpenTimeout, TimeoutTimer timeout, Boolean withFailover)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString connectionOptions, SqlCredential credential, TimeoutTimer timeout)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(TimeoutTimer timeout, SqlConnectionString connectionOptions, SqlCredential credential, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, Boolean applyTransientFaultHandling, String accessToken, DbConnectionPool pool)
   at Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open()
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.ExecuteCommand[T](DbConnection connection, Func`2 execute, String sql, Boolean useTransaction, Object[] parameters) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 330
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.<>c__37`1.<Execute>b__37_0(<>f__AnonymousType83`5 state) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 316
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.<>c__DisplayClass12_0`2.<Execute>b__0(DbContext c, TState s) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 341
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass28_0`2.<Execute>b__0(DbContext context, TState state) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 161
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementation[TState,TResult](Func`3 operation, Func`3 verifySucceeded, TState state) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 176
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 160
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.Execute[TState,TResult](IExecutionStrategy strategy, TState state, Func`2 operation, Func`2 verifySucceeded) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 339
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.Execute[TState,TResult](IExecutionStrategy strategy, TState state, Func`2 operation) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 288
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.Execute[T](DbConnection connection, Func`2 execute, String sql, Boolean useTransaction, Object[] parameters) in C:\dotnet\efcore\tes\TestUtilities\SqlServerTestStore.cs:line 247
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.CreateDatabase(Action`1 clean) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 110
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.Initialize(Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 86
.cs:line 52
   at Microsoft.EntityFrameworkCore.TestUtilities.TestStore.Initialize(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.Specification.Tests\TestUtilities\TestStore.cs:line 45
   at Microsoft.EntityFrameworkCore.TestUtilities.RelationalTestStore.Initialize(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.Relational.Specification.Tests\TestUtilities\RelationalTestStore.cs:line 43
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.InitializeSqlServer(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 76
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.CreateInitialized(String name, Boolean useFileName, Nullable`1 multipleActiveResultSets) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 44
   at Microsoft.EntityFrameworkCore.Query.QueryBugsTest.CreateTestStore() in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\Query\QueryBugsTest.cs:line 10401
   at Microsoft.EntityFrameworkCore.NonSharedModelTestBase.Initialize[TContext](Action`1 onModelCreating, Action`1 onConfiguring, Action`1 addServices, Func`2 shouldLogCategory, Func`1 createTestStore, Boolean usePooling) in C:\dotnet\efcore\test\EFCore.Specification.Tests\NonSharedModelTestBase.cs:line 86
   at Microsoft.EntityFrameworkCore.NonSharedModelTestBase.InitializeAsync[TContext](Action`1 onModelCreating, Action`1 onConfiguring, Action`1 addServices, Action`1 seed, Func`2 shouldLogCategory, Func`1 createTestStore, Boolean usePooling) in C:\dotnet\efcore\test\EFCore.Specification.Tests\NonSharedModelTestBase.cs:line 67
   at Microsoft.EntityFrameworkCore.Query.QueryBugsTest.Thread_safety_in_relational_command_cache() in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\Query\QueryBugsTest.cs:line 7092
--- End of stack trace from previous location ---
----- Inner Stack Trace -----

Failed!  - Failed:     1, Passed:     0, Skipped:     0, Total:     1, Duration: < 1 ms - Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (net6.0)
roji commented 3 years ago

Yes, you need to add Encrypt=false to your connection string; SqlClient 4.0.0 has Encrypt set to true by default (#1210).

Wraith2 commented 3 years ago

Preview 4 has a security change which forces encryption and trust. You either need to disable encryption (not recommended because it'll be what users have enabled) or add TrustServerCertificate=True; to your connection string.

ajcvickers commented 3 years ago

Not seeing a regression on Windows:

With 2.1:

Test run for C:\dotnet\efcore\artifacts\bin\EFCore.SqlServer.FunctionalTests\Debug\net6.0\Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (.NETCoreApp,Version=v6.0)
Microsoft (R) Test Execution Command Line Tool Version 17.0.0-preview-20210817-02
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.

Passed!  - Failed:     0, Passed:     1, Skipped:     0, Total:     1, Duration: < 1 ms - Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (net6.0)

With 4.0 preview:

Test run for C:\dotnet\efcore\artifacts\bin\EFCore.SqlServer.FunctionalTests\Debug\net6.0\Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (.NETCoreApp,Version=v6.0)
Microsoft (R) Test Execution Command Line Tool Version 17.0.0-preview-20210817-02
Copyright (c) Microsoft Corporation.  All rights reserved.

Starting test execution, please wait...
A total of 1 test files matched the specified pattern.

Passed!  - Failed:     0, Passed:     1, Skipped:     0, Total:     1, Duration: < 1 ms - Microsoft.EntityFrameworkCore.SqlServer.FunctionalTests.dll (net6.0)
Wraith2 commented 3 years ago

Ok, Managed SNI it is. @roji were your tests with Encryption off? and do you know if MARS was on or off?

roji commented 3 years ago

Yeah, Encrypt=false and no MARS. Can also run any other tests here - just let me know.

Wraith2 commented 3 years ago

No Mars is bad news. That means it's not just the usual threadpool exhaustion because with mars off sync IO is really synchronous in managed SNI.

ajcvickers commented 3 years ago

Okay, so ran with managed SNI on Windows using:

static SqlServerTestStore()
{
    var ManagedNetworkingAppContextSwitch = "Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows";
    AppContext.SetSwitch(ManagedNetworkingAppContextSwitch, true);
}

My connection string is:

Server=localhost;Database=master;Trusted_Connection=True;TrustServerCertificate=True;

Now the test runs for about 10-15 seconds, but then consistently fails with:

Microsoft.EntityFrameworkCore.Query.QueryBugsTest.Thread_safety_in_relational_command_cache

Microsoft.Data.SqlClient.SqlException: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server...

Microsoft.Data.SqlClient.SqlException
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server: Could not open a connection to SQL Server)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, Boolean encrypt, Boolean trustServerCert, Boolean integratedSecurity, Boolean withFailover, SqlAuthenticationMethod authType)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean ignoreSniOpenTimeout, TimeoutTimer timeout, Boolean withFailover)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString connectionOptions, SqlCredential credential, TimeoutTimer timeout)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(TimeoutTimer timeout, SqlConnectionString connectionOptions, SqlCredential credential, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, Boolean applyTransientFaultHandling, String accessToken, DbConnectionPool pool)
   at Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open()
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.ExecuteCommand[T](DbConnection connection, Func`2 execute, String sql, Boolean useTransaction, Object[] parameters) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 336
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.<>c__38`1.<Execute>b__38_0(<>f__AnonymousType83`5 state) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 322
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.<>c__DisplayClass12_0`2.<Execute>b__0(DbContext c, TState s) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 341
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass28_0`2.<Execute>b__0(DbContext context, TState state) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 161
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementation[TState,TResult](Func`3 operation, Func`3 verifySucceeded, TState state) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 176
   at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategy.cs:line 160
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.Execute[TState,TResult](IExecutionStrategy strategy, TState state, Func`2 operation, Func`2 verifySucceeded) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 339
   at Microsoft.EntityFrameworkCore.ExecutionStrategyExtensions.Execute[TState,TResult](IExecutionStrategy strategy, TState state, Func`2 operation) in C:\dotnet\efcore\src\EFCore\Storage\ExecutionStrategyExtensions.cs:line 288
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.Execute[T](DbConnection connection, Func`2 execute, String sql, Boolean useTransaction, Object[] parameters) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 313
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.ExecuteScalar[T](DbConnection connection, String sql, Object[] parameters) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 253
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.CreateDatabase(Action`1 clean) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 116
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.Initialize(Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 92
   at Microsoft.EntityFrameworkCore.TestUtilities.TestStore.<>c__DisplayClass13_0.<Initialize>b__1() in C:\dotnet\efcore\test\EFCore.Specification.Tests\TestUtilities\TestStore.cs:line 45
   at Microsoft.EntityFrameworkCore.TestUtilities.TestStoreIndex.CreateNonShared(String name, Action initializeDatabase) in C:\dotnet\efcore\test\EFCore.Specification.Tests\TestUtilities\TestStoreIndex.cs:line 52
   at Microsoft.EntityFrameworkCore.TestUtilities.TestStore.Initialize(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.Specification.Tests\TestUtilities\TestStore.cs:line 45
   at Microsoft.EntityFrameworkCore.TestUtilities.RelationalTestStore.Initialize(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed, Action`1 clean) in C:\dotnet\efcore\test\EFCore.Relational.Specification.Tests\TestUtilities\RelationalTestStore.cs:line 43
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.InitializeSqlServer(IServiceProvider serviceProvider, Func`1 createContext, Action`1 seed) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 82
   at Microsoft.EntityFrameworkCore.TestUtilities.SqlServerTestStore.CreateInitialized(String name, Boolean useFileName, Nullable`1 multipleActiveResultSets) in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\TestUtilities\SqlServerTestStore.cs:line 50
   at Microsoft.EntityFrameworkCore.Query.QueryBugsTest.CreateTestStore() in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\Query\QueryBugsTest.cs:line 10401
   at Microsoft.EntityFrameworkCore.NonSharedModelTestBase.Initialize[TContext](Action`1 onModelCreating, Action`1 onConfiguring, Action`1 addServices, Func`2 shouldLogCategory, Func`1 createTestStore, Boolean usePooling) in C:\dotnet\efcore\test\EFCore.Specification.Tests\NonSharedModelTestBase.cs:line 86
   at Microsoft.EntityFrameworkCore.NonSharedModelTestBase.InitializeAsync[TContext](Action`1 onModelCreating, Action`1 onConfiguring, Action`1 addServices, Action`1 seed, Func`2 shouldLogCategory, Func`1 createTestStore, Boolean usePooling) in C:\dotnet\efcore\test\EFCore.Specification.Tests\NonSharedModelTestBase.cs:line 67
   at Microsoft.EntityFrameworkCore.Query.QueryBugsTest.Thread_safety_in_relational_command_cache() in C:\dotnet\efcore\test\EFCore.SqlServer.FunctionalTests\Query\QueryBugsTest.cs:line 7092
   at Xunit.Sdk.TestInvoker`1.<>c__DisplayClass48_1.<<InvokeTestMethodAsync>b__1>d.MoveNext() in C:\Dev\xunit\xunit\src\xunit.execution\Sdk\Frameworks\Runners\TestInvoker.cs:line 264
--- End of stack trace from previous location ---
   at Xunit.Sdk.ExecutionTimer.AggregateAsync(Func`1 asyncAction) in C:\Dev\xunit\xunit\src\xunit.execution\Sdk\Frameworks\ExecutionTimer.cs:line 48
   at Xunit.Sdk.ExceptionAggregator.RunAsync(Func`1 code) in C:\Dev\xunit\xunit\src\xunit.core\Sdk\ExceptionAggregator.cs:line 90
roji commented 3 years ago

Sounds very similar/related...

Wraith2 commented 3 years ago

That does smell like threadpool exhaustion.. It'll start multiple connections and rely on a task callback to complete the connection but because the threadpool hasn't ramped up you haven't got enough threads to schedule the completion and when one becomes available as it is added to the threadpool it'll likely pick up a thread that's spend so long connecting that it's passed the timeout and will fail. If you make it through that phase things will sort of work. That's the exact presentation in https://github.com/dotnet/SqlClient/issues/422 but it looks like it's been extended into the pool now.

The immediate easy test it to artificially inflate the threadpool minthreads to the number of connections you expect to open, in this case 100, and then see if it still fails. If it all works then we know what the problem is. Fixing it though, that's a different matter.

ajcvickers commented 3 years ago

Some more info: this fails in exactly the same way with SqlClient 2.1 on Windows with managed SNI. Increasing thread pool size doesn't make any difference.

static SqlServerTestStore()
{
    var ManagedNetworkingAppContextSwitch = "Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows";
    AppContext.SetSwitch(ManagedNetworkingAppContextSwitch, true);
    ThreadPool.GetMaxThreads(out var workerThreads, out var completionPortThreads);
    ThreadPool.SetMaxThreads(workerThreads * 20, completionPortThreads * 20);
}
roji commented 3 years ago

@Wraith2 I agree this indeed smells like TP exhaustion... But @ajcvickers I don't think this test was slow for me on Linux with SqlClient 2.1 (which is what we're currently using in EF Core)? Weird...

ajcvickers commented 3 years ago

@roji There is clearly some Windows/Linux difference here, not just managed/native difference.

Wraith2 commented 3 years ago

The areas that are affected by platform are mainly how pool keys are derived and how networking is implemented, both are switched on the appcontext switch you've used so you're effectively running in linux mode on windows when using that script, I've been able to repro linux-only problems on windows using that switch. Of course there could be some api call which isn't behaving the same way but that would be a bug in it's own right that needs addressing.

for the original reported issue would it be possible to get an EF-free repro?

JRahnama commented 3 years ago

Hi @roji, @ajcvickers and @Wraith2 This seems related to TP exhaustion and as much as I try to avoid relating issues to #422 I end up there, but as @Wraith2 mentioned is it possible to have an EF-free repro? It makes it much easier to follow if there is a regression.

roji commented 3 years ago

Sure, I'll do that in the coming days.

roji commented 3 years ago

Sat down to do a minimal repro (see below), and found some interesting things - I was unfamiliar with the environment of the test in question:

Stepping back... if this (IMHO questionable!) practice is officially supported, then you guys seem to have a bug in current versions, which manifests much more reliably/often in 4.0.0-preview.1. Since this seems to work flawlessly on Windows with unmanaged networking, I suspect users porting to non-Windows platforms are hitting this from time to time.

See below for the repro: leave it running against 3.0.0 and 4.0.0-preview.1, and also compare with unmanaged networking.

Repro ```c# AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows", true); ThreadPool.SetMinThreads(200, 200); const string ConnectionString = "Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false;Multiple Active Result Sets=true"; using (var conn = new SqlConnection(ConnectionString)) { conn.Open(); using var cmd = new SqlCommand { Connection = conn, CommandText = @" DROP TABLE IF EXISTS [Lists]; CREATE TABLE [Lists] ( [Id] INTEGER, [IsDeleted] BIT )" }; cmd.ExecuteNonQuery(); } while (true) { var stopWatch = Stopwatch.StartNew(); using var conn = new SqlConnection(ConnectionString); conn.Open(); Parallel.For( 0, 100, i => { using var cmd = new SqlCommand { Connection = conn, CommandText = @" SELECT [l].[Id], [l].[IsDeleted] FROM [Lists] AS [l] WHERE ([l].[IsDeleted] = CAST(0 AS bit)) AND [l].[Id] IN (1, 2, 3)" }; using var reader = cmd.ExecuteReader(); // using var context = contextFactory.CreateContext(); // var query = context.Lists.Where(l => !l.IsDeleted && ids.Contains(l.Id)).ToList(); }); Console.WriteLine($"Completed in {stopWatch.ElapsedMilliseconds}ms"); } ```
Wraith2 commented 3 years ago

Contrary to what I wrote above, the test does run in MARS, always.

That's a relief, sort of. It means that longstanding behaviour I don't know how to fix is still the problem and that we don't have another.

The test suite actually has the test reuse the same DbConnection instance - in parallel - rather than separate DbConnection instances per thread.

Ew. Not supported. If you were actually making concurrent calls to anything other than Open I expect you'd see really scary exceptions that warn about concurrent usage but since this is about Open it's not unsafe or easily detectable until you've opened.

In 4.0.0-preview.1, some sort of change makes this block or error very quickly.

That should make it easier to trace which is good.

roji commented 3 years ago

Ew. Not supported.

FWIW I completely agree it shouldn't (and it's even documented that way) - so this is our bad on the EF Core.

But I'm a little worried by the fact that the scenario works perfectly on unmanaged Windows - it may indicate that thread-safety was somehow an original goal. But more importantly, the distinction between thread-safety and multiple result sets is a bit subtle for the unsuspecting new user, and I'm guessing there's lots of code out there which does concurrent usage (and would break when ported.

In 4.0.0-preview.1, some sort of change makes this block or error very quickly.

Definitely agree.

Wraith2 commented 3 years ago

From what I remember mars is there to allow sloppy programming. COM ADO had a fun behaviour whereby is you had an open ResultSet and tried to open another it would use a new thread to open a second one for you silently in the background and handle the marshalling for you (at least in vb6) which let you do things like open a transaction and then start updating one resultset while opening a second resultset inside that same transaction to lookup data.

People liked this because it means that the code to do such things was a lot smaller than having to think and code fully transactionally and deal with rollbacks etc. As you can imagine it got into all sorts of Line Of Business and asp programs. When the migration from .net was being pushed it was a major upgrade blocker if you couldn't do the same thing on .net so it had to be implemented. I think MARS is a terrible idea and that no-one should use it. Sadly there's no way anyone will ever let me take it out of the driver.

Mars in SqlClient strictly applies to allowing multiple SqlDataReader instances to be active at one time on the same connection and works by multiplexing the traffic using the MC-SMP protocol. In practical terms it changes the way data is received by decoupling the packet receipt loop from the reader because otherwise you can deadlock the two SqlDataReaders, the receive loop with Mars enabled is implemented with async callback so just having mars enabled even if you aren't using it puts you at the mercy of threadpool exhaustion.

TLDR; just don't use mars, it's not the right solution to whatever problem you're having.

As far as thread safety goes. I'm not aware of anything being thread safe in SqlClient, but also nothing cares about what thread it's running on (apart from connecting with impersonation) so as long as your access is sequential you're unlikely to have a problem. I suspect the test case is serializing connection attempts at the pool level so even though the caller is concurrent the internals are likely sequential so you see no error. You'll probably want an official answer on this, I'm just recalling what I've seen in the code.

cheenamalhotra commented 3 years ago

This is indeed #422.

TLDR; just don't use mars, it's not the right solution to whatever problem you're having.

I wouldn't say that because this is a client design issue, I have a feeling it's related to SNIMarsHandle queue design. And the problem is not with multiple data reader design, problem is when connection.Open() is called with MARS enabled in managed SNI.

We need to remember MARS is enabled using "async" reads, which uses the blocking queue design and that design has been my suspicion is the culprit. But the bigger challenge is we've only known this 1 design pattern in Managed SNI, so we need to really break down the design pattern and come up with something that would solve this sync/async contradiction or do things synchronously in that space.

JRahnama commented 3 years ago

we need to really break down the design pattern and come up with something that would solve this sync/async contradiction or do things synchronously in that space.

I agree. IMHO, I think we should separate sync and async calls. async is contagious and effects the entire environment ( not sure but thread pool management could be included in the environment change) , plus in MS docs it has been suggested as the best practices to avoid mixing sync/async, but on the other hand if we do so, we will end up having two sets of codes and it is hard to maintain. I remember that @roji mentioned in some other PR that they are maintaining sync/async in the very same code in their product with an async validator boolean, when async, for example, is true other part of the code runs. if I am not mistaken, I wonder if they ever noticed any issue with this pattern?

cheenamalhotra commented 3 years ago

The problems trigger with #933 as exceptions start to occur. @Wraith2 could you investigate design changes made that could contribute to this?

Wraith2 commented 3 years ago

It immediately hits an assert in debug mode SNIMarsHandle.HandleRecieveComplete should be called while holding the SNIMarsConnection.DemuxerSync because it can cause deadlocks so yeah, saw that one coming 😁

Add this to the list of reasons we need to be able to run tests in debug mode.

Wraith2 commented 3 years ago

@roji what exception are we expecting to see here? or aren't we? In debug mode the only problem it's having is caused by the fast re-use of a single session. The session identifier is limited to a 16 bit integer and if you create and destroy sessions fast enough you eventually wrap around and hit a number that's already in use which will bubble up as a connection failure.

cheenamalhotra commented 3 years ago

If you run the repro from Shay, you won't get exception using v3.0.0, and queries continue (even though slowly) without exception. I think there's some kind of deadlock with new changes that happens randomly, and code terminates abruptly with exception.

I'd say compare behavior before and after your change to understand it's impact.

Wraith2 commented 3 years ago

I'm using Main which is 4.0 preview currently. No crash. I get a slow start because I'm not forcing the threadpool min as I wanted to see any long blocking behaviour but the only crash is from exceeding 65000+ sessions.

cheenamalhotra commented 3 years ago

image Left > With head just before #933 was committed.(6a5a670842fe958e7d5f89de194d7604735d2d4f) Right > With latest "main"

Running repro on .NET 5 with Managed SNI enabled. The only change in my repro is I commented out ThreadPool.SetMinThreads(200, 200); to use default settings.

Wraith2 commented 3 years ago

Ok, that's just normal timing out which is what you'd expect. It looks like I can just run it fast enough to succeed most of the time. I've seen large numbers of threads waiting https://github.com/dotnet/SqlClient/blob/fbfd5d7bbf5bbd929f9d15403156d09d7553514f/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SNI/SNIMarsHandle.cs#L530 which is expected behaviour. They've requested a packet and are waiting for the dispatch loop to wake them, it can't because it can't be scheduled due to lack of threads and we have no way to give it any sort of high priority because tasks don't have priority.

cheenamalhotra commented 3 years ago

I don't think that's a fair analysis or "normal" behavior.

The exception is consistently reproducible with new change and the code also "consistently" passes with previous design. So it is a regression that needs to be addressed.

There's likely a deadlock in SNI layer, if threads are waiting and not receiving packets.

roji commented 3 years ago

@roji what exception are we expecting to see here? or aren't we?

So I think it's OK to say "if you use SqlConnection concurrently, the behavior is undefined" (though again, keep in mind this seems totally supported on unmanaged Windows). A better option is to actually detect these cases and throw a "connection used concurrently" InvalidOperationException, which would really help users find errors in their code. Npgsql does this but in a way that isn't 100% reliable, and I've generally lost a lot of time over the years helping users finding their accidental concurrent usage.

If you run the repro from Shay, you won't get exception using v3.0.0, and queries continue (even though slowly) without exception.

This was true for me - but only on unmanaged Windows. 3.0.0 with managed does eventually throw an exception for me, though it took a lot more time than 4.0.0; if that's true, then there may not be an actual regression here...

I remember that @roji mentioned in some other PR that they are maintaining sync/async in the very same code in their product with an async validator boolean, when async, for example, is true other part of the code runs. if I am not mistaken, I wonder if they ever noticed any issue with this pattern?

Yeah, this is how Npgsql is architected, and it has worked quite well - it's a good way to fully support sync and async without code duplication, and you can still split implementations on a per-function basis where you need to.

Wraith2 commented 3 years ago

I don't think that's a fair analysis or "normal" behavior.

I look forward to seeing your analysis.

There's likely a deadlock in SNI layer, if threads are waiting and not receiving packets.

I can't see one or any evidence of anything other than behaviour consistent with not being able to schedule a task.

It took me months to get it this far and then more months to get it reviewed. If there is evidence of a problem with it then I'm happy to help out but I can't see any. There's a performance problem but lacking a reproduction of that problem and not just #422 I can't do much about it.

I'm pretty sure the assertions of demuxer lock entry in the SNIMarsHandle methods are outdated and can be removed, the demuxer lock only has to guard the annotated variables and anything that sends a packet including the SNIMarsHandle ctor because someone was stupid when they wrote it.

JRahnama commented 3 years ago

I have tried @roji's repro with version 3.0.0 and it works fine, but with the newest preview it fails right away as below:

image

@Wraith2 I am afraid, but I believe this is a regression and has eliminated the only workaround we had for issue 422.

Something interesting: I was trying to compare behavior with PLINQ, ( the below code) as the original issue #422 was with PLINQ and Parallel.For was the provided workaround. It was working slow, but was running till completion. I was just trying to rearrange the code to add more to it. I surprisingly noticed it works fine with the latest release. This might be because of my mistake. I really appreciate if anybody can try it and confirm or deny my findings.

Repro ```C# static void Main(string[] args) { AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows", true); //ThreadPool.SetMinThreads(200, 200); SqlConnectionStringBuilder builder = new() { DataSource = "localhost", InitialCatalog = , UserID = "sa", Password = , ConnectTimeout = 60, ConnectRetryCount = 0, Encrypt = false, MultipleActiveResultSets = true }; while (true) { RunWithParallelFor(builder.ConnectionString); } //RunWithPLinq(builder.ConnectionString); } private static void RunWithPLinq(string connString) { using SqlConnection conn = new(connString); conn.Open(); Enumerable.Range(0, count). AsParallel(). WithDegreeOfParallelism(count). ForAll(n => RunSqlCommand(conn)); } private static void RunWithParallelFor(string connString) { using var conn = new SqlConnection(connString); conn.Open(); Parallel.For( 0, 100, i => { RunSqlCommand(conn); }); } private static void RunSqlCommand(SqlConnection conn) { var stopWatch = Stopwatch.StartNew(); using var cmd = new SqlCommand { Connection = conn, CommandText = @" SELECT [l].[Id], [l].[IsDeleted] FROM [Lists] AS [l] WHERE ([l].[IsDeleted] = CAST(0 AS bit)) AND [l].[Id] IN (1, 2, 3)" }; try { using SqlDataReader reader = cmd.ExecuteReader(); Console.WriteLine($"Current thread {Thread.CurrentThread.ManagedThreadId} Completed in {stopWatch.ElapsedMilliseconds}ms"); } catch (SqlException ex) { Console.WriteLine(ex.Message); } } private static void TableMaintainer(string connString) { using var conn = new SqlConnection(connString); try { conn.Open(); using var cmd = new SqlCommand() { Connection = conn, CommandText = @" DROP TABLE IF EXISTS [Lists]; CREATE TABLE [Lists] ( [Id] INTEGER, [IsDeleted] BIT )" }; cmd.ExecuteNonQuery(); } catch (SqlException ex) { Console.WriteLine(ex.Message); } } ```
Wraith2 commented 3 years ago

@Wraith2 I am afraid, but I believe this is a regression

Yes, it is a regression, I thought that much was clear from the very start of the discussion.

Assuming it's from the mars rework is unproven. I agree that it's possible and maybe even likely but so far no evidence. It won't crash with anything other than duplicate key under a debugger for me and we're going to need a debugger to investigate.

and has eliminated the only workaround we had for issue 422.

What workaround for #422? the only workaround I knew of was to not starve async of threads.

I managed to implement a custom scheduler and get it working so that all mars async io is done on dedicated threads. Still incredibly slow and the profiler shows the contention time allocated to the _parserLock which makes sense to me since we've got up to 100 threads contending for the same parser.

Wraith2 commented 3 years ago

Here's what i see on current main:

Overall: image

breakdown of lock contention:

image

roji commented 3 years ago

I have tried @roji's repro with version 3.0.0 and it works fine, but with the newest preview it fails right away as below:

Just to note that running my repro above against 3.0.0 (with managed networking) does fails - it just takes a bit of time (the program eventually blocks and then throws unpredictably). So I'm not sure there's a regression here in 4.0.0-preview.1.

Wraith2 commented 3 years ago

Interestingly if you manage to force mars async io off threadpool threads and onto dedicated threads the number of thredpool threads required (and created) drops significantly. The time taken to execute doesn't drop because the majority of it is taken up on the parser lock but in general the profile looks a lot cleaner. I couldn't work out how to do this before but working through some of the postgres issues you had with the context switcher let me learn what I needed.

image