dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License

Investigate SqlClient connection pool for any performance improvement opportunities #25

Open saurabh500 opened 6 years ago

saurabh500 commented 6 years ago

The repo https://github.com/aspnet/DataAccessPerformance can be used to measure the throughput of an ADO.NET client.

When the benchmark is run with the connection pool enabled, and connections are opened and closed every time a query is performed, I see a TPS of 40k for async and 50k for sync data retrieval.

I changed the benchmark to reuse the same connection per thread, and this causes the TPS to jump to 65k and 90k for async and sync scenarios, respectively.

By changing how the connection is opened, I eliminated the connection pool from the benchmark, as the same connection is reused for the data operations. My modifications are at https://github.com/saurabh500/DataAccessPerformance/blob/modVariation/src/BenchmarkDb/AdoDriver.cs#L106
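The difference between the two benchmark modes can be sketched as follows (a simplified sketch, not the benchmark's actual code; the connection string and query are placeholders):

```csharp
using Microsoft.Data.SqlClient;

const string connStr = "Server=localhost;Database=Fortunes;Integrated Security=true";
const string query = "SELECT id, message FROM fortune";

// Pattern A: open/close per iteration. Every Open() goes through the pool,
// and every Dispose() returns the connection (and may trigger a reset).
void ConnectInside()
{
    for (int i = 0; i < 1000; i++)
    {
        using var conn = new SqlConnection(connStr);
        conn.Open();                              // pool checkout
        using var cmd = new SqlCommand(query, conn);
        using var reader = cmd.ExecuteReader();
        while (reader.Read()) { /* consume row */ }
    }                                             // pool return
}

// Pattern B: one connection reused for the whole loop. The pool is only
// touched once, which is the variant that jumped to 65k/90k TPS.
void ConnectOutside()
{
    using var conn = new SqlConnection(connStr);
    conn.Open();
    for (int i = 0; i < 1000; i++)
    {
        using var cmd = new SqlCommand(query, conn);
        using var reader = cmd.ExecuteReader();
        while (reader.Read()) { /* consume row */ }
    }
}
```

Everything the pool does per iteration in pattern A is overhead that pattern B pays only once.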

This means there may be potential to improve the performance of the connection pool in the SqlClient ADO.NET driver so that it manages connections faster.

It could be suffering from one of the following problems, based on my understanding of the connection pool. I have not profiled this code path yet.

  1. Fetching a connection from the connection pool is slow.
  2. Returning a connection to the connection pool is slow.
  3. When a connection is returned to the pool, sp_reset_connection is issued to the server, along with other network operations to clean up the connection. I don't know how much overhead this adds, though I anticipate there is some.
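The three suspects map onto the checkout/return path of any pool. A toy sketch (my own illustration, not SqlClient's internals) of where each cost would sit:

```csharp
using System;
using System.Collections.Concurrent;

// A deliberately minimal pool used only to locate the three suspected costs.
sealed class ToyPool<T>
{
    private readonly ConcurrentBag<T> _idle = new();
    private readonly Func<T> _factory;

    public ToyPool(Func<T> factory) => _factory = factory;

    public T Rent()
    {
        // (1) checkout cost: pool-key lookup, identity checks, and locking
        //     in a real pool all live on this path.
        return _idle.TryTake(out var conn) ? conn : _factory();
    }

    public void Return(T conn)
    {
        // (2) return cost: bookkeeping on the way back into the pool.
        // (3) reset cost: a real pool may also issue sp_reset_connection
        //     here, adding a network round trip per return.
        _idle.Add(conn);
    }
}
```

Costs (1) and (2) are pure client-side work; cost (3) involves the server, which is why the issue notes it would be the hardest to remove.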

I am opening this issue to see if any performance improvement can be made to SqlClient connection pool to have a better throughput in providing connections.

If it is 1 or 2 plaguing the driver, then there could be client-side improvements to be made; if it is 3, that would be much harder, as protocol changes could be needed.

The connection pool should be profiled to find any other optimization opportunities.

saurabh500 commented 5 years ago

cc @Wraith2 This is the issue I was talking about.

Wraith2 commented 5 years ago

There's a lot going on in the code to get a single connection, but none of the benchmarks indicate a bottleneck when doing so. I've tried various things, and while I can see the connection pooling code showing up, it's never hot; it looks like ~3.5% of time is spent getting connections from pools.

That said, I can see what you mean about moving the connection outside the loop. It does give a clear and repeatable increase in throughput. However, changing other things also has large effects:

| mode | threads | tps   | stddev | notes                         |
|------|---------|-------|--------|-------------------------------|
| sync | 64      | 21897 | 2507   | trusted auth; connect inside  |
| sync | 64      | 32801 | 1841   | trusted auth; connect outside |
| sync | 64      | 25977 | 2758   | sql auth; connect inside      |
| sync | 64      | 30228 | 3169   | sql auth; connect outside     |

My hardware can't run 64 threads at once, so those results are hugely CPU bound; auth mode also changes slightly how you get to the pool. Those numbers are silly.

I'm using local sql server and a 4 core non hyperthreaded processor so I tried ramping up from single thread to 16 to see what the numbers looked like. These are all sync, connection in the loop.

| threads | tps   | stddev |
|---------|-------|--------|
| 16      | 24735 | 2284   |
| 8       | 24080 | 3225   |
| 4       | 21955 | 5466   |
| 2       | 15458 | 458    |
| 1       | 7892  | 232    |

So what we're seeing seems to depend entirely on how many real concurrent threads you can throw at the problem, and any time a thread isn't actively querying shows up oddly in the results. We see maximum throughput when we spend all our time doing queries and not on infrastructure and busy work creating/dropping connections etc.

Wraith2 commented 5 years ago

I've been doing some more poking around at this, and there are some small opportunities to improve connection retrieval from the pool; I'm not sure what compatibility requirements there are for internals.

At https://github.com/dotnet/corefx/blob/684704de20c28103ea8e0063bda7ed98f0912c58/src/System.Data.SqlClient/src/System/Data/ProviderBase/DbConnectionPoolIdentity.Windows.cs#L25 the user object shouldn't be fetched twice, because doing so is expensive (memory and CPU); fetching it once and using it from a variable will improve things by <10%, but it's something.

In the same function, the property access user.Value is very expensive: it formats and allocates a string and then force-uppercases it. It returns a SID string which is used as part of the key name for the pool. If that were changed to SecurityIdentifier.GetHashCode().ToString(), it would remove a lot of CPU and memory load and make getting a connection about 30% faster for trusted logins. Doing this would make it more difficult to debug internal issues, because the pool would no longer contain the SID of the user; is a small amount of debuggability worth sacrificing for performance?
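A minimal sketch of the suggested fix (an assumption about the shape of the code at that link, not the exact source; method names here are illustrative):

```csharp
using System.Security.Principal;

static class IdentityKey
{
    // Before: each pool lookup hits the OS twice via GetCurrent().
    public static string BeforeKey() =>
        WindowsIdentity.GetCurrent().User.Value          // fetch 1: SDDL string
        + WindowsIdentity.GetCurrent().IsSystem;         // fetch 2: same identity again

    // After: fetch the identity once, then read both values from the local.
    public static string AfterKey()
    {
        using WindowsIdentity identity = WindowsIdentity.GetCurrent();
        SecurityIdentifier sid = identity.User;
        return sid.Value + identity.IsSystem;            // single OS fetch
    }
}
```

The further 30% idea replaces `sid.Value` (an allocated, formatted SDDL string) with a hash of the `SecurityIdentifier`, trading the human-readable SID in the pool key for less CPU and allocation.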

saurabh500 commented 5 years ago

@Wraith2 Since you are looking at ways to improve the connection pool performance of SqlClient (e.g. https://github.com/dotnet/corefx/pull/36667), here is what I suggest.

The above tests were done with SqlClient on Windows, i.e. by using the native networking layer.

The SqlClient on Windows with native networking code has been working fairly well and there have been no known issues with the networking layer.

The experiment in this issue shows that when we return and re-request connections from the pool, the TPS drops. This means that the connection pool's connection management is in some way a bottleneck.

It might be worth profiling what is happening in case of the connection pool connection management and why the TPS drops when connection pool is involved.

Ideally, the cost of retrieving an already-existing connection from the pool should be close to 0, but that doesn't seem to be the case based on this experiment. It might be worth drilling into the connection pool under the profiler to see why this behavior occurs.

At the end of it, we may come up with hot spots in the pool, which we may be able to brainstorm to see if they can be improved, or we may conclude that the connection pool is doing everything it needs to do in order to maintain the sanity of the connections and there is no scope for improvement.

Let me know if that makes sense and provides a slightly narrower problem statement to look into.

Wraith2 commented 5 years ago

I'm getting an exception running the DataAccessPerformance project in async mode, and it seems to be related to a recent change I made in https://github.com/dotnet/corefx/pull/34573 at https://github.com/dotnet/corefx/blob/7b320a0ad1ea83670f561a6ef9231916465bb4b4/src/System.Data.SqlClient/src/System/Data/SqlClient/TdsParserHelperClasses.cs#L271-L279

The return call throws ArgumentNull on the array parameter, sometimes in debug and every time in release, indicating it's a timing issue. The only way I can see to cause it is to have multiple threads accessing the method at once, but that shouldn't be possible unless the pool is flawed.

saurabh500 commented 5 years ago

Ouch. Can you share the stack trace as well? I am not familiar with this code that you have added. I will look at it tomorrow. Also if you can find a fix that would be great. I think we are close to snapping the master for a release.

Wraith2 commented 5 years ago

remediation PR is https://github.com/dotnet/corefx/pull/36735

Wraith2 commented 5 years ago

I've been doing some more profiling. The connection pool is about as good as it's going to get without something fundamentally changing the amount of work it needs to do. In Windows auth mode my previous changes make it quicker, as expected, but the bottleneck is the fact that you have to check the user identity on every request, and I don't see a secure way to avoid that.

Fundamentally, it's faster to not do work than it is to do it. If you have an open connection, the work of opening one is unneeded and you get that back as TPS. If a user can write their code with a single connection they should; if they can't, then there's not a lot we can do to help.

In more general terms, there are performance increases available. No huge gains, but a few percent along a few different approaches. I've got one branch which reworks the async sql reader functions and improves TPS by about 1.5% while decreasing the stddev by ~16%, so more reliable and a little quicker. I've another which changes how tdsparser state is packaged for replay, which gains ~4%. There are other bits and pieces I can look at as well, but I'm not sure if it's worth the time or not. Are small percentage TPS increases worth the time and effort it'll take for review and testing? Is this an area of interest or a rabbit hole of my own making?

Oh, and in the async test the code mixes sync and async methods by awaiting Open and Read but then calling GetInt32; should it be awaiting GetFieldValueAsync<int> instead? Or is that not the recommended pattern?

roji commented 5 years ago

Just noticed this @Wraith2 - unfortunately I don't yet watch SqlClient-specific issues as well as I should. Please feel free to cc me on anything that could have a general ADO.NET impact (or just in general).

Fundamentally, it's faster to not do work than it is to do it. If you have an open connection, the work of opening one is unneeded and you get that back as TPS. If a user can write their code with a single connection they should; if they can't, then there's not a lot we can do to help.

FWIW I remember being at exactly this point when optimizing the Npgsql provider; it's true that a single DbConnection will always result in the best perf, in any scenario. However, many (if not most) scenarios seem to involve very short-lived connections (e.g. typical web apps), which is why the considerable effort into optimizing the pool seemed to be worth it.

It's true that at the end of the day it's a matter of the amount of work that needs to happen at the pool. For example, I'm assuming that when a connection is returned to the pool, SqlClient does some sort of reset to avoid state leaking from one usage to the next. One relatively easy optimization done in Npgsql, is to defer this reset to the next time the connection is actually used (after it'll get opened). At that point, the reset protocol message is prepended to the first user query, saving a roundtrip. I have no idea if something like this is possible with SqlClient/TDS, just voicing some ideas.
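The piggyback idea can be sketched like this (purely conceptual, with hypothetical names; whether SqlClient's TDS handling permits prepending a reset to a user query is exactly the open question):

```csharp
// Conceptual sketch of deferring the reset: instead of a reset round trip at
// return time, mark the connection dirty and fold the reset into the next
// query's round trip.
sealed class PooledConnection
{
    public bool NeedsReset { get; private set; }

    public void ReturnToPool()
    {
        // No network traffic here; just remember that cleanup is owed.
        NeedsReset = true;
    }

    public string BuildOutgoingMessage(string userQuery)
    {
        // One round trip carries both the reset and the first user query.
        string message = NeedsReset ? "RESET;" + userQuery : userQuery;
        NeedsReset = false;
        return message;
    }
}
```

The saving is one full round trip per pool cycle, which grows with server latency.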

Oh, and in the async test the code mixes sync and async methods by awaiting open and read but then calling GetInt, should it awaiting GetFieldValueAsync instead? Or is that not the recommended pattern?

As discussed in https://github.com/aspnet/DataAccessPerformance/issues/54 (just summing it up), the general guidance is to call GetFieldValueAsync() only if CommandBehavior.Sequential is specified, since otherwise the entire row is expected to be buffered in memory and getting the column will never perform I/O. Since all this only makes sense for large rows, that's not necessarily the primary scenario we're interested in benchmarking.
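The two patterns under discussion look like this in SqlClient (a sketch with placeholder table and column names):

```csharp
using System.Data;
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

static class ReaderPatterns
{
    public static async Task ReadRows(SqlConnection conn)
    {
        using var cmd = new SqlCommand("SELECT id, message FROM fortune", conn);

        // Default behavior: ReadAsync buffers the whole row, so the
        // synchronous getter never performs I/O and is the right call.
        using (var reader = await cmd.ExecuteReaderAsync())
            while (await reader.ReadAsync())
            {
                int id = reader.GetInt32(0);   // no I/O here
            }

        // Sequential streaming (large rows): columns are read off the wire
        // as requested, so the async getter can genuinely await I/O.
        using (var reader = await cmd.ExecuteReaderAsync(CommandBehavior.SequentialAccess))
            while (await reader.ReadAsync())
            {
                int id = await reader.GetFieldValueAsync<int>(0);
            }
    }
}
```

For the small rows in this benchmark, the first pattern is the one worth measuring.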

BTW just noticed #25132 from back in 2017 which discusses some potential pooling improvements, including a potential 10% increase (I have no idea if that's already been done, is doable etc.).

/cc @divega

Wraith2 commented 5 years ago

BTW just noticed #25132 from back in 2017 which discusses some potential pooling improvements, including a potential 10% increase (I have no idea if that's already been done, is doable etc.).

I did some work to make trusted identity faster in https://github.com/dotnet/corefx/pull/33579 and in another PR I can't find at the moment, which reduced times for trusted connections. I can't see a clear way to avoid looking up the user each time that doesn't leave security holes.

There are further enhancements that could be made with small benefits, from what I remember, but nothing that I felt was worthwhile pursuing. From memory, I think one of the more expensive parts was Marvin string hashing for the dictionary keys.

I don't know how to identify hot locks; that would be a useful skill to have. Any guidance on that?

roji commented 5 years ago

I don't know how you identify hot locks, that would be a useful skill to have. Any guidance on that?

That's indeed a good thing to focus on: contention is a likely culprit when it comes to connection pools. A good, easy first indicator that lock contention is happening is that your process isn't at close to 100% CPU; but watch out, because depending on your environment/setup your process could also be stuck on I/O.

A more serious examination requires profilers like PerfView or DotTrace, which provide a direct view on time spent in locks and where. This kind of investigation isn't easy, but it's super important in order to understand what's actually going on in your program.
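On modern .NET there is also one cheap built-in signal worth sampling before reaching for a profiler. This sketch uses the real `Monitor.LockContentionCount` counter (available since .NET Core 3.0), with the workload itself left as a placeholder:

```csharp
using System;
using System.Threading;

// Monitor.LockContentionCount increments every time a thread had to wait
// to acquire a monitor lock in this process. Sampling it around the
// benchmark loop gives a crude but free contention measurement.
long before = Monitor.LockContentionCount;

RunBenchmark();   // placeholder for the connection-pool workload under test

long contentions = Monitor.LockContentionCount - before;
Console.WriteLine($"lock contentions during run: {contentions}");

static void RunBenchmark() { /* open/query/close loop goes here */ }
```

A large delta points at hot locks worth finding with PerfView or dotTrace; a small one suggests the time is going elsewhere.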

saurabh500 commented 5 years ago

FWIW, issue https://github.com/dotnet/corefx/issues/25132 was opened after Vance (author of PerfView) had looked at the fortunes benchmarks with SQL Server and SqlClient and identified the hot perf areas.

Wraith2 commented 5 years ago

So he probably knows how to find these things 😀 I've got dotTrace, and it looks like timeline mode will show hot locks for me, so once I've got some other things out of the way I can take a look into these. What's the probable bar for TPS relevance? 5%? 1%? Less?

saurabh500 commented 5 years ago

:smiley:

The part which really concerns me is the huge gap in TPS with and without the connection pool. I have used PerfView and dotTrace, and I didn't find any areas that we could attack immediately.

@roji mentioned the cleanup of connection state. My theory is that moving the cleanup to the subsequent connection open would help in async cases but not in sync cases. This is because Dispose() is sync, and in async cases it can add latency: SqlClient needs to send a message to the server asking it to reset the connection, and that network call happens synchronously even in async cases.

For sync cases, whether we do the reset in Dispose or in Open wouldn't matter, and the discrepancies would remain the same as above.

I had used dotTrace to profile SqlClient and found GetConnectionPoolGroup consuming 1.12% of time across all the threads of DataAccessPerformance.


Another connection-open area that could be interesting to look at:

Talking about non-connection-pool areas: getting SQL string values. (I think there are parts here which you haven't changed to use stack-based memory; maybe something worth looking at as part of another issue.) I'm not sure how much benefit we can get here from using Span-based APIs, but I know this can be improved.


saurabh500 commented 5 years ago

Improving PrepareConnection is probably the right place to start for SqlClient connection pooling. From 25132: a. (2.7%) system.data.sqlclient!System.Data.ProviderBase.DbConnectionPool.PrepareConnection


roji commented 5 years ago

For the record, the optimization with the connection reset was about reducing a roundtrip. Originally when you returned a connection to the pool it executed a roundtrip for the reset before doing so. I simply arranged for the reset message to be sent in the same roundtrip as the first query executed after the connection is next allocated from the pool, as a sort of "piggyback" mechanism. At least in my world it doesn't really matter if you're doing sync or async (but the networking situation may be different in SqlClient). A roundtrip reduction can be pretty dramatic - especially if there's a lot of latency (e.g. server is far away).

I'm curious... in #25132 there's mention of a 10% gain by caching the Windows identity, "or to mimic what Windows docs". @Wraith2, I think you wrote above that caching the identity creates security holes; I'd be interested in knowing more about this, since we're discussing pooled open and not physical connection open. Also, intuitively, since the process is always running under the same user, it's hard to imagine what kind of problem there may be.

Even if there is a problem, is it worth looking at what Linux does? I'm asking because, if SqlClient is accessing an OS facility to get the identity on every pooled open, that's definitely quite bad for perf.

roji commented 5 years ago

DotTrace is pretty good - I've found that it's generally a bit easier to learn and use than PerfView. As for a bar for TPS relevance - many times perf is a story of "death by a thousand cuts", so rather than a big chunky change you get 10% by doing a lot of little ones (although there's nothing more satisfying than a single big jump - the identity caching may get you there :)).

saurabh500 commented 5 years ago

I simply arranged for the reset message to be sent in the same roundtrip as the first query executed after the connection is next allocated from the pool, as a sort of "piggyback" mechanism

Good point. At this point I am not sure if this is possible in SqlClient (though I don't think there is any reason for it not to be).

The issue is that even without integrated auth and in low-latency environments, there seems to be plenty of TPS loss in SqlClient. You bring up a good point about "death by a thousand cuts"; pursuing those cuts would prove significantly beneficial in all scenarios.

Wraith2 commented 5 years ago

I'm curious... in #25132 there's mention of a 10% gain by caching the Windows identity, "or to mimic what Windows docs". @Wraith2, I think you wrote above that caching the identity creates security holes; - I'd be interested in knowing more about this - we're discussing pooled open and not physical connection open. Also, intuitively, since the process is always running under the same user it's hard to imagine what kind of problem there may be.

The user identity becomes part of the pool key when integrated auth is used. When fetching a connection from the pool, you fetch the user identity SID and format a string to create the key, then you check that pool. Since it is possible to impersonate users, you can't assume that the identity of the current user is the same as it was before, and if you store the identity somewhere you leave it open to people finding and misusing it. I made the process of getting and formatting the SID a bit faster but didn't want to go into the security minefield.

From what I remember, the Linux approach is just to put in the username, not the account SID, which is probably slightly faster since you won't need an API call to get it (it should be part of the environment block on Windows), but IMO it is open to misuse, because it's very easy to guess someone else's pool key and get a connection from their pool. Changing the pool key to be binary or a derived hash might yield some benefits, but when I asked about this no one replied.
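The "binary or derived hash" key idea might look something like this (entirely hypothetical names; whether hash collisions are acceptable for pool keys is an open question):

```csharp
using System.Security.Principal;

// Hypothetical pool key: hashes instead of formatted strings, so building
// the key allocates nothing and the key doesn't expose a guessable username.
readonly record struct PoolKey(int ConnectionStringHash, int IdentityHash);

static class PoolKeys
{
    public static PoolKey MakeKey(string connectionString)
    {
        using var identity = WindowsIdentity.GetCurrent();
        return new PoolKey(
            connectionString.GetHashCode(),
            identity.User?.GetHashCode() ?? 0);  // no SDDL string allocation
    }
}
```

The trade-off mentioned earlier in the thread still applies: a hashed key is cheaper and less guessable, but the pool no longer shows the user's SID when debugging.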

As far as TPS goes, I'll investigate the connection perf and locks when I've got a few PRs cleared. The more PRs I have open at once, the more I have to keep them in line with master (because it moves much faster than SqlClient) and the more it pollutes the merge history. As I said above, I've got about 20% TPS improvements for anything doing async reads queued up, more work on bringing managed perf in line with native, and then a whole load of other bits and pieces to investigate.

roji commented 5 years ago

From what I remember, the Linux approach is just to put in the username, not the account SID, which is probably slightly faster since you won't need an API call to get it (it should be part of the environment block on Windows), but IMO it is open to misuse, because it's very easy to guess someone else's pool key and get a connection from their pool. Changing the pool key to be binary or a derived hash might yield some benefits, but when I asked about this no one replied.

So if I'm understanding you correctly, you're trying to protect against a security attack from within the same process? That's not something that's usually done: if the process itself is somehow compromised, an attacker has read access to SqlClient's memory anyway and can access and manipulate all the connections (e.g. by iterating all the pools)... I think that from a security standpoint it should be fine to just use a username in the pool key, but maybe I'm misunderstanding your concern.

Don't forget that connections are in general supposed to even cache their password, since you're supposed to be able to clone a connection and open the clone. That's a much more sensitive piece of information kept in memory; the general assumption is that things are safe within the process boundary.

All this with the obvious assumption that you can get a username quicker than you can get an identity SID, of course.

Wraith2 commented 5 years ago

I just tinker with this stuff; I didn't write it, and I can't do more than interpret what I see in the code and attribute intention where I think it's clear. In this case, if you want to get to a pool for the current user, you ask the OS who they are, because any given thread can be placed into a mode where it is impersonating another user to the operating system.

roji commented 5 years ago

I just tinker with this stuff; I didn't write it, and I can't do more than interpret what I see in the code and attribute intention where I think it's clear.

Of course - I'm just making suggestions and am not 100% sure that what I'm saying is correct, since I don't actually know SqlClient very well. Maybe @saurabh500 can comment on this point.

In this case, if you want to get to a pool for the current user, you ask the OS who they are, because any given thread can be placed into a mode where it is impersonating another user to the operating system.

So impersonation is supported on a ~~user-by-user~~ thread-by-thread basis, I see. If so, then it's indeed necessary to check the username on each pooled open; however, there may be a lighter way to do so, or some way to quickly check whether we're impersonating or not. In #25132 this is mentioned as a pretty big factor, so it may be worth looking into.
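For reference, per-thread impersonation is what the real `WindowsIdentity.RunImpersonated` API provides; this sketch (token acquisition elided, as it requires OS calls) shows why a cached identity would point a thread at the wrong pool:

```csharp
using System;
using Microsoft.Win32.SafeHandles;
using System.Security.Principal;

static class ImpersonationDemo
{
    // otherUserToken would come from LogonUser or similar; elided here.
    public static void Show(SafeAccessTokenHandle otherUserToken)
    {
        Console.WriteLine(WindowsIdentity.GetCurrent().Name);  // original user

        WindowsIdentity.RunImpersonated(otherUserToken, () =>
        {
            // Inside the delegate, GetCurrent() reports the impersonated
            // user, so a pool lookup here must key on that identity, not
            // on anything cached before the impersonation started.
            Console.WriteLine(WindowsIdentity.GetCurrent().Name);
        });
    }
}
```

This is the check that happens on every pooled open under integrated auth, and why caching it naively would hand an impersonating thread a connection from the wrong user's pool.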

Wraith2 commented 5 years ago

I did some tinkering and made a couple of changes. Interestingly, they don't increase the TPS, which I guess just means I'm not contention limited. I'm also not quite sure how dotTrace is doing the maths on these percentages, but the bits I've fiddled with are a bit better.

master: (screenshot: dataaccessperf-master)

branch: (screenshot: dataaccessperf-locking)

So I can knock 500ms of contention off OpenAsync.

The other areas are guarding against real race conditions, and the locking is sensible (and breaks lots of things if you disable it just to see whether things work without it). An expert could probably write it in more lock-free ways, but I'm not that expert. There are some bits of time in the ReadAsync and ExecuteDbReaderAsync path that will go away when one of my other PRs is ready.

roji commented 5 years ago

Interestingly they don't increase the TPS which I guess just means I'm not contention limited.

OK. This could also be because the scenario is being executed locally; I know that cutting down on contention only really started showing when I tested the new version on TechEmpower (whereas locally I could hardly see a change). Unfortunately that's not easy for you to try out... In any case, the small improvements do add up.

BTW it's surprising to see that setting the connection string takes almost 200ms, I wonder what's going on in there...

Wraith2 commented 5 years ago

BTW it's surprising to see that setting the connection string takes almost 200ms, I wonder what's going on in there...

If you set the connection string, you have to use it to generate a pool key and check whether the inner connection you've got has that same key, or swap out the one you've got for one from the right pool, which might require creating a pool, which needs locks. Connection pooling: it's almost Lovecraftian.

divega commented 5 years ago

As recently announced on the .NET Blog, focus on new SqlClient features and improvements is moving to the new Microsoft.Data.SqlClient package. For this reason, we are moving this issue to the new repo at https://github.com/dotnet/SqlClient. We will still use https://github.com/dotnet/corefx to track issues on other providers like System.Data.Odbc and System.Data.OleDb, and general ADO.NET and .NET data access issues.

GSPP commented 5 years ago

For the record, the optimization with the connection reset was about reducing a roundtrip.

How does this interact with possibly open transactions? It's important that those are killed when the connection is disposed.

Also, temp tables might hold significant resources.
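A hypothetical illustration of the state a reset has to clean up (placeholder names; today this cleanup is what sp_reset_connection performs server-side):

```csharp
using Microsoft.Data.SqlClient;

static class LeakyState
{
    public static void FirstUse(string connStr)
    {
        using var conn = new SqlConnection(connStr);
        conn.Open();

        // Server-side state created on this physical connection:
        using var tx = conn.BeginTransaction();
        using var cmd = new SqlCommand("CREATE TABLE #scratch (id int)", conn, tx);
        cmd.ExecuteNonQuery();

        // Connection disposed with the transaction still open and #scratch
        // alive. If the reset were deferred to the next checkout, the open
        // transaction must still be rolled back and the temp table dropped
        // before (or as part of) the next caller's first command.
    }
}
```

This is why a deferred "piggyback" reset can't simply skip the cleanup; it can only change when the cleanup happens relative to the next query.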