dotnet / datalab

This repo is for experimentation and exploring new ideas involving ADO.NET, EF Core, and other areas related to .NET data.
MIT License

SqlServer.Core: Performance-oriented SQL Server .NET driver #6

Closed roji closed 4 months ago

roji commented 3 years ago

An experiment, in collaboration with the community, to determine what potential there is in using modern .NET features to build a highly performant SQL Server driver.

ajcvickers commented 3 years ago

@ErikEJ @wraith2 @NickCraver @mgravell @Drawaes: Let's get this party started!

@smitpatel, @roji, and I discussed how to get started with Woodstar, and we have created some initial issues for the ideas we came up with. We want this to be collaborative, so please add your ideas, either by commenting on issues or in new issues as appropriate.

Initial ideas:

/cc @davidfowl

NickCraver commented 3 years ago

Would it be useful to list current pain points, especially w.r.t. performance in the current driver structure? Discussing connection pooling and what we are/aren't able to see or get metrics on today due to what's exposed and what's internal is something that doesn't hurt so much per connection but is a pain at scale in various ways. Today, we're wrapping connections to make best-guess workarounds for how pooling is behaving, which commands are in flight when a stall happens, etc.

I know our goal is performance, but I think it's worth considering the driver/implementation surface area along these lines as we go, and maybe current pain points are the right thing to have as a starting point on that front?

Wraith2 commented 3 years ago

Plus a whole load of other random bits and pieces; if we encounter the same problems in a new implementation, I will hopefully spot them and warn about them so we can avoid the same implementation choices. In general it's what you'd expect from a 20-year-old codebase at this point. Decisions which were correct at the time are now outdated, but the weight of the codebase is so high that changing direction carries unacceptably high risk. Writing from a high-level, async-first perspective will give us an easier codebase to work with.

NickCraver commented 3 years ago

Should async-only be a design goal here, like HttpClient in .NET Core before...the happening?

Wraith2 commented 3 years ago

I'd prefer to support as much of the existing external surface area as possible, because it will enable the widest adoption if we get to production stages. I'd say async-first but not async-only if that's possible; I think it is possible, because Npgsql manages it afaik.

davidfowl commented 3 years ago

I'd hope we'd build the core engine with a different API shape than what ADO.NET wants. We can write a slow adapter for that layer but the new APIs shouldn't be bound by it. Also take a look at this issue https://github.com/dotnet/runtime/issues/28882 as a way to avoid boxing for DbParameters.

Wraith2 commented 3 years ago

The problem with using another shape is knowing what it should be. There can be no consumers until we write one, so we can't know what shape is required without a specific, well-defined set of consumer requirements. Who's the first consumer?

Drawaes commented 3 years ago

Well, you have SO people commenting right here. The point, I thought, is to push the boundaries of performance, not to support the widest number of use cases. As David said, slower compatibility layers could be built on top. In reality, though, in the case of sync you are just hiding what is really going on, as all db operations are async in nature if there is network or any disk access involved.

Wraith2 commented 3 years ago

Ok, pure perf it is and then we'll see. That'll allow ignoring all the existing public surface area defined in terms of Task and using ValueTask throughout which will be lovely.

You say that all db operations are async. All IO operations are async, and if you meant that then I agree. However, because TDS is packetized there can be multiple columns and even rows in a single packet, so a single data fetch may not be async; it could just be fetched from memory. At the moment in SqlClient you either have to allocate a Task every time, or you have to use sync and live with the thread blocking sometimes. ValueTask<T> GetFieldValueAsync(int index) would make that problem go away.

We may also want to think about being clever with pre-dispatching packet requests. Currently you run through the packet until you run out of data, then request more and wait for the response. If you can guess while reading (a long string, for example) that you won't have enough data, you could pre-dispatch the next packet request so it might be ready when you need it. A fundamental limitation of any library like this is network latency, and we might be able to hide some of it.
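As a minimal sketch (hypothetical names, not SqlClient code) of the ValueTask idea above: the accessor completes synchronously, allocating nothing, when the value is already in a buffered packet, and only allocates a Task on the slow path where real I/O is needed.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical reader fragment: ValueTask<T> lets the buffered fast path
// avoid the per-call Task allocation that Task<T> GetFieldValueAsync forces.
public class SketchReader
{
    private readonly int[] _buffered = { 1, 2, 3 }; // stands in for a decoded TDS packet

    public ValueTask<int> GetFieldValueAsync(int ordinal)
    {
        if (ordinal < _buffered.Length)
        {
            // Fast path: value is in memory, wrap it without allocating.
            return new ValueTask<int>(_buffered[ordinal]);
        }

        // Slow path: the value spans a packet boundary, so go to the network.
        return new ValueTask<int>(FetchFromNetworkAsync(ordinal));
    }

    private static async Task<int> FetchFromNetworkAsync(int ordinal)
    {
        await Task.Yield(); // placeholder for real socket I/O
        return -1;
    }
}
```

The same shape applies to strings crossing packet boundaries: the caller always awaits, but only pays an allocation when the await actually suspends.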

roji commented 3 years ago

My vote here would be to only do a non-ADO.NET provider if figures clearly show ADO.NET is a serious blocker for performance. My Npgsql experience tells me that ADO.NET in itself shouldn't be a (significant) blocker for high-perf scenarios, and doing a non-ADO.NET driver basically means it's unusable by all existing .NET layers (Dapper, EF, LLBLGen Pro...). Of course we can always build an ADO.NET adapter on top of a non-ADO.NET driver, but I wouldn't necessarily go in the direction of "break the ecosystem" until we see a good reason to.

Note that while doing optimization work on EF Core, I've recently seen that saving an allocation here and there has less of an impact on end RPS perf than expected (by me at least), and in at least some cases ADO.NET can be improved upon too (e.g. https://github.com/dotnet/runtime/issues/17446 for parameter boxing). In other words, if some aspects of ADO.NET do turn out to block perf, I'd rather this effort helped us identify and fix them, rather than doing something completely different.

However because TDS is packetized there can be multiple columns and even rows in a single packet so a single data fetch may not be async it could just be fetched from memory. At the moment in SqlClient you either have to allocate a task every time or you have to use sync and live with the thread blocking sometimes. ValueTask GetFieldValueAsync(int index) will make that problem go away.

This is a good example. If the Task allocation from DbDataReader.GetFieldValue turns out to be significant (and let's see that it's the case, rather than assuming it is), we could simply add a new ValueTask-returning API alongside the existing one. We could then promote that into ADO.NET.

Regardless of the above, I think an async-only driver definitely makes sense, at least as a first phase (but maybe in general). Needing to support sync already creates various problems in today's world (e.g. no Pipelines support, at least last time I checked).

Wraith2 commented 3 years ago

If the Task allocation from DbDataReader.GetFieldValue turns out to be significant (and let's see that it's the case, rather than assuming it is)

Well, I think so. This is from the TE Fortunes benchmark test project that I've been using for perf improvements over the last couple of years.

[screenshot: dotTrace allocation profile from the Fortunes benchmark]

And since you're a bunch of people who'll know what you're looking at here's the dotTrace file for that so you can explore. BenchmarkDb.zip

The thing we actually want is the Fortune object and the strings it contains; look how far down the list it is. I'm pretty sure the entire query result is inside a single packet response, so it's all in memory at the point when fields are being fetched. Allocating a Task for every int32 read from an in-memory packet isn't the worst thing happening, but because of the public surface area, even if I managed to make the internals ValueTask-based I'd still have the allocation at the user surface. I've considered proposing ValueTask<T> ValueGetFieldValueAsync(int i) a couple of times, but the name makes me have second thoughts, let alone the approval process and the years it would take to get it adopted.

Anyway. You raise another good point @roji. What do we mean by high performance? The reason I started on this was that I wrote a really nicely performing server app and then the perf got destroyed by simply trying to save data to the database. I'd like to be able to plug in the database layer without it being the worst performing thing in the application.

In my experience so far query speed is dominated by network latency and we can't really do a lot about that. This is why memory makes a very small difference quite a lot of the time. What we miss in this scenario is that there will be higher layers doing other work like layout etc.

I think we're looking for a driver which is as CPU- and memory-clean as possible, so that machine resources can be used efficiently: don't block threads, don't waste memory causing time to be lost to GC. Get work done and get out of the way so higher layers can get higher throughput per unit of hardware per unit of time.

roji commented 3 years ago

This is from the TE Fortunes benchmark test project that I've been using for perf improvements over the last couple of years.

We're getting down into the little details, but... I can't see any reason for GetFieldValueAsync to be used in TE Fortunes (or generally a big majority of ADO.NET usage). Unless one is reading very big rows and using CommandBehavior.SequentialAccess, the row is pretty much guaranteed to be buffered in memory; after all, without SequentialAccess one can read columns in any order. In that scenario, using GetFieldValueAsync is pretty much useless, and GetFieldValue should be used instead.
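The pattern described here, async row advance followed by synchronous column access once the row is buffered, looks something like this in today's ADO.NET (illustrative sketch; `FortunesReader` and the two-column schema are assumptions for the example):

```csharp
using System.Collections.Generic;
using System.Data.Common;
using System.Threading.Tasks;

public static class FortunesReader
{
    // Without CommandBehavior.SequentialAccess, ReadAsync buffers the whole
    // row, so plain GetInt32/GetString afterwards are memory-only and
    // allocate no Task per column.
    public static async Task<List<(int Id, string Message)>> ReadAllAsync(DbDataReader reader)
    {
        var rows = new List<(int, string)>();
        while (await reader.ReadAsync())          // async: may hit the network
        {
            rows.Add((reader.GetInt32(0),         // sync: row already buffered
                      reader.GetString(1)));
        }
        return rows;
    }
}
```

GetFieldValueAsync would only earn its keep under SequentialAccess with rows too large to buffer.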

But if we want to add another API just to remove the Task allocation for when huge rows and SequentialAccess are being used, then it really isn't that complicated - mostly a matter of finding the right name. I think it's a pretty low-priority scenario (if you're reading huge rows, that extra allocation isn't likely to matter), but it's trivial to push through.

Re the rest... I do hope Npgsql shows that very good perf is possible while sticking to ADO.NET.

This is why memory makes a very small difference quite a lot of the time.

I'd want to see some backing to that assumption. Of course, I don't mean egregious useless allocations inside the driver (which is what SqlClient does in some cases) - that should simply be avoided or cleaned up. But again, my recent experience with EF Core shows that hunting down every single reasonable allocation has quickly diminishing returns, and therefore probably doesn't justify dumping ADO.NET.

To summarize, .NET does have a standard database API. While it's not the best it could be, there's an entire ecosystem around it, and very good performance has been achieved without dumping it. Could a bit more perf be squeezed by saving a couple allocations that ADO.NET forces us to do? Maybe. Does that justify doing something completely non-standard, where maybe ADO.NET itself could be fixed? I think that remains to be shown.

Drawaes commented 3 years ago

I don't understand the point of this repo if you just want to improve the sql client under ado.net? You might as well work on fixing that under the hood. SslStream was a similar age, very risky, a mix of sync and async and promise based code and we fixed it from the inside. There is no point just making another ado.net sql lib with all those restrictions.

davidfowl commented 3 years ago

I'm on @Wraith2 's side, that profile shows so much waste, like nuke all of those allocations😄

bgrainger commented 3 years ago

In MySqlConnector, I've considered providing a "direct" API that exposes the raw MySQL protocol through a .NET API to see how much overhead ADO.NET contributes. However, it always seemed extremely unlikely that it would be used anywhere except the TE benchmarks due to the established ADO.NET ecosystem.

I have no objections to this project pursuing a focused SQL Server solution to see what the maximum possible performance is, but would personally appreciate the effort going into directions that could benefit all ADO.NET providers/users. But please just take this as my 2c, and not a strong attempt to change the project direction.

Some ADO.NET pain points I'd like to see fixed:

OTOH, if a lot of these (and similar) suggestions get implemented, maybe it's "not ADO.NET" anymore, and trying to preserve ADO.NET was a false goal? 😀

Drawaes commented 3 years ago

This is my 2c as well. But I see it as an opportunity to write a fresh new DB surface area; while focused on SQL Server initially, there is no reason there couldn't be a MySQL etc. version. Connection pooling is a very good example of something I think shoehorning into the existing ado.net surface would cause extra pain.

I would love to see a break from tradition, following the example of pipelines vs streams: pipelines were initially built on their own, and bridging code eventually helped you use pipelines with stream APIs. But for pure perf you want pipelines all the way through.

roji commented 3 years ago

To be clear, I'm not the boss here or anything - am just stating my opinions!

I don't understand the point of this repo if you just want to improve the sql client under ado.net? You might as well work on fixing that under the hood. SslStream was a similar age, very risky, a mix of sync and async and promise based code and we fixed it from the inside. There is no point just making another ado.net sql lib with all those restrictions.

There are various reasons why this isn't feasible - lots of backwards-compatibility, legacy and stability constraints simply don't make this possible. As a very simple example, SqlClient needs to continue supporting .NET Framework.

But I see it as an opportunity to write a new fresh DB surface area that while focused on Sql Server initially there is no reason there couldn't be a MySql etc version.

So there really are two efforts being discussed here:

  1. Writing a new, modern and efficient driver for SQL Server; this indeed resembles rewriting SslStream while retaining the same public API. There are indications that this would be of immense value in the ecosystem - and this is the original/main thing discussed here.
  2. Writing a new .NET database API, as an alternative to ADO.NET.

I see these as two very separate and orthogonal efforts, each being quite huge. For example, the first-class, cross-database pooling component @bgrainger mentioned (see https://github.com/dotnet/runtime/issues/24856) is just on its own a potentially huge debate. I believe focusing on one of the above two is important (trying to tackle too much frequently backfires), but again, that's just my opinion.

I guess I'm also a bit confused about why we're proposing to do a new database API. Are we convinced ADO.NET is a significant blocker to database driver perf, and if so, that we can't fix those problems incrementally? Do we have any data to support that? Or do we just hate the API and want a modern, new one (also legitimate)? Or do we just want to play around without any constraints (also legitimate)? The important point is that any new DB API breaks the ecosystem in both directions: DB drivers would need to be rewritten to support it (Npgsql, MySqlConnector, Oracle), and upper layers would have to do the work to support it as well (EF Core, Dapper, LLBLGen Pro...). The effort would be immense, and I'm not clear on exactly why we'd be proposing that.

@bgrainger FWIW I agree with a lot of your points above, though some things can be handled without API changes. For example, Npgsql avoids parsing the connection string more than once; DbCommand instances can be recycled by their owning DbConnection, and similarly DbDataReader instances can be recycled by their owning DbCommand (Npgsql does these things); we can possibly do something similar.

roji commented 3 years ago

But to reiterate - I'm just a guy expressing my opinions. If we end up with a non-ADO.NET driver, and an ADO.NET shim over that, maybe that's fine too.

roji commented 3 years ago

I'm on @Wraith2 's side, that profile shows so much waste, like nuke all of those allocations 😄

FWIW I'm pretty sure 90+% of these allocations are the SqlClient implementation, rather than anything necessarily having to do with ADO.NET.

mgravell commented 3 years ago

and doing a non-ADO.NET driver basically means it's unusable by all existing .NET layers (Dapper, EF, LLBLGen Pro...)

FWIW, the resurrection of this thread a few days ago is part of what prompted me to get started on DapperAOT. Pretty sure I'll have code working in the flavor of Dapper (although using the new DapperAOT) in lock-step with any-and-all progress here.

Wraith2 commented 3 years ago

The current SqlClient codebase is too dangerous to use a "move fast and break things" approach. The number of users and the fact that we want people to move from System to Microsoft versions requires a slow and measured approach. The unfortunate side effect of that is that the speed of change is slow. I've been tinkering for two years and that trace is still abysmal compared to modern code. It is also important to note that the codebase is difficult to work with.

The existing SqlClient library will not go away, but it will not change with sufficient speed to improve performance for the majority of users. I'm fairly sure that the performance of SqlClient lags behind that of other database providers; even if it doesn't, I'm sure there can be a better-performing implementation.

In writing an SqlCore library we can:

1. identify whether there are new patterns that would be better than existing ADO approaches for high-throughput, low-overhead, high-concurrency scenarios
2. provide a good, non-surprising (no async-over-sync) async way to access SQL Server data
3. provide a viable mitigation for existing problems in string- and binary-heavy scenarios

I think all of these things have value.

Things that aren't current goals as I see it:

1. replacing or deprecating ADO; that's just not practical, nor demonstrably of value
2. creating an ADO-capable SqlCore driver
3. creating a driver which implements the entire breadth of the existing SqlClient driver (SqlNotification? not now)

We do not have to choose to ignore the existing ADO api surface because we can probably provide most of it built on a new implementation. This doesn't have to be an explicit goal but it may be something to be aware of as a long term nice-to-have so we can flag up changes which would prevent it being possible.

The existing ADO surface has problems, such as Task<T> GetFieldValue<T>(int i), which can be worked around trivially with a new implementation built using ValueTask where appropriate. So perhaps we should work on a new implementation regardless of the API shape, and then, when we have something that works for trivial cases, review what we've got and discuss further what direction we should take?

I'm happy with working towards specific use cases from StackOverflow if they can be provided. It's a point to work towards and learn about the problem space. Once we have something that is of worth or we collectively decide that the goal is impossible or improbable we can re-assess.

I don't see any problem with the general pattern of access which ado provides. As a sketch

public async IAsyncEnumerable<Fortune> GetFortunes()
{
    await using (var connection = GetSqlCoreConnection())
    {
        using (var command = GetSqlCoreCommand(connection /*, command properties? */))
        {
            using (var reader = await command.GetReader(/* reader properties */))
            {
                while (await reader.GetNextRow())
                {
                    yield return new Fortune { Id = await reader.GetValue<int>(0), Message = await reader.GetValue<string>(1) };
                }
            }
        }
    }
}

but that doesn't mean we have to start off with implementing all the interfaces that are in the System.Data namespace to do it.

Wraith2 commented 3 years ago

I'm on @Wraith2 's side, that profile shows so much waste, like nuke all of those allocations 😄

FWIW I'm pretty sure 90+% of these allocations are the SqlClient implementation, rather than anything necessarily having to do with ADO.NET.

Yup, but the Task<int> and Task<string> are directly due to Task<T> GetFieldValue<T>(int i) not being ValueTask. You should always use the async call when getting a string because it can easily cross a packet boundary but if you do that you pay for the task even if the string is in-memory and doesn't cross a packet boundary.

I'm on @Wraith2 's side, that profile shows so much waste, like nuke all of those allocations😄

Working on it, slowly... This is the good version; it used to GC every 16 ms when I started. The waste is, as @roji said, largely due to implementation issues, but it will take me at least another two years to make a dent in them. I'm going to do that work, but while I do we can also investigate better ways to approach the required functionality with modern techniques. Things learned on each project may improve the other.

roji commented 3 years ago

One more possible idea... If we don't want to be constrained by ADO.NET, but also aren't aiming to create a new abstraction, we could just write a driver with a public API surface that fits SQL Server, TDS, etc. This would be a low-level API which doesn't pretend or aim to be a general database API in any way (and once again, a shim could be built on top of that). It's all really a question of what we're trying to achieve here.

Wraith2 commented 3 years ago

I think that would be a good place to start.

davidfowl commented 3 years ago

Great suggestion @roji !

ajcvickers commented 3 years ago

Bottom line is that we need to measure this. If ADO.NET is adding significant overhead, then we either address that or don't ultimately use ADO.NET. Both of these options are on the table.

However, our current .NET/PostgreSQL solution on TechEmpower is #12 overall and uses ADO.NET. Analysis of this (which @roji has done extensively) indicates that, at least for queries, ADO.NET is not limiting. We have also found this with other experiments done in the past. So even though a few years ago my gut feeling was that we would have to drop ADO.NET, I now think I was wrong about that because the data we have just doesn't show it.

The idea behind issue #12 is to get the upper limit. Once we have that we can put that code into ADO.NET and get an initial read of the overhead. Until we have that data I think we're all guessing.

ajcvickers commented 3 years ago

@Wraith2 I should point out again that the goal of this project is not to replace Microsoft.Data.SqlClient. Rather, it is to complement it with a fast driver that most likely doesn't support many of the features of Microsoft.Data.SqlClient. Microsoft.Data.SqlClient will always be an option for those who need ADO.NET or backwards compatibility.

kant2002 commented 3 years ago

When playing with NativeAOT and source generators, I would like to provide some feedback

Wraith2 commented 3 years ago

I notice that when the DataReader reads from TDS, it reads data in its native format and places it in an object[], and the typed versions of GetValue then read from that object, causing boxing/unboxing. I'm not sure that I want direct TDS access, but maybe that would work for my scenario

This is an old .NET 1.1 API, and because it uses object[] the boxing of value types is unavoidable. Direct access through GetFieldValue<T> should not box for any supported value type. I don't expect GetValues to be something we would use in a high-perf scenario.
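To illustrate the boxing difference (a toy example, not SqlClient internals): storing a value type into object[] allocates a box on the heap, while a generic accessor over typed storage never needs to.

```csharp
// ToyRow mimics the two access styles. With .NET 1.1-style object[] storage
// the box is created at write time and survives in GetValue; with typed
// internal storage a generic GetFieldValue<T> would never create one.
public class ToyRow
{
    private readonly object[] _values; // object[] storage: every int here is boxed

    public ToyRow(params object[] values) => _values = values;

    // GetValue-style access: the caller receives the boxed object.
    public object GetValue(int i) => _values[i];

    // GetFieldValue<T>-style access: unboxes here once; with typed storage
    // underneath, no box would exist at all.
    public T GetFieldValue<T>(int i) => (T)_values[i];
}
```

The point is that the API shape determines whether a driver with typed internal buffers can ever expose values without boxing.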

kant2002 commented 3 years ago

Thanks for explanation. Will go home, do my homework then.

mgravell commented 3 years ago

@Wraith2 another consequence of old APIs: the entirety of the async API is Task[<T>]-based. For the non-T versions, and the Task<bool> versions (ReadAsync(), NextResultAsync(), etc.), that's tolerable: they can be optimized in the "woot, we had the value synchronously" scenario. But the field-based async API is horrible for allocations (GetFieldValueAsync<T>() etc.), to the point where I'm pretty sure most consumers just ignore it and use ReadAsync(), then switch to synchronous inside a row. Which might actually be fine, in most scenarios; although ironically, the two APIs where you might genuinely expect to need to wait for data, GetChars and GetBytes, don't offer an async API :p

roji commented 3 years ago

@mgravell I agree that GetFieldValueAsync (and IsDBNullAsync) are problematic in that they return Task (that's indeed been mentioned as a problem in ADO.NET that we could easily fix by introducing an alternative API), and the lack of something like GetCharsAsync/GetBytesAsync is a problem too.

Though you're also right that in most cases sync column access seems to be the norm, as the row is assumed to be already buffered in memory (when CommandBehavior.SequentialAccess isn't specified). This is how Npgsql works: without SequentialAccess, entire rows are pre-buffered in memory when Read/ReadAsync is called, allowing subsequent column access to be memory-only and very fast. But if we want to improve support for large rows, there's definitely some stuff missing.

(I'd just rather we distinguished between missing/problematic APIs which we can incrementally fix by adding new ones (if we feel like we have to), and an approach where we dump ADO.NET from the get-go)

mgravell commented 3 years ago

@roji you're right, the "do we, don't we" on ADO.NET is the big question. Unfortunately, it is one of those places where you kinda need a minimal "this is what it could do" feature-set comparison to get an idea of how much impact the ADO.NET API is having. It seems like a minimal viable API for talking to a database is still a pretty big API, even if it is just "run this non-parameterized query that returns 1 row of 4 columns". I think until we have a baseline there we could talk in circles, and, to directly address your concern, we could be trying to solve the wrong problem.

roji commented 3 years ago

@mgravell sure, you're right. I've been thinking we'd take something like the TechEmpower Fortunes scenario as our first, minimal viable surface area: simple querying bringing back ints and strings, no parameters (yet), no transactions and definitely no exotic ADO.NET stuff nobody wants. That doesn't seem huge to me - in ADO.NET it would mean a basic DbConnection that can open/close (and a trivial pooling implementation), a DbCommand with ExecuteReaderAsync and a DbDataReader with ReadAsync and GetFieldValue.
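That minimal surface could be sketched as a handful of interfaces (hypothetical names; a sketch of the shape, not a proposal):

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical minimal surface for a Fortunes-style prototype:
// open a connection, run a non-parameterized query, advance rows, read columns.
public interface IMiniConnection : IAsyncDisposable
{
    ValueTask OpenAsync();
    IMiniCommand CreateCommand(string commandText);
}

public interface IMiniCommand : IDisposable
{
    ValueTask<IMiniReader> ExecuteReaderAsync();
}

public interface IMiniReader : IAsyncDisposable
{
    ValueTask<bool> ReadAsync();                // advance to the next row
    ValueTask<T> GetFieldValueAsync<T>(int i);  // ValueTask: no allocation when buffered
}
```

No transactions, no parameters, no exotic ADO.NET members: small enough to implement twice (once behind ADO.NET, once not) and compare.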

mgravell commented 3 years ago

@roji that would give us an indication of what the gap is between what SqlClient currently does, and what SqlClient (or a replacement) could do, inside the limits of ADO.NET; however, it doesn't give us any indication of what the baseline is for what a replacement could do without the constraints of ADO.NET. So if someone is taking on the challenge of an MVP of SqlClient, then the perfect outcome would be to have those exact same implementations available in a non-ADO.NET wrapper.

Drawaes commented 3 years ago

I would imagine that something like the following should be reasonably achievable in short order. The only problems I have are that the code is a bit dated (the API around spans and pipelines has shifted since) and that... well, SslStream isn't my friend

https://gist.github.com/Drawaes/27311bcee54e3ef524823f1c9bed7179

roji commented 3 years ago

it doesn't give us any indication of what the baseline is for what a replacement could do without the constraints of ADO.NET

That's a very abstract sentence, as if we were dealing with an unknown, amorphous API :rofl:

We all know ADO.NET quite well here, and we know that for a Fortunes-style experiment the perf issues may be the allocations for the three types (DbConnection/DbCommand/DbDataReader), though recent experience shows a few short-lived allocations simply don't have that much impact. Npgsql already mitigates the last two allocations via recycling: disposing NpgsqlDataReader stores the instance on its NpgsqlConnection for the next ExecuteReader, and disposing NpgsqlCommand stores the instance to be returned from the next NpgsqlConnection.CreateCommand. DbConnection remains the non-recycled one; in EF we don't have that issue, since we pool DbContexts which themselves own a DbConnection, so all ADO.NET objects are cached all the way down. For non-EF usage this could also be mitigated by exposing a pooling API which returns DbConnection instances, rather than requiring users to new them up (https://github.com/dotnet/runtime/issues/24856).
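The recycling described above amounts to the owner caching its single child instance and handing it back on the next request. A minimal sketch with hypothetical types (mirroring the Npgsql pattern, not its actual code):

```csharp
// The connection caches one command instance; Dispose returns the command
// to its owner instead of discarding it, so steady-state use allocates nothing.
public class RecyclingConnection
{
    private RecyclableCommand _cached;

    public RecyclableCommand CreateCommand()
    {
        var cmd = _cached ?? new RecyclableCommand(this);
        _cached = null;   // in use: don't hand the same instance out twice
        cmd.Reset();
        return cmd;
    }

    internal void Return(RecyclableCommand cmd) => _cached = cmd;
}

public class RecyclableCommand : System.IDisposable
{
    private readonly RecyclingConnection _owner;
    public string CommandText { get; set; }

    internal RecyclableCommand(RecyclingConnection owner) => _owner = owner;

    internal void Reset() => CommandText = null;

    public void Dispose() => _owner.Return(this); // recycle instead of discard
}
```

The same trick applies one level down, with the command caching its reader.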

So if we really want to argue against ADO.NET, it would be nice to have a concrete discussion rather than general "we don't know what perf we could achieve if we just freed ourselves of all constraints". Or we could go through the exercise of implementing an ADO.NET and a non-ADO.NET Fortunes prototype, which IMHO would most likely show that the single DbConnection allocation (which itself can be mitigated) isn't worth creating a whole new DB API for.

roji commented 3 years ago

@Drawaes yeah, that looks like a good start (I'd definitely forget about SslStream in a first minimal prototype...). Do you already have something establishing the connection and performing that query against SQL Server?

Drawaes commented 3 years ago

You can't forget about SslStream; that's kinda the point. You need TLS in the SQL protocol.

roji commented 3 years ago

@Drawaes are you saying it's not possible to communicate with SQL Server at all without TLS? I was just suggesting excluding TLS in initial prototyping, assuming that's possible.

Wraith2 commented 3 years ago

@Drawaes are you saying it's not possible to communicate with SQL Server at all without TLS? I was just suggesting excluding TLS in initial prototyping, assuming that's possible.

Only local servers. Any Azure server requires TLS. Practically speaking, TLS is a hard requirement for any non-toy implementation.

roji commented 3 years ago

I think setting up a working prototype against a local server can be a good starting point (for simplicity, we probably have enough work), but if you guys want to tackle TLS immediately I'm not objecting...

mgravell commented 3 years ago

Just saying... if TLS is the problem and the purpose is to assess the impact of the ADO.NET layer, it doesn't actually need to be SQL Server at the bottom. A "raw PostgreSQL vs PostgreSQL in ADO.NET" might serve just as well to give an indication of what the slack is that we're playing with from the ADO.NET layer :) (and no, that's not meant as a "so @roji can go and do it all")

Drawaes commented 3 years ago

The problem with ignoring TLS is the way TLS is interleaved with TDS

It is better to know how you are going to put it in from day one (you have to handshake, hand off to TLS, and then hand back to TDS)

Wraith2 commented 3 years ago

Requiring TLS influences the way all IO is performed. Without the need for it we can go pure pipelines over a socket and have really nice code, with it we have to go up to SslStream. We know we will need SslStream so we should start there. It'd be really nice if we had an SslSocket but we don't...
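For reference, plain client-side TLS over a socket with SslStream looks like this (a sketch; real TDS is harder, because the handshake bytes are wrapped inside TDS prelogin packets, which is the interleaving discussed above, so the stream handed to SslStream must do that framing during the handshake):

```csharp
using System.Net.Security;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class TlsConnector
{
    // Connect a TCP socket and layer SslStream on top of its NetworkStream.
    public static async Task<SslStream> ConnectAsync(string host, int port)
    {
        var client = new TcpClient();
        await client.ConnectAsync(host, port);

        var ssl = new SslStream(client.GetStream(), leaveInnerStreamOpen: false);
        await ssl.AuthenticateAsClientAsync(host); // validates the server certificate
        return ssl;
    }
}
```

This is why "pure pipelines over a socket" doesn't fit cleanly: SslStream forces a Stream into the middle of the I/O path.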

Drawaes commented 3 years ago

As someone who advocated for a TlsPipeline for years, and even wrote a demo one at the start of pipelines, I wholeheartedly agree it would be nice

https://github.com/dotnet/corefxlab/pull/991

https://github.com/dotnet/corefxlab/pull/1062

Wraith2 commented 3 years ago

Aside from a lot of nit-picking, it looks like it was generally a viable approach. So perhaps something to advocate for in .NET 7 or beyond?

mgravell commented 3 years ago

Without the need for it we can go pure pipelines over a socket and have really nice code

last time I checked, we still didn't have a supported client pipelines API (although David has teased me with the idea a few times); another pet gripe of mine, hence the intentionally passive-aggressive name of "Pipelines.Sockets.Unofficial", but my point is: (unless things have changed and I missed it) you might not even get that out-of-the-box

Wraith2 commented 3 years ago

To pick up on the main discussion: is it a reasonable idea to implement a "trivial" client akin to the Fortunes query with a similar API shape (connections, commands, readers) but without sticking to the ADO interfaces directly (so we could have ValueTask<bool> ReadNextRow()), and see what that takes and where it gets us?

Wraith2 commented 3 years ago

Without the need for it we can go pure pipelines over a socket and have really nice code

last time I checked, we still didn't have a supported client pipelines API (although David has teased me with the idea a few times); another pet gripe of mine, hence the intentionally passive-aggressive name of "Pipelines.Sockets.Unofficial", but my point is: (unless things have changed and I missed it) you might not even get that out-of-the-box

From what I've seen, if we can demonstrate a need for those official APIs by making aspnetcore capable of being faster, and throw in a TE-benchmarks buzzword, then it's a no-brainer as far as getting the change into .NET 7.