Measure impact of allocation free async

.NET Core 2.1 is going to bring new changes to ValueTask<T> that are going to enable it to wrap simple awaitable objects that can be reused across multiple async operations:

https://github.com/dotnet/corefx/issues/27445

This is a good fit for things like reading rows asynchronously from a DbDataReader (and all the layers above, including anything that implements IAsyncEnumerable).

This issue is about trying to do some experiments to measure the potential impact of this. E.g. we could modify Peregrine to leverage this in a completely allocation free async implementation and compare.

Having an idea of the impact can help us prioritize the changes and also decide how soon we should do it: we need to decide on public surface changes. E.g. do we just add a new alternative to ReadAsync that returns ValueTask (and a similar GetFieldValueAsync), or do we wait until the new IAsyncEnumerable<out T> is baked and create something based on it?

I'll be back to doing perf work in a few days and will definitely take a look at this for Npgsql. Any idea whether this works only for .NET Core 2.1 or also for older targets?

Not sure if it would work with previous versions (we can ask Stephen). Perhaps it would be ok to add public surface that exposes a ValueTask<T> but isn't optimized unless you are using 2.1 or greater.

On the other hand we have the restriction that DbDataReader and friends are part of .NET Standard 2.0, so we need to come up with some way to add APIs that doesn't change that. I am again considering the idea of creating a new package that defines optional interfaces.

Reading @stephentoub's comment, it would seem this has also been added to the System.Threading.Tasks.Extensions NuGet package, so if I'm understanding correctly this is actually backwards compatible.

Even if we don't add/change any surface API, there's potential for reducing allocations internally in the driver. However, it's true that in the end we have DbCommand.ExecuteReader() which returns a Task<DbDataReader> and always yields, so a new API returning a ValueTask would indeed eliminate that. But let me see where I can get inside the driver first.

if I'm understanding correctly this is actually backwards compatible

Sounds good!

in the end we have DbCommand.ExecuteReader() which returns a Task<DbDataReader> and always yields

I assume this is for prepared statements and those don't do any database I/O until the first read? Then yes, that sounds like a good usage of ValueTask<T>... unless you are returning a cached/pooled DbDataReader instance, in which case it could make more sense to cache the entire completed Task<DbDataReader> instance directly.

Anyway, I tend to focus more on the more fine-grained async APIs like ReadAsync and possibly GetFieldValueAsync, because I believe those can pay off more in terms of reducing allocations.

in the end we have DbCommand.ExecuteReader() which returns a Task<DbDataReader> and always yields

I assume this is for prepared statements and those don't do any database I/O until the first read? Then yes, that sounds like a good usage of ValueTask<T>... unless you are returning a cached/pooled DbDataReader instance, in which case it could make more sense to cache the entire completed Task<DbDataReader> instance directly.

Well, regardless of whether it's prepared or not, DbCommand.ExecuteReader() has to both write to the server and read the first response. If we can reduce allocations in this process that would be great.

Anyway, I tend to focus more on the more fine-grained async APIs like ReadAsync and possibly GetFieldValueAsync, because I believe those can pay off more in terms of reducing allocations.

Unlike DbCommand.ExecuteReader(), which always does I/O (always yields), APIs like ReadAsync() and GetFieldValueAsync() don't. In fact, GetFieldValueAsync() absolutely never yields unless CommandBehavior.Sequential is specified - we know the entire row is buffered in memory. So the implementation of GetFieldValueAsync() is totally non-async. The same is mostly true of ReadAsync() - in the 99% case, the next row is already buffered in memory and no I/O needs to occur; I/O only needs to occur if reading a really big resultset (> 8K) and we're now at the end of the buffer - that case doesn't seem worth optimizing...

So to summarize, I think the finer-grained async APIs are better optimized via a fast-path for when they don't yield, thus avoiding async altogether (I did this when I was visiting). The problem is with the methods which always yield, i.e. ExecuteReader(), and for those I definitely think there's something to do...

regardless of whether it's prepared or not, DbCommand.ExecuteReader() has to both write to the server and read the first response

Ah, I misread what you said before. I thought you were saying it can always return immediately, but you said the opposite. If the awaitables can be safely recycled, then it could be a good fit for ValueTask, otherwise it probably isn't.

GetFieldValueAsync() don't. In fact, GetFieldValueAsync() absolutely never yields unless CommandBehavior.Sequential is specified

Yes, I know :smile: but maybe we care enough for the cases in which it yields, otherwise we can deprecate the API :smile:

The same is mostly true of ReadAsync() - in the 99% case, the next row is already buffered in memory and no I/O needs to occur; I/O only needs to occur if reading a really big resultset (> 8K) and we're now at the end of the buffer - that case doesn't seem worth optimizing...

I want to understand this one better. Are you saying that in Npgsql you read the first buffer ahead of the first call to ReadAsync in order to avoid ever having to yield? Otherwise I would expect to still see on average more than one call to ReadAsync that yields for every call to ExecuteReaderAsync. And of course if the resultset is bigger than 8K (or whatever the size of the buffer is for a given provider), then the ratio will be 2:1 or greater.

BTW, I wouldn't say that a resultset that is bigger than 8K is really big :trollface:

I think the finer-grained async APIs are better optimized via a fast-path...

To clarify, the work on allocation-free async being discussed at https://github.com/dotnet/corefx/issues/27445 is actually supplemental to any fast path optimizations. In other words, it assumes that you are already avoiding allocations for the cases in which you don't need to yield, and it provides a cure for the problem of having to allocate a new Task for each case in which you do need to yield. The trick is to return an awaitable that, unlike Task, can be reused, e.g. the next time you need to yield. For this reason we are discussing that in the IAsyncEnumerable proposal, IAsyncEnumerator<T>.MoveNextAsync() will be updated to return ValueTask<bool>.
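
A minimal sketch of what such a reusable awaitable could look like. It assumes the `ManualResetValueTaskSourceCore<T>` helper that shipped in later .NET versions (it was not part of the release discussed here), and `ReusableBoolSource`, `StartOperation` and `Complete` are hypothetical names for illustration only, not an API from any provider:

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Hypothetical reusable source: one instance backs many sequential operations,
// so a yielding call does not need a fresh Task allocation each time.
public sealed class ReusableBoolSource : IValueTaskSource<bool>
{
    // Mutable struct that holds the result, the continuation and a version number.
    // It must not be marked readonly.
    private ManualResetValueTaskSourceCore<bool> _core;

    // Begin a new operation: recycle the state and hand out a ValueTask<bool>
    // that is bound to the current version of this object.
    public ValueTask<bool> StartOperation()
    {
        _core.Reset(); // increments the version, invalidating older ValueTasks
        return new ValueTask<bool>(this, _core.Version);
    }

    // Called by the producer (e.g. when I/O finishes) to complete the operation.
    public void Complete(bool result) => _core.SetResult(result);

    bool IValueTaskSource<bool>.GetResult(short token) => _core.GetResult(token);

    ValueTaskSourceStatus IValueTaskSource<bool>.GetStatus(short token) => _core.GetStatus(token);

    void IValueTaskSource<bool>.OnCompleted(
        Action<object> continuation, object state, short token,
        ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);
}
```

The same `ReusableBoolSource` instance can then serve one operation after another, as long as each returned ValueTask<bool> is consumed before the next `StartOperation()` call.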

regardless of whether it's prepared or not, DbCommand.ExecuteReader() has to both write to the server and read the first response

Ah, I misread what you said before. I thought you were saying it can always return immediately, but you said the opposite. If the awaitables can be safely recycled, then it could be a good fit for ValueTask, otherwise it probably isn't.

Yeah, we discussed this with @anpete at some point with regards to a slightly different question - is it safe to have a single instance of NpgsqlDataReader on a physical connection, and just return that whenever ExecuteReader() is called (remember that Npgsql doesn't support MARS - only one query can be executing at any given point). If we want to be very strict, then maybe we shouldn't do that, since DbDataReader is disposable and if the user is writing really bad code then this might break things. However, we ended up thinking that for DbDataReader (unlike for DbCommand and DbConnection) it's safe to recycle. The Npgsql perf branch (on which we ran the benchmarks) already included this optimization.

So that reduces one allocation per ExecuteReader() - the reader itself. But the ExecuteReader() API returns a Task<DbDataReader>, which means another allocation for the Task - and this allocation could be eliminated by adding a ValueTask-based overload on DbCommand. This way we recycle not only the reader instance, but also the Task.

However, I'm not too enthusiastic about this idea because the optimized pattern in https://github.com/dotnet/corefx/issues/27445 is a bit tricky for consumers of the ValueTask - by its very nature it's an error to await the ValueTask twice, etc. so I'm thinking this would be the source of many user errors - and weighing that against the benefit of a single saved Task allocation, I'm not sure it's worth it. It's also an addition of a new public API, which isn't free either.

So I really think this is great for any async happening inside Npgsql - remember that there are still 2-3 async methods being called internally, and probably allocating - but I'm not too sure about exposing this to the user.

GetFieldValueAsync() don't. In fact, GetFieldValueAsync() absolutely never yields unless CommandBehavior.Sequential is specified

Yes, I know :smile: but maybe we care enough for the cases in which it yields, otherwise we can deprecate the API :smile:

It's a good question... So first, although Npgsql buffers entire rows unless in sequential mode, I'm not sure all other providers do... In other words, in theory a provider might still decide to go to the server even without CommandBehavior.Sequential. The only thing the ADO.NET contract specifies is that it must be legal to get columns in any order, but that doesn't exactly mean the entire row needs to be buffered at the client. So even if for Npgsql calling GetFieldValueAsync() makes sense only with CommandBehavior.Sequential, that might not be the case for all providers.

Second, I'm assuming that cases where sequential is specified and where GetFieldValueAsync() needs to do I/O are pretty rare. If I'm not mistaken, Dapper systematically executes with sequential because it traverses columns once and in order (and there's the assumption that sequential might be faster). However, even in that case results are going to be buffered in 99% of cases, and GetFieldValueAsync() will never do I/O. The only case where GetFieldValueAsync() will yield is with really big rows which don't fit in the buffer.

So to summarize, I definitely think GetFieldValueAsync() has its uses and shouldn't be deprecated. However, while it's possible to optimize GetFieldValueAsync() and save its task allocation, cases where it actually yields are probably so rare that it's not really worth it, IMHO... At the very least this should be pretty low on the priority list of things to optimize.

The same is mostly true of ReadAsync() - in the 99% case, the next row is already buffered in memory and no I/O needs to occur; I/O only needs to occur if reading a really big resultset (> 8K) and we're now at the end of the buffer - that case doesn't seem worth optimizing...

I want to understand this one better. Are you saying that in Npgsql you read the first buffer ahead of the first call to ReadAsync in order to avoid ever having to yield? Otherwise I would expect to still see on average more than one call to ReadAsync that yields for every call to ExecuteReaderAsync. And of course if the resultset is bigger than 8K (or whatever the size of the buffer is for a given provider), then the ratio will be 2:1 or greater.

Sure, let me provide more detail... Whenever Npgsql needs more data, it attempts to read as much as possible, up to the buffer size (8K) - so yes, it's a sort of read-ahead buffer. So let's say the user called ReadAsync() and there are no more rows in the buffer. The call to ReadAsync() will yield, but the buffer will potentially be filled with the next 8K of rows, so that subsequent calls won't yield again.

Now, the important thing is that unless I'm mistaken, PostgreSQL doesn't send anything back as a response until it has at least one row to send. So when you call ExecuteReader(), it always yields while reading some control messages (e.g. BindComplete), but it will also buffer at least some data rows in the process, and the first ReadAsync() won't need to yield. In fact, IIRC when executing the fortunes benchmark the whole query result came as a single "batch" (BindComplete, DataRow, CommandComplete, ReadyForQuery), so apart from the initial ExecuteReader() yield there was no need to do any more I/O.

Everything I wrote above does need to be verified - I'll try to take a look and confirm in the coming days. It's true that this would be PostgreSQL-specific behavior, and with other databases ReadAsync() may indeed need to yield - and so removing its allocation would have value there. But again keep in mind the tricky nature of the optimized value task in https://github.com/dotnet/corefx/issues/27445 - if I'm understanding things correctly this will probably be the source of a lot of issues if exposed to users.

BTW, I wouldn't say that a resultset that is bigger than 8K is really big :trollface:

That's true :) But the point is that you'd have just one yielding invocation every 8K. I think there's a lot of value in eliminating allocations that occur on every invocation, but much less value in eliminating those that happen only when the buffer is exhausted... Any gain is likely to be dwarfed by the cost of processing the 8K of data on the user side, etc.

I think the finer-grained async APIs are better optimized via a fast-path...

To clarify, the work on allocation-free async being discussed at dotnet/corefx#27445 is actually supplemental to any fast path optimizations. In other words, it assumes that you are already avoiding allocations for the cases in which you don't need to yield, and it provides a cure for the problem of having to allocate a new Task for each case in which you do need to yield. The trick is to return an awaitable that, unlike Task, can be reused, e.g. the next time you need to yield. For this reason we are discussing that in the IAsyncEnumerable proposal, IAsyncEnumerator<T>.MoveNextAsync() will be updated to return ValueTask<bool>.

I agree. The question is more whether invocations yield often enough to make https://github.com/dotnet/corefx/issues/27445 worth it (especially considering that it isn't "free", because of the gotchas in the API)... In the case of ReadAsync() and GetFieldValueAsync() for Npgsql, I think that the answer may be no - but of course I may be wrong.

In any case it definitely seems that, although they're orthogonal, there's much higher value in optimizing the non-yielding invocation (i.e. fast-path) than the yielding invocation, so providers should probably focus on the former first.

Wow, sorry for the super-long response :)

by its very nature it's an error to await the ValueTask twice, etc. so I'm thinking this would be the source of many user errors

I was wondering about this myself, and I have to say: I have never seen any normal code await anything more than once. I am actually optimistic that most developers will never see the difference. Perhaps the risk is being overestimated, but I guess we can wait until others try this and see how confusing it is. For IAsyncEnumerable, I think the fact that the consuming pattern may be provided by libraries or even the compiler (the async version of foreach) mitigates the risk significantly.

So I really think this is great for any async happening inside Npgsql - remember that there are still 2-3 async methods being called internally, and probably allocating

Cool. I agree the benefit here can be huge.

So first, although Npgsql buffers entire rows unless in sequential mode, I'm not sure all other providers do...

SqlClient does the same thing. This was originally a requirement from EF, because it allows us to keep our materializer code synchronous and still avoid blocking.

calling GetFieldValueAsync() makes sense only with CommandBehavior.Sequential, that might not be the case for all providers

I think it is desirable that this is the case for all providers. However, to the degree that we still care about this API (I am willing to agree that this is debatable, but hey, I think I may be biased towards O/RM scenarios :smile:), it should ideally return ValueTask and not Task. This is not just because of the new allocation-free awaitables work. It was already true before, for the simple reason that this is an API that most of the time doesn't yield, but the result can vary a lot and therefore it is not practical to cache the returned Task<T>. I filed https://github.com/dotnet/corefx/issues/15011 more than a year ago for this reason.

when you execute ExecuteReader(), it always yields, reading some control messages (i.e. BindComplete), it will also buffer at least some data rows in the process as well, and the first ReadAsync() won't need to yield.

Cool, that is great! I am also not sure about what other providers/databases do.

I think there's a lot of value in eliminating allocations that occur on every invocation, but much less value in eliminating those that happen only when the buffer is exhausted...

I agree that the fast path optimization for the cases that don't need to yield can have more value. It is just unfortunate that we are not doing this everywhere already! FWIW, my intention when I filed this issue was to push ourselves to go to the next level.

BTW, do you remember when @anpete, you and I discussed the Task<T> allocations that were showing up in the memory profiles for the database tests? I remember that there were both Task<DbDataReader> from ExecuteReaderAsync and Task<bool>. Do you remember what the ratio was and where the Task<bool> instances were coming from?

However, I'm not too enthusiastic about this idea because the optimized pattern in dotnet/corefx#27445 is a bit tricky for consumers of the ValueTask - by its very nature it's an error to await the ValueTask twice, etc. so I'm thinking this would be the source of many user errors - and weighing that against the benefit of a single saved Task allocation, I'm not sure it's worth it. It's also an addition of a new public API, which isn't free either.

You can guard against that in the implementation of the IValueTaskSource
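
A minimal sketch of one way such a guard can work, assuming the version-token check built into `ManualResetValueTaskSourceCore<T>` (a helper from later .NET versions; `StaleTokenDemo` is a hypothetical name). Each ValueTask carries the token it was created with, and once the source has been recycled a stale token is rejected instead of silently returning another operation's result:

```csharp
using System;
using System.Threading.Tasks.Sources;

public static class StaleTokenDemo
{
    // Returns true if consuming a result with an outdated token is rejected.
    public static bool StaleTokenIsRejected()
    {
        var core = new ManualResetValueTaskSourceCore<bool>();

        short firstToken = core.Version;  // token a ValueTask would carry
        core.SetResult(true);
        core.GetResult(firstToken);       // first consumption: token matches

        core.Reset();                     // source recycled for the next operation

        try
        {
            core.GetResult(firstToken);   // "second await" with the old token
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;                  // mismatch detected by the token check
        }
    }
}
```

Note that this check only fires after the source has been reset. An implementation that also wants to reject a second consecutive await of the same operation would need its own bookkeeping.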

I think it is desirable that this is the case for all providers. However, to the degree that we still care about this API (I am willing to agree that this is debatable, but hey, I think I may be biased towards O/RM scenarios :smile:), it should ideally return ValueTask and not Task. This is not just because of the new allocation-free awaitables work. It was already true before, for the simple reason that this is an API that most of the time doesn't yield, but the result can vary a lot and therefore it is not practical to cache the returned Task<T>. I filed dotnet/corefx#15011 more than a year ago for this reason.

Oh cool, I wasn't aware of this issue. I definitely think that GetFieldValueAsync<T> makes sense (and shouldn't be deprecated), and I also agree that an overload which returns a ValueTask makes sense.

By the same logic, it might make sense to have a ReadAsync() which returns a ValueTask<bool>, to reduce allocations for the cases in which it yields.

BTW, do you remember when @anpete, you and I discussed the Task<T> allocations that were showing up in the memory profiles for the database tests? I remember that there were both Task<DbDataReader> from ExecuteReaderAsync and Task<bool>. Do you remember what the ratio was and where the Task<bool> instances were coming from?

I admit I don't remember any more... I definitely need to redo the memory profiling soon and will report on the results. If we see a Task coming out of ReadAsync(), that would mean it's yielding contrary to what I wrote above, and then a ValueTask-returning overload would mitigate that. I'll try to investigate soon.

However, I'm not too enthusiastic about this idea because the optimized pattern in dotnet/corefx#27445 is a bit tricky for consumers of the ValueTask - by its very nature it's an error to await the ValueTask twice, etc. so I'm thinking this would be the source of many user errors - and weighing that against the benefit of a single saved Task allocation, I'm not sure it's worth it. It's also an addition of a new public API, which isn't free either.

You can guard against that in the implementation of the IValueTaskSource

OK thanks :) I've yet to dive into the new ValueTask capabilities, will take a look.

Unlike DbCommand.ExecuteReader(), which always does I/O (always yields), APIs like ReadAsync() and GetFieldValueAsync() don't.

Likewise, in MySqlConnector, DbCommand.ExecuteReader always does I/O and is unlikely to benefit from a "regular" ValueTask (compared to Task<T>). I'd need to do more investigation on reusing/pooling objects to say if there might be a benefit to using IValueTaskSource. (FWIW, a year ago I wrote some experimental code to try to reduce the overhead of ValueTask and I think a lot of it could be adapted to IValueTaskSource for library internals, so there may be a possibility here.)

ReadAsync might do I/O, or it might not. MySQL Server returns one MySQL packet per row in the result set. These m MySQL packets are bundled into n TCP packets (could be many-to-one for small rows; one-to-many for large rows). If the underlying Socket has buffered data, then ReadAsync may be able to synchronously return the next row from that buffered data, and ValueTask<bool> ReadAsync would be an improvement.

GetFieldValueAsync is always synchronous in MySqlConnector since all field values are always sent over the network and are available in memory (because MySqlConnector always loads the entire row). (They will not be parsed into an int or byte[] unless that field is read by GetValue etc. but the raw data is always present.) CommandBehavior.SequentialAccess currently has no effect on MySqlConnector. (I think I can see how it could be implemented, but I think it would dramatically increase the complexity of the code for arguably minimal benefits.)

Similarly, IsDBNullAsync is always synchronous (and should be avoided in MySqlConnector).

ReadAsync might do I/O, or it might not. MySQL Server returns one MySQL packet per row in the result set. These m MySQL packets are bundled into n TCP packets (could be many-to-one for small rows; one-to-many for large rows). If the underlying Socket has buffered data, then ReadAsync may be able to synchronously return the next row from that buffered data, and ValueTask<bool> ReadAsync would be an improvement.

This is the same situation as in PostgreSQL. Note that the use of ValueTask to reduce memory overhead in this case isn't necessary... async methods that return Task<bool> automatically return a cached Task instance if they complete synchronously, so there's no advantage in returning ValueTask for that. Npgsql has a fast-path for ReadAsync for those cases where the next message (MySQL packet in your case) is already buffered in memory. Note that this fast-path isn't even an async method, to avoid the CPU overhead of the state machine - it's just a simple Task<bool>-returning function that checks whether everything is available in memory and returns pre-allocated true/false tasks. If I/O actually needs to occur, it calls the long, async version instead.
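
A minimal sketch of that fast-path shape. This is not Npgsql's actual code: `BufferedRowReader`, the in-memory queue standing in for the read buffer, and the `fetchMore` delegate standing in for network I/O are all hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical reader; the queue stands in for the driver's read buffer.
public sealed class BufferedRowReader
{
    // Pre-allocated completed tasks shared by every non-yielding call.
    private static readonly Task<bool> TrueTask = Task.FromResult(true);
    private static readonly Task<bool> FalseTask = Task.FromResult(false);

    private readonly Queue<string> _buffered;
    private readonly Func<Task<IReadOnlyList<string>>> _fetchMore; // hypothetical I/O call
    private bool _done;

    public BufferedRowReader(IEnumerable<string> initialRows,
                             Func<Task<IReadOnlyList<string>>> fetchMore)
    {
        _buffered = new Queue<string>(initialRows);
        _fetchMore = fetchMore;
    }

    public string Current { get; private set; } = "";

    // Fast path: deliberately not an async method, so no state machine is involved.
    public Task<bool> ReadAsync()
    {
        if (_buffered.Count > 0)
        {
            Current = _buffered.Dequeue();
            return TrueTask;
        }
        return _done ? FalseTask : ReadSlowAsync();
    }

    // Slow path: only reached when the buffer is empty and I/O may be needed.
    private async Task<bool> ReadSlowAsync()
    {
        IReadOnlyList<string> rows = await _fetchMore().ConfigureAwait(false);
        if (rows.Count == 0)
        {
            _done = true;
            return false;
        }
        foreach (string row in rows) _buffered.Enqueue(row);
        Current = _buffered.Dequeue();
        return true;
    }
}
```

With this shape, every call that finds a buffered row returns the same `TrueTask` instance, so only the slow path can allocate.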

To summarize, before https://github.com/dotnet/corefx/issues/27445 having a ReadAsync overload that returns a ValueTask is useless (unless I'm missing something), but it may now be interesting to see whether allocations could be reduced for yielding (asynchronous) invocations.

GetFieldValueAsync is always synchronous in MySqlConnector since all field values are always sent over the network and are available in memory (because MySqlConnector always loads the entire row). (They will not be parsed into an int or byte[] unless that field is read by GetValue etc. but the raw data is always present.) CommandBehavior.SequentialAccess currently has no effect on MySqlConnector. (I think I can see how it could be implemented, but I think it would dramatically increase the complexity of the code for arguably minimal benefits.)

Yeah, I understand that. Npgsql does implement CommandBehavior.SequentialAccess, and it's definitely not trivial to do. However, when dealing with large rows (think about binary data, long strings) it does make sense to not buffer the entire row, and at that point GetFieldValueAsync and IsDBNullAsync make a lot of sense. This definitely isn't the 90% usage case when accessing databases, but it does happen.

In any case, as long as you don't actually implement sequential row access, it makes total sense for GetFieldValueAsync and IsDBNullAsync to simply call their sync counterparts and return an already-completed Task.
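
A minimal sketch of that delegation, assuming a fully buffered row represented as a plain `object[]`. The `SyncBackedAsync` class and its row representation are hypothetical and are not taken from MySqlConnector or Npgsql:

```csharp
using System;
using System.Threading.Tasks;

public static class SyncBackedAsync
{
    // Only two possible results, so completed tasks can be allocated once.
    private static readonly Task<bool> TrueTask = Task.FromResult(true);
    private static readonly Task<bool> FalseTask = Task.FromResult(false);

    // Sync counterparts operating on an already-buffered row.
    public static T GetFieldValue<T>(object[] row, int ordinal) => (T)row[ordinal];
    public static bool IsDBNull(object[] row, int ordinal) => row[ordinal] is DBNull;

    // Async versions never yield: they call the sync method and wrap the result.
    // Task.FromResult allocates a new Task<T> per call here, because arbitrary
    // T results cannot be cached the way true/false can.
    public static Task<T> GetFieldValueAsync<T>(object[] row, int ordinal)
        => Task.FromResult(GetFieldValue<T>(row, ordinal));

    public static Task<bool> IsDBNullAsync(object[] row, int ordinal)
        => IsDBNull(row, ordinal) ? TrueTask : FalseTask;
}
```

The per-call allocation in `GetFieldValueAsync<T>` is the case where a ValueTask<T>-returning overload would help even without the reusable-awaitable work.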

Note: filed https://github.com/dotnet/corefx/issues/27682 to track adding ValueTask-returning overloads to ADO.NET.

@roji and @bgrainger I'm super interested in pulling apart the networking layer of both clients to see what it looks like with all of the new primitives we are building (pipelines, span etc)

/cc @KrzysztofCwalina

@davidfowl and I'd be super super happy to do it with you... I've already thought of some places where spans could be great - although nothing earth-shattering at the moment (I'd be happy to be shown I'm wrong though). We also briefly discussed pipelines together a few weeks back and didn't necessarily see a huge benefit, although I admit I'm still unclear on what the tech does.

How would you like to proceed? If you want to dive into the code please take a look at the perf branch which hasn't yet been merged but has lots of important stuff. If you want to do a call or a chat about it just say when.

@davidfowl What's the status of TLS for pipelines? (I've seen some tweets and merged PRs but I'm not sure if there's, say, a beta-quality NuGet package providing TLS.) MySqlConnector will need at least support for client SSL certificates to move 100% to pipelines (instead of having one path that uses SslStream for TLS connections and a different path that uses a pipe for non-TLS connections).

If you want to do a call or a chat about it just say when.

Ditto.

@roji

@davidfowl and I'd be super super happy to do it with you... I've already thought of some places where spans could be great - although nothing earth-shattering at the moment (I'd be happy to be shown I'm wrong though)

Great! Next week is the MVP summit and we're still plowing through a bunch of API changes so I don't recommend you try to jump in using them right away. By preview2, the API churn should be over. I also don't think replacing the code you have with pipelines, span and memory will necessarily make things faster if you have already optimized code paths.

We also briefly discussed pipelines together a few weeks back and didn't necessarily see a huge benefit, although I admit I'm still unclear on what the tech does.

Pipelines themselves don't do much. Kestrel's transport layer is something that would be potentially useful (even though it's tied to the server, it has all of the pieces required to make a client API work). Buffer management is one of the things that pipelines takes care of. Still, if we did anything, it would be abstracting the networking so that pipelines could be dropped in. This is what we did with Kestrel and now we have libuv, sockets and in-memory transports (great for testing 😄).

The other thing we did was do enough work to replace all of the things you have to do to manage buffers efficiently, parse various types of data from buffers directly which hopefully means the buffer types end up potentially being very small.

https://github.com/npgsql/npgsql/blob/d2daf635c8692f37834fb1c0c6aad10e87dc4aae/src/Npgsql/NpgsqlReadBuffer.cs

How would you like to proceed? If you want to dive into the code please take a look at the perf branch which hasn't yet been merged but has lots of important stuff. If you want to do a call or a chat about it just say when.

Let's try to do an initial investigation in ~2-3 weeks.

@bgrainger

@davidfowl What's the status of TLS for pipelines?

There is nothing that Microsoft is shipping in this space. SslStream is still our official story. .NET Core 2.1 has some good performance improvements but there's still more work to be done.

@Drawaes is working on a native pipelines implementation of TLS (also a fully managed one) that might be interesting to look at.

https://github.com/Drawaes/Leto

MySqlConnector will need at least support for client SSL certificates to move 100% to pipelines (instead of having one path that uses SslStream for TLS connections and a different path that uses a pipe for non-TLS connections).

Kestrel adapts SslStream into a pipe. That's trivial to do.

How would you like to proceed? If you want to dive into the code please take a look at the perf branch which hasn't yet been merged but has lots of important stuff. If you want to do a call or a chat about it just say when.

Let's try to do an initial investigation in ~2-3 weeks.

Sounds great, ping me when it calms down and we'll take a look together.

Pipelines themselves don't do much. Kestrel's transport layer is something that would be potentially useful (even though it's tied to the server, it has all of the pieces required to make a client API work). Buffer management is one of the things that pipelines takes care of. Still, if we did anything, it would be abstracting the networking so that pipelines could be dropped in. This is what we did with Kestrel and now we have libuv, sockets and in-memory transports (great for testing 😄).

The other thing we did was do enough work to replace all of the things you have to do to manage buffers efficiently, parse various types of data from buffers directly which hopefully means the buffer types end up potentially being very small.

I suspect that there are some differences between a database client model and a web server model when it comes to buffering. Here are a few notes about how it works in Npgsql:

- Database (physical) connections are considered very heavyweight and long-term objects (connection pooling makes sure they're reused).
- Each connection has its own read and write buffer (I need both because in some cases reading and writing need to happen in parallel). So buffers are always tied to their connection, they never move around between connections and we don't need any sort of buffer pool (as far as I can see).
- On the reading side, one relatively unique/nice property of ADO.NET is that it more or less mandates you to buffer rows by default - unless CommandBehavior.Sequential is specified you're supposed to be able to do random access to columns in a row (which pretty much means you need to have all of them in memory).
- So the optimal way to work is to have your read buffer at least the size of the rows you'll be reading. The default buffer size is 8k; if users expect to read 10k-large rows they should tune the buffer to that via a connection string parameter. In general we consider large rows to be a relatively rare scenario that's not necessarily worth spending too much time on.
- If you turn on sequential mode, we no longer buffer the entire row - we read 8k at a time, and it's up to the user to never read an earlier column.
- If a row is being read that's bigger than the buffer, and sequential mode isn't specified, Npgsql internally allocates an oversize buffer to hold the row. This buffer lives until the connection is returned to the pool, and this is definitely the "bad performance" scenario - users are encouraged to either increase their buffer size (as described above) or to switch to sequential reading. We could optimize the oversize case, but up to now telling users to tweak the buffer size or switch to sequential has seemed sufficient.
- On the writing side, there's a write buffer (also 8k by default); data is written to it and it's flushed when it fills up (no oversize case). Because the protocol requires the length up-front, I currently do two passes over outgoing data - once to calculate the length (and validate) and another to actually write. I have a change planned for vnext that would do single-pass writing if the data happens to fit in the buffer (which should always be the case).
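
A minimal sketch of the two-pass idea for a single string value. The framing used here (a 4-byte big-endian length followed by the UTF-8 payload) is a simplification for illustration, not the full PostgreSQL message format, and `LengthPrefixedWriter` is a hypothetical helper rather than Npgsql code:

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

public static class LengthPrefixedWriter
{
    // Pass 1: compute the encoded size without writing anything.
    public static int Measure(string value) => Encoding.UTF8.GetByteCount(value);

    // Pass 2: write the length prefix, then the payload.
    // Returns the total number of bytes written into the buffer.
    public static int Write(Span<byte> buffer, string value)
    {
        int length = Measure(value);
        if (buffer.Length < 4 + length)
            throw new ArgumentException("Buffer too small for this value", nameof(buffer));

        // The length has to go first, which is why it must be known up-front.
        BinaryPrimitives.WriteInt32BigEndian(buffer, length);
        int written = Encoding.UTF8.GetBytes(value.AsSpan(), buffer.Slice(4));
        return 4 + written;
    }
}
```

A single-pass variant could instead reserve the 4 prefix bytes, encode the payload directly into the buffer, and back-fill the length afterwards, which only works when the value fits in the buffer.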

That's about it... It's a very simple design. The buffers are currently byte[], but we can definitely switch them to Memory<byte> when that's out. Actual type reading (i.e. getting an int out of the byte[]) happens with Unsafe.ReadUnaligned<T>() (thanks to @YohDeadfall for this recent optimization), Memory<byte> should probably provide a somewhat cleaner way of doing this efficiently. Aside from that there's little slicing going on so I'm not sure Memory/Span will help much.
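
As a rough illustration of reading a typed value out of the buffer through a span, here is a minimal sketch that uses `BinaryPrimitives` instead of `Unsafe.ReadUnaligned<T>()`. PostgreSQL sends integers in network (big-endian) byte order; `BufferReads` and its position-tracking signature are hypothetical:

```csharp
using System;
using System.Buffers.Binary;

public static class BufferReads
{
    // Reads a 4-byte big-endian integer at the given position and advances it.
    // AsSpan(position, 4) throws if fewer than 4 bytes remain in the buffer.
    public static int ReadInt32(byte[] buffer, ref int position)
    {
        int value = BinaryPrimitives.ReadInt32BigEndian(buffer.AsSpan(position, 4));
        position += 4;
        return value;
    }
}
```

Usage would look like `int pos = 0; int length = BufferReads.ReadInt32(buffer, ref pos);`, with the slice providing the bounds check that raw unaligned reads leave to the caller.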

I'm not sure to what extent the above is the same with MySQL but I'm guessing it's not going to be super different.

MySqlConnector has a very similar model (with a few small differences, e.g., a per-connection read/write buffer, currently no user-exposed way to tune the buffer size, etc.).

There's currently a lot of inefficiency in reading data that I'm planning to fix with Utf8Parser: mysql-net/MySqlConnector#426

aspnet / DataAccessPerformance

Measure impact of allocation free async #32