dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.47k stars 4.76k forks source link

Add ValueTask<T> returning async APIs to ADO.NET #25297

Open roji opened 6 years ago

roji commented 6 years ago

With https://github.com/dotnet/corefx/issues/27445 allowing elimination of allocations in asynchronously-executing async methods, we should look into adding ValueTask-returning counterparts to ADO.NET. This would potentially allow zero-allocation database access.

Here are the current async methods:

Task DbConnection.OpenAsync(...);

Task<DbDataReader> DbCommand.ExecuteReaderAsync(...);
Task<DbDataReader> DbCommand.ExecuteDbDataReaderAsync(...);
Task<object> DbCommand.ExecuteScalarAsync(...);
Task DbCommand.ExecuteNonQueryAsync(...);

Task<bool> DbDataReader.ReadAsync(...);
Task<T> DbDataReader.GetFieldValueAsync<T>(...);
Task<bool> DbDataReader.IsDBNullAsync(...);

Notes:

  1. Naming the new methods is going to be complicated. Some parts of corefx are lucky in that they're introducing ValueTask<T> overloads along with Span<T> (e.g. Stream), but ADO.NET has no such upcoming parameter changes/additions. Since there can't be yet another overload that only differs on the return type, and the "standard" name with just the Async suffix is taken, we need to come up with a new naming pattern.

  2. There are some missing async methods in ADO.NET in general (e.g. DbCommand.Prepare(), DbConnection.BeginTransactionAsync(...), DbTransaction.CommitAsync(...)...). These can be added directly with return value ValueTask<T> (covered by issue https://github.com/dotnet/corefx/issues/35012). Naming of these new methods should probably be decided in the context of whatever new naming pattern we use for the ValueTask<T> returning variants of the existing async methods.

  3. Also note that async APIs that return Task or Task<bool> (such as ReadAsync() and IsDBNullAsync()) can be implemented to return cached Task<T> instances when they complete synchronously to gain efficiency instead of adopting ValueTask<TResult>. This mitigates the need for ValueTask<T> returning variants for most APIs. Also, in terms of allocations, the advantage of ValueTask<T> is much greater the more fine grained the async methods are.

  4. Compatibility for providers: It seems desirable to offer a default implementation of the Task-based method that calls AsTask() on the ValueTask<TResult> instance returned by the new API, so that new providers don't need to implement the old Task<TResult> version of the API but instead only implement the more efficient ValueTask<TResult> version. It also seems good not to force existing providers to switch.

    **Updated by @divega on 1/13/2019 to consolidate with https://github.com/dotnet/corefx/issues/15011.

divega commented 6 years ago

System.Data Triage: We should consider this, but since it isn’t for 2.1, moving to future.

davidfowl commented 6 years ago

Are any of these methods streaming? Would IAsyncEnumerable be interesting here?

divega commented 6 years ago

@davidfowl DbDataReader.Read is streaming. The current API does not resemble IEnumerable (possibly because it predates it :smile:), but the new API could use IAsyncEnumerable if the timing is good.

See this was mentioned on https://github.com/aspnet/DataAccessPerformance/issues/32.

roji commented 6 years ago

DbDataReader implement IEnumerable, exposing rows as DbDataRecord. It's an old API (non-generic, sync-only), and the default implementation, which uses DbEnumerator, is also pretty inefficient: it calls DbDataReader.GetValues(), which copies the row's values into an object[] (which gets allocated on each MoveNext()...).

But I actually don't really see a reason not to also have DbDataReader implement IAsyncEnumerable... It could provide a modern and efficient way to stream over the the rows of a resultset. Opened dotnet/corefx#27684 to track.

GSPP commented 6 years ago

Is the allocation overhead really significant when opening a connection or executing a reader? Executing select null against SQL Server on the same machine over shared memory and using synchronous IO takes about 200us. This already is so much more than allocating a few objects and collecting them.

Adding ValueTask versions for reading rows and fields might be valuable.

roji commented 6 years ago

You're right that when OpenAsync() returns asynchronously (i.e. a physical open or wait), the memory overhead is probably negligible. For synchronous invocations (no I/O, pooled idle connection is returned) there is no memory overhead, since OpenAsync() returns a non-generic task which can be cached. So I agree this has more value for DbDataReadet.Read().

However, don't forget that ADO.NET is sometimes used to access databases like Sqlite, where the physical open can be much much faster (no networking), so it could be significant there.

Grauenwolf commented 6 years ago

ValueTask DblDataReader.GetFieldValueAsync(...); ValueTask DbDataReader.IsDBNullAsync(...);

If we're going to touch this, we might as well fix the IsDBNull problem at the same time. Currently we need to make two calls, once to check for nulls and again to actually read the value. When I profile my ORM, IsDBNull is often flagged as most expensive operation in aggregate. (EDIT: this is no longer true.)

We can probably make it a lot more convenient efficient with methods such as

ValueTask<T?> DblDataReader.GetFieldValueOrNullAsync(...);

roji commented 6 years ago

Depending on the ADO.NET provider implementation, calling IsDBNull() separately from GetFieldValue<T>() isn't necessary a big perf problem (I'm not sure it's the case with Npgsql). But introducing something like GetFieldValueOrNull() (and its async counterpart) would be a good idea regardless, if only for improving the API/usability.

Note that using the async versions of these methods is probably a bad idea unless CommandBehavior.SequentialAccess is specified, since the entire row is pretty much guaranteed to be buffered in memory anyway...

Grauenwolf commented 6 years ago

You're right. When I last benchmarked it on .NET Framework a year or two ago it was a major issue. Now it's barely visible in the performance profiler.

Grauenwolf commented 6 years ago

Still, it would be nice for those who are using ADO.NET directly to not have to write those checks.

roji commented 6 years ago

Absolutely, it would be great if ADO.NET got a bit of a face-lift in terms of API usability, regardless of any perf aspects...

Good to know that perf-wise things are better, there have been a lot of changes in recent two years.

GSPP commented 6 years ago

If this proposed deserialization API is implemented then the ADO.NET APIs will not be very chatty. There will be less of a need for ValueTask. ValueTask can be used internally.

divega commented 6 years ago

@roji @Grauenwolf I would like us to track the idea of GetFieldValueOrNull[Async] as a separate issue. Would you like to create it?

Grauenwolf commented 6 years ago

I haven't read the underlying code yet, so I think someone else would be better suited to proposing an API.

MgSam commented 6 years ago

I vote against mixing ValueTask and Task return types arbitrarily based solely on the age of the method. ValueTask return types should only be highly targeted to methods likely to be called over and over.

I do agree BeginTransactionAsync and CommitTransactionAsync are needed, though they should not be ValueTask.

divega commented 6 years ago

@Grauenwolf stating the problem with the current API would be enough. But no worries.

Grauenwolf commented 6 years ago

I'm ok with BeginTransactionAsync and CommitTransactionAsync not being ValueTask.

Really it is only DBDataReader that I care regarding ValueTask because it is in a tight loop.

divega commented 5 years ago

@roji I wonder how this would look like now that we have a better understanding of the guidance re ValueTask\ vs. Task\ (from https://github.com/aspnet/EntityFrameworkCore/issues/15184).

It seems to me that only the API on DbDataReader is worth considering.

roji commented 5 years ago

@divega I agree - for the execution APIs which never return synchronously, it doesn't make much sense to think about ValueTask. I will update the description soon to reflect this.

There's also dotnet/corefx#35012 (adding missing async methods, as opposed to ValueTask counterparts to existing async methods). It seems that proposal already follows the guidance - the only ValueTask-returning methods are DisposeAsync() (so coming from IAsyncDisposable) and CloseAsync(), which is almost identical to DisposeAsync().

divega commented 5 years ago

@roji perhaps we should get everything we want from this issue copied into dotnet/corefx#35012 and bring that one to API review. I am saying this because the scope of this issue (#27682) is probably now small enough that it wouldn't affect the chances of dotnet/corefx#35012. I also think it helps have a more holistic view during the API review.

divega commented 5 years ago

Though there is still the naming problem :smile:

roji commented 5 years ago

Yeah, and for GetFieldValue there's also (in theory) still the question of better nullability handling (even though right now we don't seem to have anything better). That's why it may be good to handle them differently - I think dotnet/corefx#35012 can be brought to design review right away.

divega commented 5 years ago

I added api-ready-for-review on dotnet/corefx#35012. Let's at least discuss the ValueTask\ version of ReadAsync there.

roji commented 5 years ago

@divega a small note after the recent discussions: ValueTask\ seems to be the same case as ValueTask, in that it provides no value for synchronous completion: Task is already cached (only two possible values). It may still make sense to add ValueTask\ overloads in case a provider ever decides to optimize the asynchronous completion case (via IValueTaskSource) - although at this point that seems pretty unlikely, given the effort required (see @stephentoub's comment quoted in https://github.com/aspnet/EntityFrameworkCore/issues/15184#issuecomment-478484271).

We could defer adding those APIs until an easier alternative is introduced (as is planned for some point) - since until then the ValueTask\ version will actually be a bit slower than the Task one (due to the overhead of the added layer etc.). What do you think?

However, in any case we still have GetFieldValueAsync() which would benefit from an overload returning a ValueTask\.

divega commented 5 years ago

Yes @roji, however DbDataReader.ReadAsync is exactly like IAsyncEnumerable\.MoveNextAsync, which is already optimized for the async completion case. I suspect someone willing to optimize it could reuse ManualResetValueTaskSourceCore\ and duplicate the logic used to build async iterators.

I am hoping that in the API review meeting we can get the right people to ask about this.

roji commented 5 years ago

@divega I'll try to investigate a bit how that works. But you're that in any case, even if today it's difficult to benefit from the async completion optimization, it may become easier in the future, and this specific API is definitely a place where that may make sense. As you say, raising this in the API review meeting would probably get us the answers we need.

Regardless, I'll soon update the description to reflect this discussion.

GSPP commented 5 years ago

An argument can be made for providing the highest possible performance alternative (likely ValueTask): Who still uses raw ADO.NET these days? That is rare. Normally, people use an ORM or a thin wrapper such as Dapper. Inconvenient APIs are hidden from most callers.

MgSam commented 5 years ago

@GSPP Tons and tons of people still use ADO.NET. There are huge classes of developers that refuse to use anything that is not shipped with .NET. Not to mention all the legacy code out there.

roji commented 5 years ago

@GSPP can you explain how your argument that users don't usually use ADO.NET directly relates to this issue? At least at this point it isn't really recommended to add APIs everywhere that return non-generic ValueTask, as it is very unlikely that this would help perf in any way (it would actually degrade perf in some cases compared to returning non-generic Task).

There's also a direction of thought about making ADO.NET more usable directly (as opposed to usable only via other layers).

Grauenwolf commented 5 years ago

Even those who use ORMs still need to drop down into ADO.NET for legacy databases, especially if they have stored procedures.

Wraith2 commented 5 years ago

I was thinking about this recently and had an idea. There are three main scenarios.

  1. The caller want to know if something is null but then acts directly on that knowledge. I's say this is rare and is covered by IsDBNull[Async]
  2. The caller knows by value of the query definition that the field will not be null and null appearing is an exceptional condition, this is covered by Get* and GetFieldValue[Async]
  3. The caller wants to know if something is null and if it isn't then get that value. At the moment you do IsDBNull followed immediately by GetFieldValue. This is the interesting case.

Even when what you want is buffered locally you're calling two Task<T> returning methods and unless T is bool you're really going to have to allocate most of the time. Even if we could change this to ValueTask<T> it would still be two calls to ask about the same field. The usual sync pattern for this sort of thing would be bool TryGetNotNullFieldValue<T>((string fieldname, out T value) which if ok but that out parameter prevents it being converted to async. But! we have ValueTuples now so we don't have to use the awkward pattern of having additional return parameters as out/ref. We can write ValueTask(bool isNotNull, T value)> GetNotNullFieldValue(Async<T>(string fieldname) which is neatly async valuetasked and asks both relevant questions in a single call.

You'd end up with callsites like this:

private static async Task Performance(string connectionStirng)
{
    using (var connection = new SqlConnection(connectionStirng))
    using (var command = new SqlCommand("select id from table",connection))
    using (var reader = await command.ExecuteReaderAsync())
    {
        while (await reader.ReadAsync())
        {
            var valueTask = reader.GetNotNullFieldValueAsync<int>("id");
            var result = valueTask.IsCompletedSuccessfully ? valueTask.Result : await valueTask.ConfigureAwait(false);
            if (result.isNotNull)
            {
                // use the value
            }
            else
            {
                // er...
            }
        }
    }
}

And if you didn't care about the valuetask pattern you can just directly await the method and skip the task variable entirely. This way T should be annotatable with the nullability attributes so you don't have to escape when accessing result.Value. If you really want to go all-out it could return a struct type with bool operator overload so you can just do if (result) { // use result.Value Of course the function name sucks but that's because all the really good names are already taken, it'd have to be something silly and overlong but we can pass that one up to the design review meeting to handle, they love spending hours trying to decide the best names for things 😁

GSPP commented 5 years ago

The ValueTask(bool isNotNull, T value)> idea is promising. It's also a functional API now. out is side-effecting and it does not compose as well as an expression. Making two calls (IsNull...) is much less nice than one call that returns exactly what you want.

Essentially, ADO.NET wants an option type. This is what a SQL NULL is. It's almost exactly what Nullable provides except that Nullable is restricted to value types.

Probably, this should be a custom struct with deconstruction support. ValueTask is not supposed to be in public APIs at this point (I agree with that. A custom struct is cleaner given the very high cleanliness requirements of the framework.).

As a side note, I see this (IMHO) design mistake a lot in the framework where people really wanted an option type but instead chose bool TryGetX(out value) or some other rather awkward pattern (such as two methods calls).

Maybe .NET would benefit from an option type that is not restricted to value types. But since we don't have that we can give ADO.NET it's own fundamental option type.

roji commented 5 years ago

ValueTask is not supposed to be in public APIs at this point

How do you mean? ValueTask is exposed in many public APIs (e.g. IAsyncDisposable.DisposeAsync, Stream.ReadAsync...).

Wraith2 commented 5 years ago

There was resistance to using value tuples in public apis due to not having agreed naming guidance for them. I believe this was resolved by the design meetings a few months ago so it might be possible. I'm fine with using a struct instead of a value tuple.

roji commented 5 years ago

Ah I see - I think there was some confusion between ValueTuple and ValueTask above.

Grauenwolf commented 5 years ago

ValueTask is not supported by Visual Basic. Until that is addressed we should be hesitant about using it in public APIs.

stephentoub commented 5 years ago

ValueTask is not supported by Visual Basic.

How is it not supported?

Grauenwolf commented 5 years ago

The docs just say no without going into details.

https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.valuetask-1?view=netstandard-2.1

GSPP commented 5 years ago

ValueTask is not supposed to be in public APIs at this point

=> I meant value tuple.

Regardless of whether it is allowed or not, I think it's wise most of the time to use a custom type. (This stance is for library code only, like .NET).

stephentoub commented 5 years ago

ValueTask is not supported by Visual Basic.

How is it not supported?

The docs just say no without going into details. https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.valuetask-1?view=netstandard-2.1

@KathleenDollard, @jaredpar, what about ValueTask/ValueTask<T> isn't supported in Visual Basic?

jaredpar commented 5 years ago

Visual Basic cannot define an async method which produces a ValueTask / ValueTask<T> but it can consume a ValueTask / ValueTask<T> using the Await operator.

davidfowl commented 5 years ago

VB didn’t get Task likes?

jaredpar commented 5 years ago

VB didn’t get Task likes?

Correct. It is still limited to defining async methods that return Task / Task<T>. It can consume ValueTask / ValueTask<T>, or any other type that implements the awaiter pattern.

https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQBsQDdhJiAligHwEkwAHAewRigAIBlAT1jjAFgAoEiq2x5sAHQAVABYI4EACa4AdgHNh0ANZQOABQCuwFLgDGNAMIpotAxxoWam7XpoBBKAxn6AYhucxc5GTQCyAChh7WiFlAEpgmlCoJXNLeLsAdwhcIJg4iwBRGUkaNw8vGQyaYusdfQcnV3ddT28/ACZAyIA1CBQNOGilCIcaNo6u5WKE5NSadPZ47Nz82sKOGcNjKCggA==

Grauenwolf commented 5 years ago

Do we need to ensure VB can be used to implement these interfaces/abstract classes?

Jonathan Allen 619-933-8527


From: Jared Parsons notifications@github.com Sent: Wednesday, September 4, 2019 1:36:26 PM To: dotnet/corefx corefx@noreply.github.com Cc: Jonathan Allen grauenwolf@gmail.com; Mention mention@noreply.github.com Subject: Re: [dotnet/corefx] Add ValueTask returning async APIs to ADO.NET (#27682)

VB didn’t get Task likes?

Correct. It is still limited to defining async methods that return Task / Task. It can consume ValueTask / ValueTask, or any other type that implements the awaiter pattern.

https://sharplab.io/#v2:EYLgtghgzgLgpgJwDQBsQDdhJiAligHwEkwAHAewRigAIBlAT1jjAFgAoEiq2x5sAHQAVABYI4EACa4AdgHNh0ANZQOABQCuwFLgDGNAMIpotAxxoWam7XpoBBKAxn6AYhucxc5GTQCyAChh7WiFlAEpgmlCoJXNLeLsAdwhcIJg4iwBRGUkaNw8vGQyaYusdfQcnV3ddT28/ACZAyIA1CBQNOGilCIcaNo6u5WKE5NSadPZ47Nz82sKOGcNjKCggA==

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/dotnet/corefx/issues/27682?email_source=notifications&email_token=ACRCUTMRMTTOYFUH6N7O7YLQH7WZVA5CNFSM4ETLMPM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD54LRBY#issuecomment-528005255, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ACRCUTP65RC2X2J33RZ65WDQH7WZVANCNFSM4ETLMPMQ.

jaredpar commented 5 years ago

There is nothing preventing VB from implementing a member that has a ValueTask return type. VB just can't do it using the Async function style. If an Async style method was needed then a simple thunking approach can be used to put the logic in an Async style method and call that in a ValueTask returning method directly.

Grauenwolf commented 5 years ago

Sounds reasonable to me. Not ideal, but how many database drivers are written in VB.

Jonathan Allen 619-933-8527


From: Jared Parsons notifications@github.com Sent: Wednesday, September 4, 2019 3:03:31 PM To: dotnet/corefx corefx@noreply.github.com Cc: Jonathan Allen grauenwolf@gmail.com; Mention mention@noreply.github.com Subject: Re: [dotnet/corefx] Add ValueTask returning async APIs to ADO.NET (#27682)

There is nothing preventing VB from implementing a member that has a ValueTask return type. VB just can't do it using the Async function style. If an Async style method was needed then a simple thunking approach can be used to put the logic in an Async style method and call that in a ValueTask returning method directly.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/dotnet/corefx/issues/27682?email_source=notifications&email_token=ACRCUTO7LSFO3PXMBUPD7WDQIABAHA5CNFSM4ETLMPM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD54T6GY#issuecomment-528039707, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ACRCUTO4MCOXNNVCSRTU6XLQIABAHANCNFSM4ETLMPMQ.