Open roji opened 6 years ago
System.Data Triage: We should consider this, but since it isn’t for 2.1, moving to future.
Are any of these methods streaming? Would IAsyncEnumerable be interesting here?
@davidfowl DbDataReader.Read
is streaming. The current API does not resemble IEnumerable
(possibly because it predates it :smile:), but the new API could use IAsyncEnumerable
if the timing is good.
See this was mentioned on https://github.com/aspnet/DataAccessPerformance/issues/32.
DbDataReader implement IEnumerable, exposing rows as DbDataRecord. It's an old API (non-generic, sync-only), and the default implementation, which uses DbEnumerator, is also pretty inefficient: it calls DbDataReader.GetValues()
, which copies the row's values into an object[] (which gets allocated on each MoveNext()...).
But I actually don't really see a reason not to also have DbDataReader implement IAsyncEnumerable... It could provide a modern and efficient way to stream over the the rows of a resultset. Opened dotnet/corefx#27684 to track.
Is the allocation overhead really significant when opening a connection or executing a reader? Executing select null
against SQL Server on the same machine over shared memory and using synchronous IO takes about 200us. This already is so much more than allocating a few objects and collecting them.
Adding ValueTask versions for reading rows and fields might be valuable.
You're right that when OpenAsync()
returns asynchronously (i.e. a physical open or wait), the memory overhead is probably negligible. For synchronous invocations (no I/O, pooled idle connection is returned) there is no memory overhead, since OpenAsync()
returns a non-generic task which can be cached. So I agree this has more value for DbDataReadet.Read()
.
However, don't forget that ADO.NET is sometimes used to access databases like Sqlite, where the physical open can be much much faster (no networking), so it could be significant there.
ValueTask
If we're going to touch this, we might as well fix the IsDBNull problem at the same time. Currently we need to make two calls, once to check for nulls and again to actually read the value. When I profile my ORM, IsDBNull is often flagged as most expensive operation in aggregate. (EDIT: this is no longer true.)
We can probably make it a lot more convenient efficient with methods such as
ValueTask<T?> DblDataReader.GetFieldValueOrNullAsync
Depending on the ADO.NET provider implementation, calling IsDBNull()
separately from GetFieldValue<T>()
isn't necessary a big perf problem (I'm not sure it's the case with Npgsql). But introducing something like GetFieldValueOrNull()
(and its async counterpart) would be a good idea regardless, if only for improving the API/usability.
Note that using the async versions of these methods is probably a bad idea unless CommandBehavior.SequentialAccess
is specified, since the entire row is pretty much guaranteed to be buffered in memory anyway...
You're right. When I last benchmarked it on .NET Framework a year or two ago it was a major issue. Now it's barely visible in the performance profiler.
Still, it would be nice for those who are using ADO.NET directly to not have to write those checks.
Absolutely, it would be great if ADO.NET got a bit of a face-lift in terms of API usability, regardless of any perf aspects...
Good to know that perf-wise things are better, there have been a lot of changes in recent two years.
If this proposed deserialization API is implemented then the ADO.NET APIs will not be very chatty. There will be less of a need for ValueTask
. ValueTask
can be used internally.
@roji @Grauenwolf I would like us to track the idea of GetFieldValueOrNull[Async] as a separate issue. Would you like to create it?
I haven't read the underlying code yet, so I think someone else would be better suited to proposing an API.
I vote against mixing ValueTask
and Task
return types arbitrarily based solely on the age of the method. ValueTask
return types should only be highly targeted to methods likely to be called over and over.
I do agree BeginTransactionAsync
and CommitTransactionAsync
are needed, though they should not be ValueTask
.
@Grauenwolf stating the problem with the current API would be enough. But no worries.
I'm ok with BeginTransactionAsync and CommitTransactionAsync not being ValueTask.
Really it is only DBDataReader that I care regarding ValueTask because it is in a tight loop.
@roji I wonder how this would look like now that we have a better understanding of the guidance re ValueTask\
It seems to me that only the API on DbDataReader is worth considering.
@divega I agree - for the execution APIs which never return synchronously, it doesn't make much sense to think about ValueTask. I will update the description soon to reflect this.
There's also dotnet/corefx#35012 (adding missing async methods, as opposed to ValueTask counterparts to existing async methods). It seems that proposal already follows the guidance - the only ValueTask-returning methods are DisposeAsync()
(so coming from IAsyncDisposable) and CloseAsync()
, which is almost identical to DisposeAsync()
.
@roji perhaps we should get everything we want from this issue copied into dotnet/corefx#35012 and bring that one to API review. I am saying this because the scope of this issue (#27682) is probably now small enough that it wouldn't affect the chances of dotnet/corefx#35012. I also think it helps have a more holistic view during the API review.
Though there is still the naming problem :smile:
Yeah, and for GetFieldValue there's also (in theory) still the question of better nullability handling (even though right now we don't seem to have anything better). That's why it may be good to handle them differently - I think dotnet/corefx#35012 can be brought to design review right away.
I added api-ready-for-review on dotnet/corefx#35012. Let's at least discuss the ValueTask\
@divega a small note after the recent discussions: ValueTask\
We could defer adding those APIs until an easier alternative is introduced (as is planned for some point) - since until then the ValueTask\
However, in any case we still have GetFieldValueAsync() which would benefit from an overload returning a ValueTask\
Yes @roji, however DbDataReader.ReadAsync is exactly like IAsyncEnumerable\
I am hoping that in the API review meeting we can get the right people to ask about this.
@divega I'll try to investigate a bit how that works. But you're that in any case, even if today it's difficult to benefit from the async completion optimization, it may become easier in the future, and this specific API is definitely a place where that may make sense. As you say, raising this in the API review meeting would probably get us the answers we need.
Regardless, I'll soon update the description to reflect this discussion.
An argument can be made for providing the highest possible performance alternative (likely ValueTask
): Who still uses raw ADO.NET these days? That is rare. Normally, people use an ORM or a thin wrapper such as Dapper. Inconvenient APIs are hidden from most callers.
@GSPP Tons and tons of people still use ADO.NET. There are huge classes of developers that refuse to use anything that is not shipped with .NET. Not to mention all the legacy code out there.
@GSPP can you explain how your argument that users don't usually use ADO.NET directly relates to this issue? At least at this point it isn't really recommended to add APIs everywhere that return non-generic ValueTask, as it is very unlikely that this would help perf in any way (it would actually degrade perf in some cases compared to returning non-generic Task).
There's also a direction of thought about making ADO.NET more usable directly (as opposed to usable only via other layers).
Even those who use ORMs still need to drop down into ADO.NET for legacy databases, especially if they have stored procedures.
I was thinking about this recently and had an idea. There are three main scenarios.
IsDBNull[Async]
Get*
and GetFieldValue[Async]
IsDBNull
followed immediately by GetFieldValue
. This is the interesting case.Even when what you want is buffered locally you're calling two Task<T>
returning methods and unless T is bool you're really going to have to allocate most of the time. Even if we could change this to ValueTask<T>
it would still be two calls to ask about the same field. The usual sync pattern for this sort of thing would be bool TryGetNotNullFieldValue<T>((string fieldname, out T value)
which if ok but that out parameter prevents it being converted to async. But! we have ValueTuple
s now so we don't have to use the awkward pattern of having additional return parameters as out/ref. We can write ValueTask(bool isNotNull, T value)> GetNotNullFieldValue(Async<T>(string fieldname)
which is neatly async valuetasked and asks both relevant questions in a single call.
You'd end up with callsites like this:
private static async Task Performance(string connectionStirng)
{
using (var connection = new SqlConnection(connectionStirng))
using (var command = new SqlCommand("select id from table",connection))
using (var reader = await command.ExecuteReaderAsync())
{
while (await reader.ReadAsync())
{
var valueTask = reader.GetNotNullFieldValueAsync<int>("id");
var result = valueTask.IsCompletedSuccessfully ? valueTask.Result : await valueTask.ConfigureAwait(false);
if (result.isNotNull)
{
// use the value
}
else
{
// er...
}
}
}
}
And if you didn't care about the valuetask pattern you can just directly await the method and skip the task variable entirely. This way T should be annotatable with the nullability attributes so you don't have to escape when accessing result.Value
. If you really want to go all-out it could return a struct type with bool operator overload so you can just do if (result) { // use result.Value
Of course the function name sucks but that's because all the really good names are already taken, it'd have to be something silly and overlong but we can pass that one up to the design review meeting to handle, they love spending hours trying to decide the best names for things 😁
The ValueTask(bool isNotNull, T value)>
idea is promising. It's also a functional API now. out
is side-effecting and it does not compose as well as an expression. Making two calls (IsNull...) is much less nice than one call that returns exactly what you want.
Essentially, ADO.NET wants an option type. This is what a SQL NULL
is. It's almost exactly what Nullable
provides except that Nullable
is restricted to value types.
Probably, this should be a custom struct with deconstruction support. ValueTask
is not supposed to be in public APIs at this point (I agree with that. A custom struct is cleaner given the very high cleanliness requirements of the framework.).
As a side note, I see this (IMHO) design mistake a lot in the framework where people really wanted an option type but instead chose bool TryGetX(out value)
or some other rather awkward pattern (such as two methods calls).
Maybe .NET would benefit from an option type that is not restricted to value types. But since we don't have that we can give ADO.NET it's own fundamental option type.
ValueTask is not supposed to be in public APIs at this point
How do you mean? ValueTask is exposed in many public APIs (e.g. IAsyncDisposable.DisposeAsync, Stream.ReadAsync...).
There was resistance to using value tuples in public apis due to not having agreed naming guidance for them. I believe this was resolved by the design meetings a few months ago so it might be possible. I'm fine with using a struct instead of a value tuple.
Ah I see - I think there was some confusion between ValueTuple and ValueTask above.
ValueTask is not supported by Visual Basic. Until that is addressed we should be hesitant about using it in public APIs.
ValueTask is not supported by Visual Basic.
How is it not supported?
The docs just say no without going into details.
https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.valuetask-1?view=netstandard-2.1
ValueTask is not supposed to be in public APIs at this point
=> I meant value tuple.
Regardless of whether it is allowed or not, I think it's wise most of the time to use a custom type. (This stance is for library code only, like .NET).
ValueTask is not supported by Visual Basic.
How is it not supported?
The docs just say no without going into details. https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.valuetask-1?view=netstandard-2.1
@KathleenDollard, @jaredpar, what about ValueTask
/ValueTask<T>
isn't supported in Visual Basic?
Visual Basic cannot define an async method which produces a ValueTask / ValueTask<T>
but it can consume a ValueTask / ValueTask<T>
using the Await
operator.
VB didn’t get Task likes?
VB didn’t get Task likes?
Correct. It is still limited to defining async methods that return Task / Task<T>
. It can consume ValueTask / ValueTask<T>
, or any other type that implements the awaiter pattern.
Do we need to ensure VB can be used to implement these interfaces/abstract classes?
Jonathan Allen 619-933-8527
From: Jared Parsons notifications@github.com
Sent: Wednesday, September 4, 2019 1:36:26 PM
To: dotnet/corefx corefx@noreply.github.com
Cc: Jonathan Allen grauenwolf@gmail.com; Mention mention@noreply.github.com
Subject: Re: [dotnet/corefx] Add ValueTask
VB didn’t get Task likes?
Correct. It is still limited to defining async methods that return Task / Task
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/dotnet/corefx/issues/27682?email_source=notifications&email_token=ACRCUTMRMTTOYFUH6N7O7YLQH7WZVA5CNFSM4ETLMPM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD54LRBY#issuecomment-528005255, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ACRCUTP65RC2X2J33RZ65WDQH7WZVANCNFSM4ETLMPMQ.
There is nothing preventing VB from implementing a member that has a ValueTask
return type. VB just can't do it using the Async
function style. If an Async
style method was needed then a simple thunking approach can be used to put the logic in an Async
style method and call that in a ValueTask
returning method directly.
Sounds reasonable to me. Not ideal, but how many database drivers are written in VB.
Jonathan Allen 619-933-8527
From: Jared Parsons notifications@github.com
Sent: Wednesday, September 4, 2019 3:03:31 PM
To: dotnet/corefx corefx@noreply.github.com
Cc: Jonathan Allen grauenwolf@gmail.com; Mention mention@noreply.github.com
Subject: Re: [dotnet/corefx] Add ValueTask
There is nothing preventing VB from implementing a member that has a ValueTask return type. VB just can't do it using the Async function style. If an Async style method was needed then a simple thunking approach can be used to put the logic in an Async style method and call that in a ValueTask returning method directly.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/dotnet/corefx/issues/27682?email_source=notifications&email_token=ACRCUTO7LSFO3PXMBUPD7WDQIABAHA5CNFSM4ETLMPM2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD54T6GY#issuecomment-528039707, or mute the threadhttps://github.com/notifications/unsubscribe-auth/ACRCUTO4MCOXNNVCSRTU6XLQIABAHANCNFSM4ETLMPMQ.
With https://github.com/dotnet/corefx/issues/27445 allowing elimination of allocations in asynchronously-executing async methods, we should look into adding ValueTask-returning counterparts to ADO.NET. This would potentially allow zero-allocation database access.
Here are the current async methods:
Notes:
Naming the new methods is going to be complicated. Some parts of corefx are lucky in that they're introducing
ValueTask<T>
overloads along withSpan<T>
(e.g. Stream), but ADO.NET has no such upcoming parameter changes/additions. Since there can't be yet another overload that only differs on the return type, and the "standard" name with just theAsync
suffix is taken, we need to come up with a new naming pattern.There are some missing async methods in ADO.NET in general (e.g.
DbCommand.Prepare()
,DbConnection.BeginTransactionAsync(...)
,DbTransaction.CommitAsync(...)
...). These can be added directly with return valueValueTask<T>
(covered by issue https://github.com/dotnet/corefx/issues/35012). Naming of these new methods should probably be decided in the context of whatever new naming pattern we use for theValueTask<T>
returning variants of the existing async methods.Also note that async APIs that return
Task
orTask<bool>
(such asReadAsync()
andIsDBNullAsync()
) can be implemented to return cachedTask<T>
instances when they complete synchronously to gain efficiency instead of adoptingValueTask<TResult>
. This mitigates the need forValueTask<T>
returning variants for most APIs. Also, in terms of allocations, the advantage ofValueTask<T>
is much greater the more fine grained the async methods are.Compatibility for providers: It seems desirable to offer a default implementation of the Task-based method that calls
AsTask()
on theValueTask<TResult>
instance returned by the new API, so that new providers don't need to implement the oldTask<TResult>
version of the API but instead only implement the more efficientValueTask<TResult>
version. It also seems good not to force existing providers to switch.**Updated by @divega on 1/13/2019 to consolidate with https://github.com/dotnet/corefx/issues/15011.