dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.54k stars 4.54k forks source link

[API Proposal]: Support > int.MaxValue records affected in ADO.NET #90151

Open madelson opened 10 months ago

madelson commented 10 months ago

Background and motivation

Today, ADO.NET provides 2 portable ways of counting the number of records affected by a SQL query or a batch with multiple SQL queries:

DbCommand.ExecuteNonQuery(); // returns int
DbDataReader.RecordsAffected; // int

Because both of these are int, large bulk operations can overflow.

API Proposal

namespace System.Data.Common;

public class DbDataReader
{
    public virtual long RecordsAffected64 => throw new NotSupportedException(nameof(RecordsAffected64));
    ...    
}

API Usage

using DbCommand command = ...
await using DbDataReader reader = command.ExecuteReaderAsync();
while (reader.NextResult()) { }
Console.WriteLine(reader.RecordsAffected64);

Alternative Designs

We could add an overload of ExecuteNonQuery() that returns long instead of in or addition to the DbDataReader property, but this feels clunky given that the use-case is fairly narrow.

The default behavior is up for discussion as well.

Risks

It's possible that some or even most providers have no way of implementing this capability under the hood. For example, looking briefly at the implementation of SqlDataReader.RecordsAffected it isn't clear to me whether the underlying technology supports 64-bit counting or not.

ghost commented 10 months ago

Tagging subscribers to this area: @roji, @ajcvickers See info in area-owners.md if you want to be subscribed.

Issue Details
### Background and motivation Today, ADO.NET provides 2 portable ways of counting the number of records affected by a SQL query or a batch with multiple SQL queries: ``` DbCommand.ExecuteNonQuery(); // returns int DbDataReader.RecordsAffected; // int ``` Because both of these are `int`, large bulk operations can overflow. ### API Proposal ```csharp namespace System.Data.Common; public class DbDataReader { public virtual long RecordsAffected64 => throw new NotSupportedException(nameof(RecordsAffected64)); ... } ``` ### API Usage ```csharp using DbCommand command = ... await using DbDataReader reader = command.ExecuteReaderAsync(); while (reader.NextResult()) { } Console.WriteLine(reader.RecordsAffected64); ``` ### Alternative Designs We could add an overload of `ExecuteNonQuery()` that returns `long` instead of in or addition to the `DbDataReader` property, but this feels clunky given that the use-case is fairly narrow. The default behavior is up for discussion as well. * I do not think it should default to just returning `RecordsAffected`, because this would lead to incorrect results in the case of overflow. * It could return `long?` and default to returning `null`, where `null` indicates that the driver doesn't support computing the value * We could have a property somewhere to check whether this is supported (similar to `DbProviderFactory.CanCreateBatch`). ### Risks It's possible that some or even most providers have no way of implementing this capability under the hood. For example, looking briefly at the implementation of `SqlDataReader.RecordsAffected` it isn't clear to me whether the underlying technology supports 64-bit counting or not.
Author: madelson
Assignees: -
Labels: `api-suggestion`, `area-System.Data`
Milestone: -
roji commented 10 months ago

FWIW Npgsql exposes a ulong Rows property for this reason (which probably means the proposed ADO.NET property should be ulong rather than long).

I do not think it should default to just returning RecordsAffected, because this would lead to incorrect results in the case of overflow.

Hmm... It could be argued that this would be no worse than the current situation, where the int RecordsAffected is the only option anyway, and which may overflow (or may not, depending on the driver implementation - it may throw instead). In other words, either a driver supports the new property, overriding it to return the correct value, or it doesn't, in which case RecordsAffected already (possibly) overflows, so I don't think there's a reason to avoid the new property overflowing as well. This would obviate making the property nullable or introducing yet another capability property just to say whether the property is "well"-implemented.

In any case, I think doing something here does make sense, putting in the Future milestone to gather more feedback for now.

Wraith2 commented 10 months ago

A -1 value can be used to signal that it isn't possible to know the row count. If you're streaming rows as they are found (firehose mode) then you only know the count once you're seen them all.

In general I'd say adding a ulong Rows property or something similar would make sense. We've extended the bulk copy row count to add a 64 bit counter in SqlClient so there's precedent and clear need.

madelson commented 10 months ago

@Wraith2 in my case I can’t count the rows as I go since this is an insert or update query; I’m not bringing rows back to the client.

recordsAffected gives the number of rows changed in that case. Does that make sense?

Wraith2 commented 10 months ago

Yes. I wasn't suggesting that you should count the rows i was just identifying that there has been a reason in the past that a long rather than ulong might be chosen. I think that it's so rare that `ulong is probably the better choice here.

roji commented 10 months ago

@Wraith2 you raise a good point with -1; IIRC that value is returned not just when the number of rows affected isn't yet known, but also when the statement is a SELECT (i.e. doesn't modify rows by definition. From the DbDataReader.RecordsAffected docs:

The number of rows changed, inserted, or deleted. -1 for SELECT statements; 0 if no rows were affected or the statement failed.

So that could be a good reason to make the new property ulong?, with null fulfilling the same function as -1 for the existing 32-bit RecordsAffected.

madelson commented 10 months ago

Agreed. -1 is also returned for things like CREATE TABLE and WAITFOR I believe.

I like the idea of using ulong? with null indicating "n/a" vs. -1. I think it would be somewhat confusing if RecordsAffected returns -1 for SELECT and RecordsAffected64 returns 0 (which presumably is what would need to happen if we used ulong).

roji commented 10 months ago

and RecordsAffected64 returns 0

Not sure that's possible - it wouldn't allow making the distinction between a non-updating statement (SELECT) and an updating statement (UPDATE/DELETE) that happened to not touch any rows because of the WHERE clause.

madelson commented 10 months ago

@roji not sure I follow your comment.

If the data type is ulong (non nullable), then I don’t see how we’d differentiate between 0 and n/a. That seems undesirable.

If we use long or ulong? Instead, then we can differentiate, which seems preferable.

Do you agree?

roji commented 10 months ago

Yeah, we're basically saying the same thing. long isn't a good option since we have at least one database where what's returned is an unsigned 64-bit type (even though long really "should be enough"). I think that leaves only ulong?.

bgrainger commented 10 months ago

we have at least one database where what's returned is an unsigned 64-bit type

At least two: MySQL's OK packet transmits affected_rows using a int<lenenc> type which supports unsigned integer values from 0 to 2⁶⁴-1.