API for writing parameters without boxing

roji commented 8 years ago

In the current ADO.NET API, writing a parameter to the database involves passing it through an object. This implies a boxing operation, which can create lots of garbage in a scenario where lots of value types (e.g. ints) are written to the database.

A generic subclass of DbParameter could solve this, if properly implemented by providers.

GSPP commented 8 years ago

Isn't the boxing overhead next to nothing compared to the fixed cost of making a SQL call? I cannot imagine this being an issue even for 1000 parameters.

roji commented 8 years ago

@GSPP I'm definitely not talking about the overhead of allocating the memory and copying the value - i.e. the cost of the boxing operation itself. The problem is that boxing allocates an object on the heap, producing potentially large amounts of garbage. This garbage creates pressure on the GC, which can be a problem for some applications. Basically it's a different kind of overhead compared to making an SQL call.

karelz commented 7 years ago

@roji (or anyone) can you provide more details what is your plan here?

roji commented 7 years ago

On the read side of things, there's DbDataReader.GetFieldValue<T>() allowing users to generically read values. This allows ADO.NET providers to provide an implementation that doesn't box value types - users can read ints without needless heap allocations.

Unfortunately nothing like this exists on the write side - DbParameter has a object Value property, so writing ints via ADO.NET necessarily implies boxing. This could be resolved by having a generic SqlParameter<T>, whose Value would be of type T. This class could extend could inherit the non-generic SqlParameter for backwards compatibility. It would probably be a good idea to have an IDbParameter<T> interface which would be implemented by the provider-specific generic parameter classes (SqlParameter<T>, NpgsqlParameter<T>).

It would also be necessary to add CreateParameter<T>() to DbProviderFactory to allow portable creation of these new parameters.

Let me know if this makes sense or if you'd like more info.

karelz commented 7 years ago

@saurabh500 @YoungGah is it sufficient info for you? Or do you need more details? If we have enough of direction and you agree with it, please remove the "needs more info" and add "up for grabs" label.

danmoseley commented 7 years ago

@saurabh500 @YoungGah thoughts?

roji commented 7 years ago

Can anybody take a look at this? It would be good to know if you guys see this somewhere on your roadmap etc.

karelz commented 7 years ago

@saurabh500 @divega @corivera any opinion here? Can we at least set expectations / timeline when we will have time to look at it? Thanks!

divega commented 7 years ago

@karelz @danmosemsft @saurabh500 @corivera I think we should remove the "needs-more-info" label and add "up-for-grabs". This sounds like a good idea to at least explore.

@roji it would be great if you could do some prototyping of this in Npgsql if you haven't already. I suspect it should be possible to do enough to asses the API and make some measurements of the impact without making the actual changes on System.Data.Common. If the change turns very positive results then we can take the next step.

I am not sure about the IDbParameter<T> interface. The extensibility model of ADO.NET has been consistently based on class inheritance since .NET Framework 3.0. The existing interfaces are there only for compatibility so adding new interfaces would be strange. Unless there is really good reason I would try to stick to classes.

cc @ajcvickers

karelz commented 7 years ago

@divega sounds good to me - feel free to make such labels changes yourself, as area expert/owner :)

When you mark things "up for grabs", just please try to describe what is needed (next steps) & rough complexity / time investment - see triage rules for details. Thanks!

divega commented 7 years ago

Marking as "up for grabs". In order to make progress on this issue we need to do some exploration to understand both the magnitude of the performance impact (e.g. how many allocations we can actually avoid and how that benefits performance) and how to best extend the API. See https://github.com/dotnet/corefx/issues/8955#issuecomment-260108275 and https://github.com/dotnet/corefx/issues/8955#issuecomment-313905218.

roji commented 7 years ago

FYI I'm working on implementing this within Npgsql, I'll be coming back with some info pretty soon.

roji commented 7 years ago

OK, I've done this in Npgsql (https://github.com/npgsql/npgsql/issues/1639). The question is now how to best add this to ADO.NET as a whole, to allow this to be used in a database-independent way.

General Benefits

Offers a strongly-typed, generic API (TypedValue) alongside the existing weakly-typed object-based API. Promotes type safety, is a more modern API, etc.
Avoids needless boxing when writing value types. Writing a parameter value to the database in Npgsql in now a zero-allocation operation (and in 3.3 ExecuteNonQuery as a whole will probably be zero allocation).

Adding to ADO.NET

This would consist of adding either a new DbParameter<T> abstract base class , inheriting from DbParameter, or an IDbParameter<T> (see discussion below).

In addition, DbProviderFactory would need to be fitted with a new GetParameter<T>() alongside the existing GetParameter() (a GetParameter<T>(PermissionState) may also be necessary). The default implementation of this method would return a shim wrapping the result of the provider's GetParameter(); this would allow providers not providing a real generic parameter implementation to continue working seamlessly.

Base classes vs. interfaces

As @divega mentioned above, ADO.NET APIs are based on base classes rather than interfaces. I worked in both directions for a while to explore what the API would look like, here are some points:

Via base class (`DbParameter<T>`)

Does not allow for easy code sharing between NpgsqlParameter<T> and NpgsqlParameter - (substantial) logic has to be either duplicated or refactored out somehow.
An internal interface must be introduced to capture the common API between NpgsqlParameter and NpgsqlParameter<T>; this is necessary to allow internal provider code to continue working. This adds considerable clutter to the codebase and is cumbersome.
If any user-facing APIs exist which accept/return an NpgsqlParameter, these also have to be changed (or counterparts added) to accept/return an interface capturing the two parameter types. This interface would need to be distinct from the internal one from the previous point, to avoid exposing internal functionality (so we end up with INpgsqlParameter and INpgsqlInternalParameter).

Via interface (`IDbParameter<T>`)

Allows the NpgsqlParameter<T> to inherit from NpgsqlParameter. Since both are parameter classes, there's likely to be a lot of shared code; if NpgsqlParameter<T> inherits NpgsqlParameter we only need to add the typed value property plus some minimal generic-specific handling.
In order for IDbParameter<T> to be useful, it must duplicate the API surface of DbParameter, otherwise the user can't manipulate things like Size, Precision.

Conclusions

For provider codebase maintenance and sanity, I'd really prefer it if NpgsqlParameter<T> could extend from NpgsqlParameter. However, the fact that IDbParameter<T> needs to duplicate the DbParameter API is problematic: it would make it impossible to add a method with a default implementation to DbParameter.

On the other hand, I hear that C# 8 will have default interface methods, so maybe it's not so bad :)

ajcvickers commented 7 years ago

@roji Good stuff! /cc @anpete

Wraith2 commented 5 years ago

I saw this mentioned in the issue review earlier and thought it might be useful to provide some feedback since it isn't dead.

I did some investigation on doing this with SqlClient because I wanted to try and remove the parameter box. It's messy and I'm not sure how you would achieve it without having the parameter instance write the data into the tds buffer..

At the moment the parameter is asked for it's value object (including any coercion) and that object is then written by the TdsParserStateObject which understands the layout of all the relevant types. Doing this without the object variable means you've either got to have the correct storage location generated at runtime by non-generic code or you delegate the writing of bytes to the generic object which can avoid the variable. Asking the parameter to do it exposes the internals of the tds layer to the user layer or will force multiple allocation and copying between buffers, neither of which is a great idea.

It's worth doing but it seems difficult to implement practically at the moment. I also think it will flow new members into System.Data which may cause compatibility problems without care.

roji commented 5 years ago

@Wraith2 you're right that this isn't necessarily a trivial thing to do inside an ADO.NET provider, and can mean serious refactoring to actually avoid refactoring. The main idea here is to at least allow providers to do this API-wise - if they do it or not is a different question. The default implementation for this new generic API would in any case delegate to the existing non-generic API, in order to prevent breaking changes, so existing providers would simply continue to work.

I don't know anything about the SqlClient internal implementation... Full generic parameter handling indeed means that writing has to be generic "all the way down", without passing through a non-generic layer (such as TdsParserStateObject?) which switches on the specific type etc. Definitely not trivial.

Wraith2 commented 5 years ago

Full generic parameter handling indeed means that writing has to be generi

Possibly, at the very least it means that the thing writing the value had to be generic though the caller may not need to be aware of the exact type. It's well worth doing but it will be a big very complicated job.

I don't know anything about the SqlClient internal implementation..

It's quite fun in there. Lots of history to learn 😁

roji commented 5 years ago

Note: one non-trivial issue to be resolved is how nulls are written - with the current non-generic DbParameter, nulls are represented via DBNull, but that's not possible with a generic DbParameter<T>.

Unlike non-generic DbParameter, we could accept null values, but that would only work for reference types. We could introduce another property on DbParameter<T> to express this, and also have a DbParameter<T>.Null as a shortcut (although it should still be possible to mutate an existing parameter instance to make it express null).

Wraith2 commented 5 years ago

[edit] stuff to do with readers which wasn't relevant to parameters (https://github.com/dotnet/corefx/issues/27682#issuecomment-526928918).

roji commented 5 years ago

@Wraith2 these is a very important thing that I really hope we get to improve for 5.0, but you're discussing nullability when reading values from a reader, whereas this issue is about the introduction of a generic parameter API which would avoid boxing when sending values to the database.

The issue of GetFieldValue nullability has been discussed in https://github.com/dotnet/corefx/issues/27682#issuecomment-436736787 - for now that issue seems like a good place to continue that conversation. Do you mind moving this comment over there?

Wraith2 commented 5 years ago

You're right. They're sort of linked in my thinking since they're about carrying two pieces for information, is it null and what is the value which is easy as a tuple on output but better modelled as two properties on input. Much better to separate the information out to IsNullable, IsDBNull and Value properties on DbParameter<T> which is what you were talking about and prompted my recollection.

roji commented 5 years ago

They're sort of linked in my thinking since they're about carrying two pieces for information, is it null and what is the value which is easy as a tuple on output but better modelled as two properties on input.

They definitely are, and I can imagine a world where the same solution holds for both (i.e. https://github.com/dotnet/csharplang/issues/2194), but for now that doesn't seem like it would happen...

PS have edited your comment above to link to the other issue.

Wraith2 commented 5 years ago

I'm not sure whether equating language null to DBNull is correct. I can't find a scenario where you need the distinction but the my instinct is to keep them separate.

There's also the possibility of using a separate type, so having DbParameter<T> which is explicitly not nullable and then DbNullableParameter<T> : MDbParameter<T>, INullable which allows them.

roji commented 5 years ago

I'm not sure whether equating language null to DBNull is correct. I can't find a scenario where you need the distinction but the my instinct is to keep them separate.

I'm not sure I see why... C# null maps to other nulls in other contexts (e.g. JSON serialization?), and apart from the problematic intersection of generics, value types and null, I think it would work quite well. If you have any concrete argument here I'd love to hear it.

Somewhat ironically, with the current non-generic DbParameter, mapping language null to database null works even better - since the parameter simply holds a object, it's always possible to simply set it to null. I'd be curious to learn why historically DBNull was chosen to express null instead of language null; it's possibly useful in that it distinguishes an uninitialized parameter (which contains language null) from a parameter set to null (DBNull), but I'm not sure of the actual importance of that (again, the C# language doesn't have that distinction and things seem to work out fine).

There's also the possibility of using a separate type, so having DbParameter which is explicitly not nullable and then DbNullableParameter : MDbParameter, INullable which allows them.

Wouldn't you have to solve the same question with DbNullableParameter, i.e. how to represent null when T is a value type? Having separate nullable and non-nullable parameter types would then be orthogonal to that question. In any case, is there any reason we need a non-nullable parameter type and a nullable parameter type? ADO.NET currently has a single nullable parameter type (DbParameter, where null is expressed via DBNull) and it seems to be working well.

PS DbParameter actually has an IsNullable property, which may resemble your distinction. I'm not sure what it's actually used for though, it possibly only plays a role when using the command builder.

Wraith2 commented 5 years ago

Wouldn't you have to solve the same question with DbNullableParameter, i.e. how to represent null when T is a value type?

Not really no. If IsDBNull the value is undefined so in storage terms it'd be default(T) or the last value, doesn't matter since attempting to read it would be an error.

For some reason I see JSON null as a language null and think that the javascript null and c# null are compatible. I feel that DBNull is a data null which is distinct from a language null. As you pointed out you can use the difference to explicitly see the difference between not setting a value and setting it to be null. Not exactly convincing I agree but it comes from my quite direct dealings with data, I don't use orms.

GSPP commented 5 years ago

I have found a case where DBNull provides a distinction of cases: https://stackoverflow.com/questions/4488727/what-is-the-point-of-dbnull/4488758

I consider DBNull to be an API design mistake. This ExecuteScalar situation could have been solved differently, for example by returning an object (or a struct) describing the result, or by throwing if no result set was returned. DBNull makes handling ADO.NET considerably harder. If its role can be reduced without hurting consistency too much then that's great.

I have never seen a situation where the existence of DBNull was desirable.

roji commented 5 years ago

@GSPP thanks for linking to that question. FWIW I agree with you and Marc's response, and also think that DBNull was a mistake. But it actually doesn't matter that much, since once we move to generic type parameters DBNull simply becomes impossible anyway, so a different way to represent null on parameters must be found in any case. The same may be true also of a new API which would be an alternative to DbDataReader.GetFieldValue, if we choose to go down that path.

Wraith2 commented 5 years ago

That is a useful SO answer. We shouldn't add new uses of DBNull, i think we all agree on that.

public class DbParameter<T>
{
    public bool IsAssigned { get;set; }
    public bool IsNullable { get;set; }
    public bool IsNull { get; set { if (value && !IsNullable) throw new InvalidOperationException() } )
    public T Value { 
        get { if (IsNull) throw new InvalidOperationsException() }  
        set {
            IsAssigned=true;
            if (typeof(T).IsClass && value is null) // the only point of contention?
            {
                IsNull=true;
            }
            _value = value;
        } 
    }
}

roji commented 5 years ago

(just to set expectations - I personally am not going to start working on this right away, although I definitely intend to do this for 5.0)

mldisibio commented 4 years ago

Regarding the discussion around the purpose of DbNull, see also this SO example

As someone who has written an abstraction layer over ADO.Net providers (and now looking to extend it to Postgres via the npgsql library) DbNull is not only important for interpreting what you get back from a result, but what you send in as parameter values. I am familiar with how this works in particular with Sql Server, and was researching how it might apply to an abstraction over NpgsqlParameter<T> (hence I was led to this thread ).

If you want to tell Sql Server to set a column to an explicit null, you use DbNull.Value for the parameter value. But if you happen to use the syntax in the example where you are referencing one or more columns via parameters (instead of leaving the columns out completely) and want Sql Server to use its (pre-declared) DEFAULT value for one or more of the colums to which the parameter refers, then you must set the parameter value to null and not to DbNull.

So indeed if an abstraction layer wants to take advantage of the NpgsqlParameter<T> it would also have to account for null and Nullable<T> semantics as well.

roji commented 4 years ago

Thanks for that info @mldisibio.

pha3z commented 3 years ago

Why doesn't IDbParameter offer SetString(), SetBoolean(), SetInt32(), ...etc implementations? IDataReader works this way, so why isn't the pattern repeated to IDbParameter??

Seems like a very easy-to-understand solution that would be backwards compatible with existing ado.net architecture.

roji commented 3 years ago

@pha3z one big issue with that pattern, is that it creates a closed set of types that can be used with DbParameters - but different databases have very different supported types (e.g. PostgreSQL has a type for IP addresses). This is one reason why DbDataReader has a generic GetFieldValue which can return any type. A good solution for writing would do the same here.

pha3z commented 3 years ago

@roji

Yes but the initial reasoning stated for DbParameterT was to avoid boxing. To my knowledge, boxing is only relevant on value types. PostgresSQL defines an IP Address type, sure, but is that relevant? The dotnet type is what we're talking about passing into the parameter. What Npgsql does to translate the Dotnet type into the DbType is under-the-hood. Are there any database providers that support directly assigning dotnet value types other than the standard primitive types?

I'm not saying that a generic is something to be opposed. Clearly its the right solution. But this issue has been backlogged for years. If the reason its backlogged is due to high refactoring requirements to support generics, then it seems like it would make more sense to add strongly typed primitives to cover 99% of use cases.

Please correct me if I am somehow completely off about the boxing issue.

cincuranet commented 3 years ago

Are there any database providers that support directly assigning dotnet value types other than the standard primitive types?

Yes. For example FirebirdClient.

Wraith2 commented 3 years ago

If the reason its backlogged is due to high refactoring requirements to support generics,

I've looked at it several times from the SqlClient perspective and the problem isn't simply adding a generic parameter it's all the internal logic that relies on the object type for coercion between the source type and destination type.

roji commented 3 years ago

Are there any database providers that support directly assigning dotnet value types other than the standard primitive types?

You're right the boxing is relevant on value types and that IPAddress is a value type, but we shouldn't make the assumption that there won't be a value type supported by some ADO.NET provider out there. Some good examples are the Npgsql support for NodaTime types (instead of the built-in BCL DateTime/TimeSpan), which are structs.

But this issue has been backlogged for years. If the reason its backlogged is due to high refactoring requirements to support generics, then it seems like it would make more sense to add strongly typed primitives to cover 99% of use cases.

It's true that this has been backlogged for quite a while, but I do think we should do it right when we do it - I'm still hoping to get around to it for .NET 6. Default interface methods should actually simplify the design (see https://github.com/dotnet/runtime/issues/17446#issuecomment-316044473).

pha3z commented 3 years ago

Ok all that makes sense. Thank you for the consideration! 👍

roji commented 1 year ago

Another reason for doing this is for the user to be able to provide a requested CLR type for an output parameter; see https://github.com/dotnet/SqlClient/issues/2092 for more details.

DmitryMak commented 4 months ago

On the read side of things, there's DbDataReader.GetFieldValue<T>() allowing users to generically read values. This allows ADO.NET providers to provide an implementation that doesn't box value types - users can read ints without needless heap allocations.

Regarding the read side, GetFieldValue<T> is usually implemented with return (T) (object) so it must be boxing as well.

SQLite and MySql

public override T GetFieldValue<T>(int ordinal)
{
    ...
    if (typeof(T) == typeof(int))
        return (T) (object) GetInt32(ordinal);  <-- will generate IL: box [System.Runtime]System.Int32
    ...
    if (typeof(T) == typeof(string))
        return (T) (object) GetString(ordinal);
    ...
    return (T) GetValue(ordinal);
}

Do you have examples of DbDataReader.GetFieldValue<T> implementations that don't box?

roji commented 4 months ago

Do you have examples of DbDataReader.GetFieldValue implementations that don't box?

Yes, take a look at Npgsql.

En3Tho commented 4 months ago

@DmitryMak When running on CLR, patterns you've shown are recognized by jit compiler and boxing is automatically removed.

It will still box on other runtimes like mono for example.

If you've hit a pattern like that (generic method with typeof checks then box-unbox sequence) that still boxes then you should file an issue I guess.

Wraith2 commented 4 months ago

if (typeof(T) == typeof(int))
        return (T) (object) GetInt32(ordinal);

note that this is a special pattern that the jit can recognise and optimize away. In the case where T is int casts would resolve to (int)(object)intValue and the jit can recognise the pattern will box and then unbox and optimize to just intValue. In language terms it will box. When you run the code and the asm is produced it will probably not box (depending on debug/release, tier0/tier1, (d)PGO, etc).

DmitryMak commented 4 months ago

@En3Tho and @Wraith2 is the (T)(object)val optimization documented anywhere? I only found it here

dotnet / runtime