Concrete initial v3 API change proposal

mgravell commented 11 months ago

Dapper has grown organically over time, and not all decisions were good ones. We have worked hard to try to avoid runtime or compile-time breaks, but this means we have been constrained by those poor decisions.

I would like to get to work on shipping a V2->V3 transition that fixes this, and get some breaks over and done with. As discussed here, some feature work may be shunted to the AOT project, but we need the right API for things to be effective.

What are we trying to solve?

The key problems in the API in V2 are:

the signed/unsigned split is a nuisance - there should be one and only one Dapper if we can achieve that without breaking folks
starting from the interfaces (IDbConnection etc) rather than the concrete base types (DbConnection) was a mistake
CommandDefinition was well intentioned, but again: a mistake - we should expose cancellation via a standard CancellationToken parameter
the shared API between buffered and unbuffered data is suboptimal (at best)
we should add support for DbDataSource as a root primitive

Contraints on changes:

where possible, we should avoid runtime breaks, and minimize compile-time breaks

Planned implementation

The first big change is: strong name the library and retire Dapper.StrongName; bad call.

My proposal for the code, effectively, is to retire the existing API but leave it in place (so: no missing methods) by removing the this modifier (so they no longer resolve as extension methods), and create a new set of extension methods - probably on a new static class that expose the preferred API, meaning: the shape we want, and the most generalized version only (no hive of forwarded methods with increasing numbers of parameters reflecting our incremental understanding of the required API). Code that is already compiled will continue unchanged using the old methods (the missing this is irrelevant and doesn't break the API). When code is recompiled, it'll resolve to the new methods.

For example:

  public static class SqlMapper {
-     public static IEnumerable<T> Query<T>(this IDbConnection conection, string sql, object? parameters = null, bool buffered = true, ...);
+     public static IEnumerable<T> Query<T>(IDbConnection conection, string sql, object? parameters = null, bool buffered = true, ...);
  }
+  public static class DbConnectionExtensions {
+    public static List<T> Query<T>(DbConnection conection, string sql, object? parameters = null, ...);
+    public static IEnumerable<T> QueryUnbuffered<T>(DbConnection conection, string sql, object? parameters = null, ...);
+    [Obsolete("This call to AsList is no longer required")] // highlight redundant Query<T>(...).AsList() usage as a warning
+    public static List<T> AsList<T>(this List<T> source) => source;
+ }

Key changes:

root type changes from IDbConnection to DbConnection
buffered is now explicit on the method signature rather than a parameter (see the existing QueryUnbufferedAsync)
async methods will gain CancellationToken parameters

However:

all code will continue to run without recompile (ignoring the signed/unsigned fun)
most code will require zero changes to compile (unless they are explicitly specifying buffered, in which case either a: remove buffered: false, or b: switch to QueryUnbuffered)
the change to return List<T> in the default buffered case makes reality explicit; in theory this could change how calling code operates, but this is considered to be a positive thing - for example, foreach will be more efficient if it knows it is starting from List<T> due to duck-typing

We'll need to list out the full proposal set, but before we get too far: I want to gather feedback on the overall shape of this suggestion.

Tornhoof commented 11 months ago

all code will continue to run without recompile (ignoring the signed/unsigned fun)

I applaude the removal of one of the nugets and just strong name it, I'm just not sure why you think it would not require recompilation? Either my Solution uses the strongname nuget and I need to remove it in my Sln and rebuild, or it uses the other one, which was previously not strong named and now it is and that's a breaking change, according to MS.

the change to return List

I like that too, but with the recompilation required due to breaking change stuff and many running WarningsAsErrors, I'd expect that you could simply fully Obsolete the method (with error) and not only as Warning.

I would simply forget about recompiling and at most focus on having few impacted manual changes, e.g. provide codefixer for your AsList() and/or one for CancellationToken support.

mgravell commented 11 months ago

I'm just not sure why you think it would not require recompilation?

I don't think that. I'm saying we can't get around that. Adding the strong name is a hard break, but you're right that this may cause "just works without recompile" problems. To be further considered.

chipmunkofdoom2 commented 11 months ago

I think these proposals look good. I'm looking forward to a simplified internal structure of Dapper. There have been a few times where I've had interest in contributing, but the codebase is a little tricky to navigate. It's usually just easier to use a workaround in my code.

mgravell commented 11 months ago

@chipmunkofdoom2 the internal complexity (in part due to the runtime metaprogramming) is a large part of why I want to push new feature work into AOT. The AOT path is much, much simpler.

MiloszKrajewski commented 11 months ago

Curiosity: why "starting from the interfaces (IDbConnection etc) rather than the concrete base types (DbConnection) was a mistake"? For people using Dapper to query their own concreate DbConnection it most likley won't be a problem, but I have few places where I have my own extensions on top of Dapper, and I pass IDbConnection already.

sebastienros commented 11 months ago

My feedback is that if I was to update any of my projects to v3, or one of the dependencies did (rare case, and would have to expose Dapper types), I wouldn't mind "compiling" it again. I'd rather see Dapper take full advantage of a breaking change window and provide tools to migrate the code (code analyzers/fixers, as previously suggested).

mgravell commented 11 months ago

why "starting from the interfaces (IDbConnection etc) rather than the concrete base types (DbConnection) was a mistake"?

great question; short version: because interfaces are not extensible (at least, before .NET 6?), a ton of stuff we absolutely depend on (that was added after the initial cut of IDbConnecetion) simply cannot be accessed without using the base-class (where they are available); this includes:

everything async, noting that async is a key and ever-increasing use-case
the generic field access API (needed to fetch things like DateOnly)
the batch API (we're not currently using this in Dapper core, but a punt exists in AOT)
probably a few others I'm forgetting

The short version is: we're heavily constrained without this. For async, we currently do a type check and push things to DbConnection internally.

amoerie commented 11 months ago

I'm not sure how many people do this, but I've always created a mini abstraction layer around Dapper in every project, because there are a few benefits to doing so:

ability to enforce that queries are awaited before the connection is disposed
ability to add logging / telemetry in a single place
ability to mock when I don't want to use a real database in unit tests
...

So concerning these breaking API changes, this is my personal opinion:

break all you want, I only have to make changes in my little abstraction anyway
while you're making breaking changes anyway, maybe consider my above "pain points" too. But only if it can be done while keeping the library as "focused" as it is today. Dapper's simplicity is a major selling point.

mgravell commented 11 months ago

@Tornhoof

I applaude the removal of one of the nugets and just strong name it, I'm just not sure why you think it would not require recompilation?

Right; so, I absolutely don't expect this to work in .NET Framework. I did some checking on .NET 6, though, and it didn't seem to present any problem whatsoever there. Maybe there is a middle-ground here where:

on .NET Framework, we leave everything alone
on .NET (modern):
- strong-name Dapper
- change Dapper.StrongName to ref Dapper and be an empty dll with just [assembly:TypeForwardedTo] metadata
- optional: new rule in Dapper.Advisor that recommends switching if it detects you're using .StrongName on .NET (modern)

Or maybe we just leave the strong name alone entirely and just go "meh" 🤷 - on the basis that not breaking people is more important, and it isn't really causing us daily pain, it is just an eyesore.

Emphasis: I don't care about eyesores in the .NET Framework build - if we need to keep .StrongName for that, well... honestly, that's just "stuff we haven't quite killed yet" - I don't see any need to rug-pull .NET Framework folks: they've already suffered enough.

Tornhoof commented 11 months ago

I like the middle ground approach, especially the part with the rule in advisor. You're way too nice to the netfx crowd.

richardcox13 commented 11 months ago

What is the expectation in terms of dependencies? Currently NetStandard 2.0 and .NET FX 4.5.1 are supported, what would this clean up, being a major version update, be a good time to allow use of newer capabilities?

Eg. rather than

public static IEnumerable<T> QueryUnbuffered<T>(DbConnection conection, string sql, object? parameters = null, ...);

allow incremental return of data be fully exposed with

public static IAsyncEnumerable<T> QueryUnbuffered<T>(DbConnection connection, string sql, object? parameters = null, ...);

As with @amoerie I (and the teams I lead) typically use wrappers for diagnostics (sql, parameters (optionally), timing) so seeing the API not changing too much is really valuable.

mgravell commented 11 months ago

What is the expectation in terms of dependencies? Currently NetStandard 2.0 and .NET FX 4.5.1 are supported, what would this clean up, being a major version update, be a good time to allow use of newer capabilities?

No strong need to drop anything - we are using newer capabilities when available already. If it was hurting me, I'd readily drop up to about net472, but... meh

allow incremental return of data be fully exposed with

that already exists, it is (and would remain) QueryUnbufferedAsync

DavidBoike commented 11 months ago

Why try so hard to avoid runtime breaks in a major version? Are people really just doing a DLL replacement on a major version without recompiling?

NickCraver commented 11 months ago

Why try so hard to avoid runtime breaks in a major version? Are people really just doing a DLL replacement on a major version without recompiling?

In short, yes. Because you can have many assemblies which have Dapper as a downstream dependency and you end up with only one Dapper.dll in the deploy (without shenanigans), and all those intermediate assemblies aren't being recompiled in your build, only when those libraries are each updated and released.

mgravell commented 11 months ago

We're in the position where it is never just SomeApp => Dapper - there are lots of things that ref Dapper as a wrapper, both public 3rd party libs, and private in-house libs at random companies. If we have the chain

SomeApp
- Dapper (some direct access)
- Public 3rd party A (relatively prompt updates)
- Dapper
- Public 3rd party B (now abandoned, no new versions ever)
- Dapper
- Private internal C (build server has broken, nobody knows why, the expert is on maternity leave)
- Private internal D (maintained by another division, and their priority this quarter is something else)
  - Dapper

If we retain compat, then if we rev and SomeApp takes that new version directly or transitively: no problem, they get the new code. But: if we break, there is a lot of work to figure out a: which component caused the update, and b: unpick the supply chain such that A, B, C, and D are all compatible with the updated bits. And emphasis: the author of B got fed up with IT and now grows watermelons off-grid; the private keys are lost to time - so even forking it would involve a hard break of B.

OK, I'm being a little melodramatic, but: not much. I've seen many things not too dissimilar to this.

The problem with running a popular library is that people use it and depend on it.

Greybird commented 11 months ago

Because you can have many assemblies which have Dapper as a downstream dependency and you end up with only one Dapper.dll in the deploy

The problem with running a popular library is that people use it and depend on it.

Both these remarks made me wonder if these constraints could be relaxed somehow. I ended up wondering if you already considered a scenario where the breaking change is published on a new package name / new dll file name (let's say Dapper.v3) / new namespace ?

SomeApp
- Dapper.v3 (some direct access)
- Public 3rd party A (relatively prompt updates)
  - Dapper.v3
- Public 3rd party B (now abandoned, no new versions ever)
  - Dapper
- Private internal C (build server has broken, nobody knows why, the expert is on maternity leave)
  - Private internal D (maintained by another division, and their priority this quarter is something else)
    - Dapper

Deprecation features of nuget could allow to hint people for package changes, if considered appropriate.

Apart from the fact that this is not very satisfying from a brand point of view, I suspect there are most probably cases where this would cause issues, so please forgive me if this approach is obviously doomed to failure.

NickCraver commented 11 months ago

@Greybird Unfortunately that's not really maintainable because we end up with a new package every major version and every consumer of every version doesn't even know there's an upgrade - and you get say Dapper.dll, Dapper.v3.dll, Dapper.v4.dll, etc. and unless you're changing the namespace each time creating n caches/overlaps/behaviors at best, we run into ambiguity errors for all namespaces/types/extensions that overlap.

Playing nice in the same running application across versions is the ultimate thing here, no matter which level along the way conflicts are happening at. In a lot of the proposals here for workarounds, we'd just be moving where the conflicts happen, and it'd less intuitive (if not sometimes impossible) for users to fix compared to any other major version break from a package change. I'd say if we're going to break, at least do it in a way that everyone else does, so that there's a known set of issues/constraints/fixes users have rather than creating new sets of those.

mgravell commented 11 months ago

Very incomplete first stab at API overhaul linked above; no impl yet - IMI only the shipped.txt and unshipped.txt files are useful right now

ashishoffline commented 8 months ago

In my 6 years of experience, I haven't used EF/Core often, mainly due to performance concerns, project requirements, or the complexity of my SQL queries. Instead, I've primarily relied on ADO.NET by creating a generic SqlHelper with DbProviderFactory.

Over the past 2 years, I've incorporated Dapper into some projects. Here are my suggestions, many of which align with @mgravell points. Please disregard any suggestions that are already implemented, as I haven't extensively used Dapper:

Consider creating extension methods for DbConnection (as opposed to just IDbConnection) to take advantage of additional ADO.NET features not available in IDbConnection.

Provide async versions of ExecuteReader, ExecuteNonQuery, and ExecuteScalar with proper cancellation support.

Instead of returning IEnumerable, it might be more efficient to return IReadOnlyList. In practice, most developers use List or arrays, and using IReadOnlyList would improve performance when iterating with foreach and provide index access.

For parameterized queries, it could be beneficial to accept an IEnumerable<KeyValuePair<string, object>> instead of just "object." This change aligns with common practice, as developers often use dictionaries for parameters.

Explore the possibility of replacing meta-programming and IL code with AOT/source code generation for mapping DbDataReader to types. This approach could reduce memory usage, which could be significant in larger projects with numerous queries.

mgravell commented 8 months ago

Appreciate the feedback!

Consider creating extension methods for DbConnection (as opposed to just IDbConnection) to take advantage of additional ADO.NET features not available in IDbConnection.

Changing the signate: already in the VNext spike

Taking advantage of the features: already in VCurrent (we cast as needed)

Provide async versions of ExecuteReader, ExecuteNonQuery, and ExecuteScalar with proper cancellation support.

I believe this is already in the VNext spike

Instead of returning IEnumerable, it might be more efficient to return IReadOnlyList. In practice, most developers use List or arrays, and using IReadOnlyList would improve performance when iterating with foreach and provide index access.

Already in the VNext spike, although we went List<T> - better enumeration path

For parameterized queries, it could be beneficial to accept an IEnumerable<KeyValuePair<string, object>> instead of just "object." This change aligns with common practice, as developers often use dictionaries for parameters.

We have DynamicParameters for that, although we could also support dictionary directly

Explore the possibility of replacing meta-programming and IL code with AOT/source code generation for mapping DbDataReader to types. This approach could reduce memory usage, which could be significant in larger projects with numerous queries.

See https://aot.dapperlib.dev/gettingstarted

So: we're not in much disagreement here

DapperLib / Dapper

Concrete initial v3 API change proposal #1976

What are we trying to solve?

Planned implementation