dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License
838 stars 279 forks source link

Feature | Split Azure dependent functionality in a separate NuGet Package #1108

Open SimonCropp opened 3 years ago

SimonCropp commented 3 years ago

why does Microsoft.Data.SqlClient v3 reference Azure.Identity?

AraHaan commented 9 months ago

Ah I see.

m-gallesio commented 9 months ago

@SimonCropp Would creating a new Microsoft.Data.SqlClient.Core package that removes Azure.Identity be a viable path?

This would work, but I think a better approach would be:

  1. Keeping Microsoft.Data.SqlClient as the core package
  2. Publish specific packages for other connectors, e.g. Microsoft.Data.SqlClient.Azure

Of course this would break existing dependencies and should be done in a major version change.

(Just noticed this was already said in https://github.com/dotnet/SqlClient/pull/2247#pullrequestreview-1768896873)

ahmed-abdelrazek commented 8 months ago

another reason to remove it https://github.com/advisories/GHSA-5mfx-4wcx-rv27 any news on the new update?

ErikEJ commented 8 months ago

@ahmed-abdelrazek Already Updated in 5.2 preview 4

SimonCropp commented 8 months ago

@ErikEJ shouldnt a Remote Code Execution Vulnerability with a 8.8/10 severity be a justification for an immediate patch release?

ErikEJ commented 8 months ago

@SimonCropp it is not my call. I think a 5.1 patch is planned for January. Also, as explained earlier by @David-Engel this package does not expose the vulnerable apis to the user.

SimonCropp commented 8 months ago

@ErikEJ thanks for the consideration and clarification

AraHaan commented 8 months ago

I think just by using the package itself that it should be updated or removed ASAP as it could result in the vulnerability to be executed and abused even if it is not directly used by the application itself just to prevent security problems down the line like this.

ErikEJ commented 7 months ago

@AraHaan @SimonCropp 5.1.4 and 5.2 preview 5 have been released with multiple CVE fixes included

altso commented 7 months ago

5.1.4 references System.IdentityModel.Tokens.Jwt@6.24.0 and Microsoft.IdentityModel.JsonWebTokens@6.24.0 that are in scope of https://github.com/advisories/GHSA-59j7-ghrg-fj52. It's really hard to justify having to deal with that when you only need to connect to SQL Server.

SimonCropp commented 7 months ago

ok this is now blocking us from using DTC on net8

JRahnama commented 7 months ago

@SimonCropp this seems by design in the newer version of MSAL. You can read here.

If your application targets UWP or net-windows (version-dependent Target Framework Moniker for Windows), WAM is included in the MSAL.NET package.

R2D221 commented 7 months ago

I think the real problem here is, why should I care about whether the new version of MSAL includes a desktop dependency by design if I'm not using MSAL whatsover?

My train of thought here is:

  1. I want to connect to a SQL Server database.
  2. System.Data.SqlClient is no longer maintained. Microsoft recommends we use Microsoft.Data.SqlClient instead.
  3. Microsoft.Data.SqlClient includes MSAL for some reason that falls outside my use case, and I can't opt out of it.
  4. MSAL includes a desktop dependency by design.
  5. At this point my choices are 1) to go back to an unsupported dependency, or 2) ... no, that's it, because as mentioned in previous comments, there are environments where installing the desktop runtime is impossible.
Dean-NC commented 7 months ago

I upgraded from 5.1.2 to 5.1.4. It worked locally because I have .NET Framework. I didn't know about that change, so deploying to my server broke because it doesn't have Framework. This is crazy. I'm going to stay on 5.1.3 until a Core/base client is ready that doesn't have Azure, identity, msal, etc.

SimonCropp commented 7 months ago

@Dean-NC just be aware that both v5.1.5 and 5.1.4 include fixes for CVEs you may need to consider

Dean-NC commented 7 months ago

@SimonCropp Thanks, I appreciate the heads-up and I saw that. But I'm not going to put the framework on a server that doesn't need it, that's ridiculous. Nothing against framework, I used it for many years, but I'm willing to take the security risk.

0xced commented 6 months ago

I did an experiment to see how much the Azure and MSAL libraries are weighing and it turns out that's a lot! Here are the results after removing the Azure and MSAL dependencies while still having a working SqlClient on .NET 8 (Windows x64): 9.5 MB vs 4.3 MB.

This looks like another pretty solid argument in favor of splitting the core driver and the Microsoft identity platform authentication support in different packages!

Full source code and explanations are available at https://github.com/0xced/ChiseledSqlClient

@SimonCropp and @R2D221 You might want to try this technique, I think it might solve your issues.

ErikEJ commented 6 months ago

I already confirmed this in my draft PR here: https://github.com/dotnet/SqlClient/pull/2247#issuecomment-1934092233

I will have a look at your repo

SimonCropp commented 6 months ago

@0xced nice research and writeup. well done

virzak commented 6 months ago

@0xced I tried it out and it is great.

With the exception that anyone who wants to use this approach must

This PR replaces that project reference approach with package reference approach. Currently the package (version 100.0.0) is stored in MyGet, but can move to NuGet. It is called "Chiseled.Microsoft.Identity.Client", though the name can be anything (I don't know what chiseled means).

The DLL inside is Microsoft.Identity.Client.dll and because it has a higher version than the original it will be picked instead of the original.

Let me know your thoughts.

MichaelKetting commented 5 months ago

@SimonCropp and @ErikEJ Thank you for the work you are doing on #2247 . I only scanned the discussion here and in the PR where you did a deep dive with @David-Engel and the last comment on Feb 7th (https://github.com/dotnet/SqlClient/pull/2247#issuecomment-1934709067) makes me very optimisitic on the state of the SQL ecosystem.

Having a "fat" SqlClient is a blocker for us, too, on using the updated library and I was looking at the same options including PRs and forking. If there is something I can do to help with the PR, please feel free to reach out directly!

~Michael

PS: I posted here instead of the PR to catch people up :)

ernstscheithauer commented 5 months ago

+1 for splitting this. Thanks for the work everyone.

0xced commented 5 months ago

I just released a new NuGet package, Chisel, that lets you remove unwanted dependencies from your dotnet projects.

Please have a look at the README, it explains everything you need to know for removing all the Azure and Microsoft.Identity dependencies from Microsoft.Data.SqlClient.

This workaround can be used while the real solution (splitting packages) is being worked on.

AraHaan commented 5 months ago

I just released a new NuGet package, Chisel, that lets you remove unwanted dependencies from your dotnet projects.

Please have a look at the README, it explains everything you need to know for removing all the Azure and Microsoft.Identity dependencies from Microsoft.Data.SqlClient.

This workaround can be used while the real solution (splitting packages) is being worked on.

Sounds like an unsafe workaround if something was to unexpectedly happen like a bug causing it to try to use the azure bits that gets removed. The real fix with that would be to look at all the places in the IL code within said dependency that is from the dependencies you do not want and get those parts of said functions/etc stripped out as well. Not to mention in a way that does not break the entire code flow as well.

Thorium commented 5 months ago

The amount of recent security vulnerabilities with Azure.Identity, combined with more visible security reporting (e.g. VS), has made it so that you have to constantly update your non-Azure software SQL-drivers to get rid of warnings. Nonsense.

Edit: Already the thought that someone could compromise your non-Azure software with some Azure.Identity security issue (by-passing auth?) is scary.

Maybe back to System.Data.SqlClient if you do care security?

AraHaan commented 5 months ago

Maybe back to System.Data.SqlClient if you do care security?

Sometimes I have no choice but to sadly agree if said thing worked in all TFM's which it sadly does not as most of it been either: obsoleted, or gutted to be stubs.

damianh commented 5 months ago

Using this library blows up the size of my AWS lambda functions just to talk to a database.

Microsoft.Data.SqlClient transitive dependencies vs Npgsql:

image

Npgql release build, 1.52MB:

image

SqlClient release build, 14MB (almost 10x bigger!!) :

image

To be brutally honest, it's astonishing that this issue is nearing 3 years. Transitive dependencies should be kept to an absolute minimum and apply extension packages "lighting-up" functionality if the user desires it. I have zero justifiable reason to be packaging Azure.Identity to my AWS applications.

Also now your release schedule of a core library will be dictated and driven by changes to your transitive dependencies (and their CVE's...).

It would be worth taking a moment to study the Npgsql libraries and how things are approached there.

Edit: Just noticed some experimentation here https://github.com/dotnet/SqlClient/pull/2247

SimonCropp commented 5 months ago

as the highest upvoted issue, can this perhaps get some love?

KalleOlaviNiemitalo commented 5 months ago

@damianh, your screen shots show quite a few language-specific subdirectories. SatelliteResourceLanguages could help with those, but I don't know what fraction of the disk space is in them.

dggmez commented 5 months ago

I didn't know about this dependency. This stuff could be in The Daily WTF.

damianh commented 5 months ago

@damianh, your screen shots show quite a few language-specific subdirectories. SatelliteResourceLanguages could help with those, but I don't know what fraction of the disk space is in them.

Yeah I can trim those, however that is extra work and they really shouldn't be there in the first place.

ErikEJ commented 5 months ago

As far as i know, the Azure SQL database can be used just like normal on-premise database, without Azure.Identity dll. 

Sure, but only with user name and password, which is not really the way to do this these days.

R2D221 commented 4 months ago

@0xced

I just released a new NuGet package, Chisel, that lets you remove unwanted dependencies from your dotnet projects.

I just tested it and it seemed to work so far. Definitely an alternative while the real solution is worked on.

avin3sh commented 2 months ago

I appreciate that the discussions for removing direct dependency on Azure.Identity are happening and it will be a while before the packages can be split.

Until that happens, could we please bump Azure.Identity dependency to >= 1.12.0 so that the negative effects of accidental breaking change introduced in Azure.Identity 1.10.4 can be undone ? Because of this issue you can not use SqlClient >= 5.1.4 without also unnecessarily introducing dependency on .NET Desktop Runtime! https://github.com/dotnet/SqlClient/issues/2309 should not have been closed without addressing this key problem 😞 Developers are being forced to downgrade to vulnerable version (5.1.2) because of this.

There is also whole different conversation about how changes flow to EFCore.SqlServer, that depends on SqlClient, which is again unfortunate but probably not relevant for maintainers of SqlClient.

edwardneal commented 2 weeks ago

The recent deprecation of System.Data.SqlClient (#2778) has re-raised this issue, and I think it makes it more important.

If an organisation needed to be able to connect to SQL Server, they'd previously be able to use either S.D.S or Microsoft.Data.SqlClient. Although this was the preferred library, any organisation using vulnerability scanners would have discovered CVEs in the Azure libraries, found that this was pulled into the dependency chain via M.D.S, and had the option to fall back to S.D.S as another supported library.

That choice will no longer exist - those organisations will have to choose between using a now-unsupported library, or facing an extra background hum of vulnerabilities which need to be accounted for. Even in a "happy path" of something connected to a local SQL Server via Windows authentication, there's still overhead of having to account for these CVEs, and we're relying on security/compliance teams being willing to accept the developers' word that it's not in use and being kept up-to-date if requirements change...

In the absence of trim compatibility, Chisel is a good way to fix this - but it's got its own restrictions because it has no way to know the dependency chain of a class library's consumers. This can be a problem in applications which use Clean Architecture as a point of reference.

It's always going to be difficult to split the packages now that they're in wider use, but the experience for developers migrating to M.D.S as a result of the S.D.S deprecation seems like it's going to be pretty poor too.

The consensus in the final comments of #2247 seemed to be to have Microsoft.Data.SqlClient and Microsoft.Data.SqlClient.Azure. Could we start to release these packages to NuGet as part of the v6 release? I'm not sure whether we'd want the M.D.S.Azure and M.D.S packages to be identical, or whether it'd be better for M.D.S.Azure to start life as nothing more than a shell which references M.D.S. The goal would be to allow us to start recommending that migrators from System.Data.SqlClient which use Azure functionality should use Microsoft.Data.SqlClient.Azure, and migrators which don't should use Microsoft.Data.SqlClient. Developers would hopefully also start shifting to the appropriate package organically, making this slightly easier to tackle when we get to it in the future.

On a tentative side point: if both packages were published and M.D.S.Azure had a different application name in the connection string, perhaps the Azure SQL team might be able to review logins with a matching application name and the appropriate authentication method, then notify the tenant owners of an upcoming breaking change...

ErikEJ commented 2 weeks ago

@edwardneal Very interesting proposal, that should pave the path for a removal of Azure dependencies in version 7!

So what you are proposing is essentially a new package initially with the exact same binaries as M.D.S. today.

"M.D.S.Azure to start life as nothing more than a shell which references M.D.S. "

Curious, how would you implement this with a NuGet package?

edwardneal commented 2 weeks ago

@ErikEJ That's correct, yes. I'm not completely sure whether the extra package should contain the same binaries as M.D.S, but the fact that the package exists means that the S.D.S migration guidance can be issued, developers can start to be informed about the need to switch packages, etc.

Curious, how would you implement this with a NuGet package?

I'm thinking of a NuGet metapackage - so the package wouldn't contain any files, but would define a dependency on M.D.S (and possibly Azure.Identity.) One example of this type of package is SpecFlow.NUnit.Runners.

We might not want it to consist strictly of dependencies though - I could see some value in having a public interface (perhaps a static method with an empty body) which developers are expected to call in order to register the Azure authentication/AlwaysEncrypted components. Later revisions of v6 could start to implement this as any required refactoring inside M.D.S is completed.

Essentially, v6 would contain the public-facing interfaces needed to start asking developers to migrate and v7 could move the dependency itself.

ErikEJ commented 2 weeks ago

I could see some value in having a public interface (perhaps a static method with an empty body) which developers are expected to call in order to register the Azure authentication/AlwaysEncrypted components.

I would assume that adopting the .Azure package would not require any code changes for existing users. (Only a package switch)

edwardneal commented 2 weeks ago

That's a fair assumption, and in that case I think it'd be pretty simple to set up the metapackage and then to populate it with the Azure-referencing binaries as a later release. My point of reference comes from packages like Npgsql.OpenTelemetry, which reference the underlying package and only directly contain the OTel-specific shims. I can see why we'd want to treat M.D.S differently because of its history though.

roji commented 2 weeks ago

So what you are proposing is essentially a new package initially with the exact same binaries as M.D.S. today.

Just want to note that I was imagining Microsoft.Data.SqlClient.Azure as a plugin into SqlClient, so that users would reference both packages; this is in contrast to this plan, where M.D.S.Azure would be a copy-paste of M.D.S, with the Azure stuff added in.

The problem with the "copy-paste" model is that it doesn't scale well; if there's some need for some additional functionality which also requires some other external dependency, you'd not be able to do this trick again. I'd recommend trying to think about exposing an authentication plugin API, which external packages (such as M.D.S.Azure) would be able to hook into in order to do their work. In the ideal design, this extensibility would be exposed on SqlDataSourceBuilder - and M.D.S.Azure would provide a simple extension method over SqlDataSourceBuilder to do the appropriate configuration - but some global/static way to register the plugin would also be necessary for anyone not using SqlDataSourceBuilder.

Some might object that a plugin model is more of a breaking change: current users would have to have change their program to add the opt-in. But ultimately you'll be asking users to change their package references anyway (from M.D.S to M.D.S.Azure) so I'm not sure that's a meaningful difference.

ErikEJ commented 2 weeks ago

@roji I understand your thoughts, but I think the switch to .Azure should only require a package change. New external dependencies are implemented as plug ins today, like the KeyVault provider.

mtaylorfsmb commented 2 weeks ago

I haven't thought this all the way through but it seems like the copy-paste approach is going to cause problems as well. Here's the scenario I wonder about. There is a package A that relies on MDSv1 today. A new MDSv2 package is created containing a copy of the binaries (not a metapackage). A project depends on package A and also, either directly or indirectly, takes a dependency on MDSv2. Now a project has packages that contain the same binary names but potentially different versions. Which one actually gets copied to the output and therefore loaded at runtime? AFAIK package ordering is undefined so there is no way to control which package wins. Since NuGet doesn't look at the assemblies within a package then it wouldn't see a conflict either.

The obvious solution is to update package A to rely on MDSv2 but that isn't reliable since a package may not be updated anymore, might be a transitive dependency from another package you can't control, etc. Not sure if this is an actual issue and how to work around it.

The alternative of a metapackage seems to solve that issue but we, where I work, don't recommend metapackages. They introduce more problems then they solve. Transitive security vulnerabilities are a common problem and metapackages that aren't updated cause warnings. The only workaround is to take a direct dependency on the package which defeats the purpose of metapackages. I think I heard there is some option you can set in your PackageReference to allow transitives to update but AFAIK this isn't exposed or supported by VS and would require editing the project file and ensuring it remains set if you update it later. Could be wrong though. The other problem with metapackages is that you really cannot ever get rid of the original package, since you rely on it. All in all, metapackages are hacky to me but do help resolve the upgrade legacy users problem, at least initially.

ErikEJ commented 2 weeks ago

@mtaylorfsmb the metapackage is a point in time thing, it will eventually have its own binaries.

mtaylorfsmb commented 2 weeks ago

@ErikEJ Correct me if I'm wrong but there will still come a day when you're back to the first scenario I mentioned. Code that relies on MDS.Azure (metapackage) which relies on MDSv1 will updated to MDS.Azure.vNext (shipping the binaries directly). But projects that rely on MDSv1 transitively (for whatever reason) will suddenly have a conflict when they update MDS.Azure.vNext because 2 different packages now have the same binaries. It is just kicking the problem down the road, or maybe I don't understand how this is going to work ultimately.

edwardneal commented 2 weeks ago

I'd recommend trying to think about exposing an authentication plugin API, which external packages (such as M.D.S.Azure) would be able to hook into in order to do their work. In the ideal design, this extensibility would be exposed on SqlDataSourceBuilder - and M.D.S.Azure would provide a simple extension method over SqlDataSourceBuilder to do the appropriate configuration - but some global/static way to register the plugin would also be necessary for anyone not using SqlDataSourceBuilder.

@roji a lot of this work has technically already been done. There's a public SqlAuthenticationProvider interface, backed by SqlAuthenticationProviderManager. There might be a need to refactor the AD authentication provider, but that's not a change to public API surface. An additional piece of API surface would be needed for the AlwaysEncrypted enclave providers to modify GetEnclaveProvider, but this could use a similar public-facing design.

I thought a little about what that'd require, whether M.D.S should treat M.D.S.Azure uniquely because it was embedded inside the core library, etc. It's a bit of a moot point though - I didn't think it'd be ready in time for an early enough preview of M.D.S v6, which is where the idea around a metapackage and possibly an empty piece of public API surface comes from.

This does all depend on whether we can make transitive dependencies work as they need to though. From checking NuGet's dependency resolution rules @mtaylorfsmb, I think there'd only be one problem to consider?

We can safely expect that M.D.S.Azure wouldn't always be a metapackage, that during its time as a metapackage its version would be kept in lockstep in M.D.S, and it would explicitly pin its reference to exactly the same version of M.D.S. This would mean that we're not going to run into transitive security vulnerabilities from the metapackage - it'd be published from the same repository at the same time as M.D.S.

The scenarios I have in mind for the dependencies are:

  1. Project references M.D.S.Azure v6.1 and Package B. Package B references M.D.S >= v6.0. M.D.S v6.1 will be used because it's the lowest version which satisfies both dependencies.
  2. Project references M.D.S.Azure >= v6.1 and Package B. Package B references M.D.S v6.0. This'd be a new kind of package conflict, since M.D.S would no longer be a direct dependency of the project (and would thus no longer override Package B's dependencies.)
  3. Project references Package A and Package B. Package A references M.D.S.Azure v6.1 and Package B references M.D.S v6.0. This isn't a new package conflict - it'd need to be resolved in the normal way.

I'm not pleased with scenario 2. Even in that scenario though, if a developer migrates to M.D.S.Azure, sees this problem and adds another extra reference to M.D.S, they've still put themselves into a non-breaking position for v7.0. We'd need to document that, so they don't immediately revert back to referencing M.D.S exclusively. I could also see it being frustrating for downstream libraries referencing M.D.S.

The really "interesting" point comes when we switch from a metapackage to something with real content. In scenario 2, a plugin route's actually got a pretty simple exit ramp: we'd bump M.D.S and M.D.S.Azure to v7, loosen M.D.S.Azure's dependencies to reference M.D.S >= v7, and the scenario becomes irrelevant. If there's a second copy of the M.D.S binaries in the M.D.S.Azure package, I'm not sure which binaries would be used, or what might happen in more complicated package dependency trees with multiple versions of M.D.S and M.D.S.Azure. If that made it through the compiler, I wouldn't be too surprised if we found that the application code was using M.D.S.Azure, but that the libraries referencing M.D.S were using M.D.S.

While not a strong preference, I'd personally prefer the plugin approach in the long run because I think it'd make stepping M.D.S.Azure from a metapackage to a real package simpler.

thompson-tomo commented 2 weeks ago

I also think a plugin approach is the correct approach in the long term but understand the timing pressurws.

Is there a way that we could perhaps implement the plugin method but only sets a variable. Then when we pass in an azure connection string a warning is logged if that variable is not set?

roji commented 2 weeks ago

@ErikEJ @edwardneal to make sure we're talking about the same things, my comments below are about the proposal of (eventually) having a copy-paste, i.e. both M.D.S and M.D.S.Azure containing e.g. a SqlConnection type, and without M.D.S.Azure referencing M.D.S.

In that scenario, what would e.g. the EF SQL Server provider reference? Either:

  1. EF references M.D.S, in which case EF users can't connect to Azure - obvious non-starter.
  2. EF references M.D.S.Azure, in which case all EF users get the Azure dependencies again, regardless of whether they use Azure or not. That's exactly what this change is supposed to fix, so again, not good.
  3. EF now has to publish two providers, mirroring the SqlClient split: Microsoft.EntityFrameworkCore.SqlServer and M.EF.SqlServer.Azure. That's really something we should try to avoid; plus, this same problem gets pushed up once again for any library/plugin/extensions out there that wants to work with the EF SQL Server provider.

This is related to the difficulties @mtaylorfsmb pointed out above... People would very easily find themselves with both SqlClient packages somewhere in the dependency graph, and at that point you have two versions of SqlConnection and things simply become unmanageable.

In contrast, if M.D.S.Azure is simply a plugin which only contains Azure-related auth stuff, and references M.D.S, then EF (and any other library) would simply reference M.D.S, and the user would decide whether to also reference M.D.S.Azure.

@roji I understand your thoughts, but I think the switch to .Azure should only require a package change. New external dependencies are implemented as plug ins today, like the KeyVault provider.

I understand and appreciate the desire to make this transition as painless as possible... But note that this whole issue (splitting Azure.Identity out of the base M.D.S) has been held back up to now because of (legitimate) backwards compats concerns. Now that we have a general willingness to explore a breaking change here, I really hope we don't produce a problematic, sub-standard solution again for the same reasons... If we're already breaking (via the package reference), IMHO might as well do things really right even if it means requiring users to paste another line of code somewhere...

ErikEJ commented 2 weeks ago

@roji Got it! And then provide an actionable error message with a link to good docs when attempting to use Azure auth with the "base" package...

edwardneal commented 2 weeks ago

That sounds sensible to me too. I've seen #2823 and it looks good to me, thanks for opening that. If the SqlClient team are happy with the design, it'd be good to publish it as part of the next preview of v6.0 so we can start the split

roji commented 2 weeks ago

And then provide an actionable error message with a link to good docs when attempting to use Azure auth with the "base" package...

Yes, exactly! As long as you provide crystal clear fail-fast behavior with an informative exception pointing to a good page explaining what to do to (switch to M.D.S.Azure, add this in your code), I really think it's a very reasonable situation... When this is possible, it's the "best kind" of breaking change (compared to breaking changes where the behavior silently changes or whatever). Users will still complain (some will complain no matter what), but the majority will simply do the change quickly and move on.