dotnet / SqlClient

Microsoft.Data.SqlClient provides database connectivity to SQL Server for .NET applications.
MIT License
842 stars 281 forks source link

Cannot unload a collectible AssemblyLoadContext if System.Data.SqlClient is used #414

Closed stasgaonkar closed 3 years ago

stasgaonkar commented 4 years ago

The Assembly we want to load (and unload) needs the ability to connect to a database for which we use Dapper. Dapper internally makes use of System.Data.SqlClient. We find that when we create such an assembly, then we are unable to Unload the collectible AssemblyLoadContext.

The sample source code is available at: https://github.com/stasgaonkar/DynamicLoading

Having followed the steps mentioned in https://docs.microsoft.com/en-us/dotnet/standard/assembly/unloadability#, the following is the output of gcroot command,

0:000> !dumpheap -type LoaderAllocator
         Address               MT     Size
000002022c46e1f0 00007fff74999410       48     
000002022c46e260 00007fff74999538       24     

Statistics:
              MT    Count    TotalSize Class Name
00007fff74999538        1           24 System.Reflection.LoaderAllocatorScout
00007fff74999410        1           48 System.Reflection.LoaderAllocator
Total 2 objects
0:000> !gcroot -all 0x000002022c46e1f0
HandleTable:
    000002022A9D12A0 (strong handle)
    -> 000002022C4E8C48 System.Data.SqlClient.TdsParserStateObjectNative
    -> 000002022C46E1F0 System.Reflection.LoaderAllocator

    000002022A9D12A8 (strong handle)
    -> 000002022C4E00F0 System.Data.SqlClient.TdsParserStateObjectNative
    -> 000002022C46E1F0 System.Reflection.LoaderAllocator

    000002022A9D12E0 (strong handle)
    -> 000002022C4B38C0 System.Data.SqlClient.TdsParserStateObjectNative
    -> 000002022C46E1F0 System.Reflection.LoaderAllocator

    000002022A9D15F8 (pinned handle)
    -> 000002023C461038 System.Object[]
    -> 000002022C475768 System.Threading.TimerQueue[]
    -> 000002022C475840 System.Threading.TimerQueue
    -> 000002022C4ABE88 System.Threading.TimerQueueTimer
    -> 000002022C475708 System.Threading.TimerQueueTimer
    -> 000002022C475580 System.Threading.TimerCallback
    -> 000002022C474D60 System.Data.SqlClient.SqlConnectionFactory
    -> 000002022C46E1F0 System.Reflection.LoaderAllocator

Found 4 roots.

So it seems that the TdsParserStateObjectNative is the culprit. What we have also noticed is that there are some "unmanaged/native DLL" that are used (sni.dll) internally by System.Data.SqlClient, possibly from within TdsParserStateObjectNative. Not sure if this is causing any issues.

Is something that we are attempting to do, incorrect?

janvorli commented 4 years ago

This is an issue in the System.Data.SqlClient. One thing is the TdsParserStateNativeObject that you've mentioned and the other is the timer callback (the last root in your gcroot output, my guess is that it is the _pruningTimer member of the DbConectionFactory, which is the base class of the SqlConnectionFactory). The System.Data.SqlClient will need to be modified to handle the AssemblyLoadContext.Unload event and perform cleanup to become unloadable. The System.Data.SqlClient already has some cleanup code registered to AppDomain.Unload, so it seems that it could possibly be reused for AssemblyLoadContext.Unload handling.

stasgaonkar commented 4 years ago

thanks @janvorli - is this something that needs to be handled within library or can we do it outside. If this can be done from outside, could you point me to the "cleanup code" that you are referring?

janvorli commented 4 years ago

@stasgaonkar it is an interesting idea. You might be able to solve it by installing a handler for the AssemblyLoadContext.Unload in your code running inside of your AssemblyLoadContext and invoking the System.Data.SqlClient.SqlDependencyPerAppDomainDispatcher.UnloadEventHandler via reflection from there. I am not sure if that's going to be sufficient, but it seems it is worth trying. See https://github.com/dotnet/samples/blob/master/core/tutorials/Unloading/Plugin/PluginClass.cs for an example on how to install the AssemblyLoadContext.Unload handler.

AlbertoCe commented 4 years ago

Hi,

I tried to put this code into the unloading method of the assemblycontext but with no luck

private static void OnPluginUnloadingRequested(AssemblyLoadContext obj)
{
            var assembly = obj.Assemblies.FirstOrDefault(x => x.FullName.Contains("Microsoft.Data.SqlClient", StringComparison.InvariantCultureIgnoreCase));
            var dispatcher = assembly.GetType("Microsoft.Data.SqlClient.SqlDependencyPerAppDomainDispatcher");
            var value = dispatcher.GetField("SingletonInstance", BindingFlags.NonPublic | BindingFlags.Static).GetValue(null);
            var method = dispatcher.GetMethod("UnloadEventHandler", BindingFlags.NonPublic | BindingFlags.Instance);
            method.Invoke(value, new object[] { null, null });
        }

@stasgaonkar have you gone any further on this ?

xgnxs commented 4 years ago

@AlbertoCe or @stasgaonkar, did either of you manage to figure out a solution? I'm having the same problem, where once I create a new SqlConnection (using System.Data.SqlClient) the plugin assembly I'm trying to create can no longer be unloaded. If I remove that line, I can unload the assembly just fine, although the plugin is useless without database access. I tried Microsoft.Data.SqlClient as well and had the same issue.

@janvorli do you have any suggestions? I have an AssemblyLoadContext.Unload event handler just like in the PluginClass.cs example, I just need to know what to put in it.

AlbertoCe commented 4 years ago

Hi @xgnxs Unfortunately I have not made progress with this problem. I had to give up using netcore 3.1 and switch to the 4.8 framework. I don't think it is easily solved nor when they will be able to solve it.

Let me know if you make any progress.

grant77 commented 4 years ago

This worked for me:

var sqlClientAssembly = alc.Assemblies.SingleOrDefault(x => x.GetName().Name == "Microsoft.Data.SqlClient");
var sqlConnectionFactoryType = sqlClientAssembly?.GetType("Microsoft.Data.SqlClient.SqlConnectionFactory");
if (sqlConnectionFactoryType != null)
{
    var singletonInstance = sqlConnectionFactoryType.InvokeMember("SingletonInstance", BindingFlags.Public | BindingFlags.Static | BindingFlags.GetField, null, null, null) !;
    var baseType = sqlConnectionFactoryType.BaseType!;
    baseType.InvokeMember("ClearAllPools", BindingFlags.InvokeMethod, null, singletonInstance, null);
    var pruningTimer = (Timer)baseType.InvokeMember("_pruningTimer", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.GetField, null, singletonInstance, null) !;
    pruningTimer.Dispose();
}
cheenamalhotra commented 4 years ago

@stasgaonkar

Could you try solution posted by @grant77 if you're still facing this issue?

Benjiinator commented 3 years ago

@cheenamalhotra I am running into this same issue in .Net 5.0 using Microsoft.Data.SqlClient v2.1.1, the solution @grant77 provided does not appear to solve the issue for me.

Edit: as soon as i call the following my assembly becomes unloadable. using (var sqlConn = new SqlConnection(connString))

cheenamalhotra commented 3 years ago

Could you provide us a repro app to investigate?

Benjiinator commented 3 years ago

@cheenamalhotra i have created a repro app here: Assembly Unloading with Microsoft.Data.SqlClient

Wraith2 commented 3 years ago

Draft PR to get that basebones example working is at https://github.com/dotnet/SqlClient/pull/913

@Benjiinator can you try the nuget package produced by the CI builds and add in whatever other functionality you need from SqlClient so we can see if there are other things that end up rooting the assembly?

Benjiinator commented 3 years ago

@Wraith2 So far it appears to unload fine with the CI build. The Microsoft.Data.SqlClient.SNI.dll is however still locked in the filesystem even after unloading. I am attempting to delete the file so that i can update my application.

Wraith2 commented 3 years ago

Hmm, that's an unmanaged root then and I'm not sure how to track that down, @janvorli any hints?

janvorli commented 3 years ago

@Wraith2 Unmanaged dlls used via DllImport cannot be ever freed, that's by design. I was trying to enable unloading them when I was working on the unloadable AssemblyLoadContext, but in the discussion on the respective PR, there were concerns about allowing freeing of native libraries in general. Many native libraries are not designed to be freeable and freeing them could result in memory leaks, crashes, deadlocks and other nasty stuff. See https://github.com/dotnet/coreclr/pull/22274 for details. The main issue is that if we freed the native libraries automatically, people that are not even aware that a 3rd party component uses native library may start hitting these issues that are difficult to diagnose in general when they use the component in an unloadable context. So the way to free dlls where the author intends them to be freed is to use the NativeLibrary class (https://docs.microsoft.com/en-us/dotnet/api/system.runtime.interopservices.nativelibrary?view=net-5.0) to explicitly load/unload the library and get the exports instead of using the PInvoke. Then you have full control over the library lifetime.

Wraith2 commented 3 years ago

Tricky then. The alternative is to put sql client in managed mode using the AppContext switch "Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows" which will force it to use managed sni instead of native sni.dll, the perf isn't quite as good but it's mostly the same. Just make sure you set it before calling any sql client functions.

I'll have a look at the loader code for sni.dll at some point but i remember there being some complex logic around loading the right version from subdirectories and i don't want to disturb it given how many support issues it causes.

Benjiinator commented 3 years ago

@Wraith2 using your CI build and setting "Switch.Microsoft.Data.SqlClient.UseManagedNetworkingOnWindows" did the trick for me. Thanks for your help @Wraith2, and thanks to you @janvorli for explaining the issue.

David-Engel commented 3 years ago

I'll have a look at the loader code for sni.dll at some point but i remember there being some complex logic around loading the right version from subdirectories and i don't want to disturb it given how many support issues it causes.

@Wraith2 It's not as complicated as you think. It just uses a standard DllImport to load the appropriate x86/x64 class and then DLL either in the SqlClient DLL's current folder or in the PATH. It would be awesome if we could reliably unload the SNI DLL when it's appropriate. I looked into a way to do it before, but didn't find a simple solution.

All here: https://github.com/dotnet/SqlClient/tree/master/src/Microsoft.Data.SqlClient/netfx/src/Microsoft/Data/Interop

Wraith2 commented 3 years ago

I'm not sure if it'll be worth doing. If we want to move to managed sni as default then enabling NativeLibrary loading of the pinvokes is a distraction. It'll all depend on performance gains and timescales and customer demand. If setting the app context is enough for most users then it might be best to leave it as-is.

Benjiinator commented 3 years ago

@Wraith2 Your solution has worked so far, but i have found out that unloading does not work when using CommandType.StoredProcedure.

Wraith2 commented 3 years ago

Interesting, can you provide a minimal repo for that and I'll dig into it again and see if i can identify what is rooting it for that case.

Benjiinator commented 3 years ago

@Wraith2 I have been unable to recreate it. currently investigating if anything else could be causing this behavior.

Benjiinator commented 3 years ago

@Wraith2 i seem to have resolved my issue now, but what caused it remains a mystery.

Wraith2 commented 3 years ago

Well if you find more all I'll need is the repro and I can investigate.

cheenamalhotra commented 3 years ago

Closing as PR #913 is now merged.

rafalkwol commented 2 years ago

@Wraith2 it seems the solution is working fine in 3.1 version of Microsoft.Data.SqlClient NuGet, but doesn't seem to be working for me in 4.0 and beyond. Any idea what change could be causing this? As otherwise I will have to use 3.1 in my plugin based application, but would prefer to stick to the newest version.

Wraith2 commented 2 years ago

If you can provide a standalone repro then I can debug into it and see what is rooting sqlclient. If you do manage to reproduce it please open a new issue so it can be tracked as an open item.

rafalkwol commented 2 years ago

@Wraith2 I made a separate issue for this with the source code provided.

1635