dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

SqlClient: Unable to load DLL 'sni.dll' #16905

Closed rowanmiller closed 4 years ago

rowanmiller commented 8 years ago

We are seeing a number of folks hitting the Unable to load DLL 'sni.dll' issue with the latest RC2 builds. Folks are hitting this when using EF Core but have confirmed that the issue exists when working directly with SQL Client too.

See https://github.com/aspnet/EntityFramework/issues/4953 for a good description from someone hitting the issue.

cc @mrmeek who filed the above issue

joshfree commented 8 years ago

@saurabh500 @YoungGah @corivera : blocking rc2 bug

saurabh500 commented 8 years ago

Looking into it.

natemcmaster commented 8 years ago

We are hitting it on aspnetci servers too. Win Server 2008 R2. I've attached a sample lockfile. project.lock.json

saurabh500 commented 8 years ago

@natemcmaster when did you start facing this issue?

cc @ericstj

natemcmaster commented 8 years ago

We just noticed it this morning. SqlClient 4.1.0-rc2-23931

ericstj commented 8 years ago

//cc @weshaggard I double checked the lock file and the packages and confirm that the DLL is the correct bitness and references msvcrt.dll. @natemcmaster if you run depends.exe on sni.dll on the machine that's failing, it should show you what's missing. I'm reviewing the dependencies now against what I see in the project.lock.json to make sure we aren't missing some api-set dll.

ericstj commented 8 years ago

I double checked all the API set dependencies and they are satisfied. @schellap perhaps something wrong with how the host is setting up paths for PInvoke.

natemcmaster commented 8 years ago

This may also be part of it. From @BrennanConroy:

I see that it started as soon as we grabbed version 2202 of the CLI

schellap commented 8 years ago

@natemcmaster, can you point me to the project that repros it?

BrennanConroy commented 8 years ago

https://github.com/aspnet/Diagnostics/blob/0a444088c9a7c5c6b4073c92104b48af734ef523/test/Microsoft.AspNetCore.Diagnostics.FunctionalTests/DatabaseErrorPageSampleTest.cs#L24

natemcmaster commented 8 years ago

I can consistently repro on a Win Server 2008 R2 box, but not my local machine. Contact me internally and I can point you to the right box.

git clone https://github.com/aspnet/Identity
cd Identity
dotnet restore --infer-runtimes
cd test/Microsoft.AspNetCore.Identity.EntityFrameworkCore.Test
dotnet test

cc @saurabh500

natemcmaster commented 8 years ago

Could this be another problem with missing a redist?

ericstj commented 8 years ago

That DLL compiles against msvcrt.dll which isn't the redist CRT: it should be part of the OS. If you try depends.exe on the dll on the target machine it will tell you if anything is missing.

saurabh500 commented 8 years ago

@mrmeek What OS are you running your app on ?

schellap commented 8 years ago

What is puzzling to me at this point is:

Property NATIVE_DLL_SEARCH_DIRECTORIES =
C:\Users\asplab\.nuget\packages\Microsoft.DiaSymReader.Native\1.3.3\runtimes\win-x64\native;
C:\Users\asplab\.nuget\packages\runtime.win7-x64.Microsoft.NETCore.DotNetHost\1.0.0-rc2-00001\runtimes\win7-x64\native;
C:\Users\asplab\.nuget\packages\runtime.win7-x64.Microsoft.NETCore.DotNetHostPolicy\1.0.0-rc2-00001\runtimes\win7-x64\native;
C:\Users\asplab\.nuget\packages\runtime.win7-x64.Microsoft.NETCore.DotNetHostResolver\1.0.0-rc2-00001\runtimes\win7-x64\native;
C:\Users\asplab\.nuget\packages\runtime.win7-x64.Microsoft.NETCore.Runtime.CoreCLR\1.0.2-rc2-23931\runtimes\win7-x64\native;
C:\Users\asplab\.nuget\packages\runtime.win7-x64.Microsoft.NETCore.Windows.ApiSets\1.0.1-rc2-23931\runtimes\win7-x64\native;
C:\Users\asplab\.nuget\packages\runtime.win7-x64.runtime.native.System.Data.SqlClient.sni\4.0.1-rc2-23931\runtimes\win7-x64\native;

The last one contains the dir where the sni.dll is present.
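
To check a property value like this by hand, here is a minimal Python sketch. It is not part of the host: it only splits a semicolon-separated value of the same shape and reports which entries contain a given file. The helper name find_in_search_dirs is made up for illustration.

```python
import os


def find_in_search_dirs(search_dirs_value, file_name):
    """Return the directories in a ';'-separated list that contain file_name.

    Hypothetical helper: it mirrors the format of the property value only and
    does not call into the host or CoreCLR. The file check is as case-sensitive
    as the underlying file system.
    """
    hits = []
    for entry in search_dirs_value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # the value ends with ';', so skip empty entries
        if os.path.isfile(os.path.join(entry, file_name)):
            hits.append(entry)
    return hits
```

Calling find_in_search_dirs(value, "sni.dll") on the failing machine would confirm that the directory is listed. As the rest of the thread shows, being listed there does not mean the OS loader uses that list for sni.dll's own imports.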

natemcmaster commented 8 years ago

I ran depends.exe with the folders listed in NATIVE_DLL_SEARCH_DIRECTORIES. The tool says it's still missing some modules. [image]

Also disclaimer: I'm new at the tool so the results may not be 100% accurate.

ericstj commented 8 years ago

We can double check that one of these is causing the actual load failure using procmon.

natemcmaster commented 8 years ago

Here is the procmon dump on dotnet.exe. Not sure which needle in this haystack to look for. Logfile.zip

ericstj commented 8 years ago

I think you need to monitor corehost.exe

natemcmaster commented 8 years ago

Could the error be caused by one of these events? https://gist.github.com/natemcmaster/3349e792a62520cfb3c3a9297dd4b5e0 I pulled these from the full procmon dump for corehost: corehostexe_procmon.zip

ericstj commented 8 years ago

Looks like it. To troubleshoot, try taking all the API set DLLs and dropping them next to sni.dll, and see if that fixes it. If it does, it seems to be an issue with the way the host is setting up the probing paths for native DLLs.

natemcmaster commented 8 years ago

Must be. The error went away when I moved sni.dll from its nuget path to $(where dotnet)/shared/Microsoft.NETCore.App/1.0.0-rc2-23931/sni.dll.

ericstj commented 8 years ago

I suppose that's one workaround. I was able to validate my specific suggestion as well by dumping all the API sets into .nuget\packages\runtime.win7-x64.runtime.native.System.Data.SqlClient.sni\4.0.1-rc2-23931\runtimes\win7-x64\native and it also worked around the issue.

@schellap it looks like the NATIVE_DLL_SEARCH_DIRECTORIES aren't being honored when locating dependencies of native DLLs.
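
The copy workaround described above could be scripted roughly as follows. This is a sketch under stated assumptions: the function name copy_api_sets and the file pattern api-ms-win-*.dll are illustrative, and the two directories are whatever the ApiSets and sni package folders are on the affected machine. It is a diagnostic aid, not a fix.

```python
import glob
import os
import shutil


def copy_api_sets(api_sets_dir, target_dir, pattern="api-ms-win-*.dll"):
    """Copy API set DLLs next to a native DLL and return the copied file names.

    Assumption: the API set DLLs match `pattern`. Files that already exist in
    target_dir are left alone. Matching is case-insensitive on Windows and
    case-sensitive elsewhere.
    """
    copied = []
    for src in sorted(glob.glob(os.path.join(api_sets_dir, pattern))):
        name = os.path.basename(src)
        dst = os.path.join(target_dir, name)
        if not os.path.exists(dst):
            shutil.copy2(src, dst)
            copied.append(name)
    return copied
```

If the load error disappears after running this against the sni package's native folder, that points at dependency probing rather than at the DLL itself.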

saurabh500 commented 8 years ago

Taking @ericstj's approach further, if I copy only the API-MS-Win-Core-StringAnsi-L1-1-0.dll to the .nuget\packages\runtime.win7-x64.runtime.native.System.Data.SqlClient.sni\4.0.1-rc2-23931\runtimes\win7-x64\native folder, the sni.dll loads properly.

ericstj commented 8 years ago

That's just because your WS2008R2 machine happened to have the other API sets centrally installed via https://support.microsoft.com/en-us/kb/2999226. That update only contains a subset of the api set DLLs that CoreFx depends on and it is not an explicit pre-req AFAIK. We need the host/CLR to fix this problem generically, otherwise native dependencies across packages are broken.

natemcmaster commented 8 years ago

> We need the host/CLR to fix this problem generically, otherwise native dependencies across packages are broken.

Should this issue be moved to cli then? cc @schellap

schellap commented 8 years ago

Under the debugger,

04fc:065c @ 470113093 - LdrpLoadImportModule - ERROR: Loading DLL api-ms-win-core-stringansi-l1-1-0.dll from path <C:\Users\schellap\.nuget\packages\runtime.win7-x64.runtime.native.System.Data.SqlClient.sni\4.0.1-rc2-23931\runtimes\win7-x64\native;;C:\Windows\system32;C:\Windows\system;C:\Windows;.;c:\debuggers\amd64\winext\arcade;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\AegisTools\Bin;C:\Program Files\dotnet\> failed with status 0xc0000135
04fc:065c @ 470113093 - LdrpLoadImportModule - RETURN: Status: 0xc0000135
04fc:065c @ 470113093 - LdrpHandleOneOldFormatImportDescriptor - ERROR: Loading "??????????????????l????????????????l????????????????????????" from the import table of DLL "C:\Users\schellap\.nuget\packages\runtime.win7-x64.runtime.native.System.Data.SqlClient.sni\4.0.1-rc2-23931\runtimes\win7-x64\native\sni.dll" failed with status 0xc0000135 

ericstj commented 8 years ago

@schellap shouldn't that probing list match NATIVE_DLL_SEARCH_DIRECTORIES? or at least contain a superset?

schellap commented 8 years ago

@ericstj, why is this DLL not packaged with sxs sni.dll, if it needs it?

If this app were to work on win7, right now, where is it even installed?

NATIVE_DLL_SEARCH_DIRECTORIES is for CoreCLR, I don't think coreclr adds all native dlls to the windows load paths. Imagine how slow that would be.

ericstj commented 8 years ago

@schellap all of the corefx dlls, both managed and native, depend on the API set DLLs rather than directly on the OS implementation DLLs. We redistribute the API set DLLs downlevel in a single package and on newer OSes those are built in. You can think of this as just .NET Core providing a consistent native API surface across all the Windows OSes where we run (including phones).

Consider that this is really no different than managed DLLs depending on other managed DLLs in other packages and CoreCLR handles that just fine.

We cannot have the restriction that native dependencies must be packaged in the same package. That doesn't make any sense from a package author's perspective.

We chatted and I think the perf hit is theoretical. We should measure before we make design decisions based on that. Additionally the perf hit is largely just for dev time where we are running from the package cache. In a published app everything will be in a single directory. In the portable app we'll only have a few directories for each fallback RID.

To @natemcmaster's point, yes this is a CLI issue. We should move it. I'm not sure how to do that.

saurabh500 commented 8 years ago

@schellap Any idea what changed and started causing this issue for sni.dll in the .NET CLI?

Are there other native DLLs in corefx that have native dependencies which need to be packaged with them?

@ericstj My repro is on Windows 7 and I don't have the update you mentioned. Maybe the API sets were installed as part of another update.

schellap commented 8 years ago

Re: @ericstj, @gkhanna79 just wanted to inform you that we would have to add all native dir paths to the windows load paths, because in portable apps the layout would look like:

app-dir\runtimes\win7-x64\native\foo.dll
app-dir\runtimes\win81-x64\native\bar.dll

bar.dll depends on foo.dll.
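
To make the layout concrete, here is a minimal Python sketch that collects every runtimes/<rid>/native directory under an app directory. The function name native_dirs is hypothetical, and the alphabetical ordering is only a placeholder: a real host would presumably order the directories by RID fallback instead.

```python
import os


def native_dirs(app_dir):
    """List the app_dir/runtimes/<rid>/native directories that exist.

    Sorted by RID name for determinism only; this is not the RID fallback
    order a real host would use.
    """
    runtimes = os.path.join(app_dir, "runtimes")
    if not os.path.isdir(runtimes):
        return []
    found = []
    for rid in sorted(os.listdir(runtimes)):
        candidate = os.path.join(runtimes, rid, "native")
        if os.path.isdir(candidate):
            found.append(candidate)
    return found
```

In the example layout this returns both the win7-x64 and win81-x64 native folders, which is the set of directories the OS loader would need to know about for bar.dll to find foo.dll.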

@saurabh500, dotnet test always runs out of the nuget cache. You should see this only on down-level OSes. Have you tested on win7 without the API set update installed before? Nothing else has changed, AFAIK.

gkhanna79 commented 8 years ago

Why would a win81 component depend upon a Win7 component? Or, are they two different components in two packages where Win81 package has a dependency on Win7 package containing foo.dll?

schellap commented 8 years ago

Correct. They are two packages where win81 has a dependency on win7 package.

gkhanna79 commented 8 years ago

Implication of running from nuget cache.

CC @weshaggard @piotrpMSFT

saurabh500 commented 8 years ago

@schellap I wanted to understand from EF's test perspective. I wanted to figure out why the tests started breaking now instead of earlier. I will follow up on this later.

schellap commented 8 years ago

The issue at hand is a DLL's static references can come from different packages and different RIDs and hence different directories.

An ugly hack for Windows is, we can simply modify process PATH (or do AddDllDirectory if coreclr can be fixed to do SEARCH_USER_DIRS) in Windows to account for the new directories that the host discovers which may contain native DLL references.
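
The PATH variant of that hack can be sketched in Python as follows. The function names are made up and this is not what the host does. os.add_dll_directory is Python's wrapper over AddDllDirectory (Windows only, Python 3.8+), and directories added that way are only consulted by loads that opt in to user directories, which is the coreclr caveat above.

```python
import os


def prepend_to_path(current_path, new_dirs, sep=os.pathsep):
    """Return a PATH value with new_dirs in front, skipping duplicates.

    Duplicates are detected by exact string match, which is a simplification
    for Windows, where paths compare case-insensitively.
    """
    existing = [p for p in current_path.split(sep) if p]
    seen = set(existing)
    front = []
    for d in new_dirs:
        if d not in seen:
            front.append(d)
            seen.add(d)
    return sep.join(front + existing)


def apply_native_dirs(new_dirs):
    """Mutate this process's PATH; on Windows also call AddDllDirectory."""
    os.environ["PATH"] = prepend_to_path(os.environ.get("PATH", ""), new_dirs)
    if hasattr(os, "add_dll_directory"):  # Windows, Python 3.8+
        for d in new_dirs:
            if os.path.isdir(d):
                os.add_dll_directory(os.path.abspath(d))
```

Changing PATH affects every later load in the process, not just the one DLL that needs it.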

On Unix, is there such a solution? -- LD_LIBRARY_PATH can only be configured at process start. @ericstj you mentioned dnx solved this problem. Did the host preload all native binaries as discussed in this thread for non-Windows? @ellismg @janvorli https://github.com/dotnet/coreclr/issues/709

@davidfowl @anurse, this commit seems related and it preloads native binaries...

https://github.com/aspnet/dnx/pull/3060 https://github.com/aspnet/dnx/commit/b9f0af9512fb87f3af89974ff277cd509552be56#diff-e5a44140d1dcb175dca60a87ba6fc0d5R311
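
The preloading idea can be sketched as: load each native library by full path, dependencies first, so that the OS loader finds the dependency already in the process when it resolves the dependent's imports by name. The Python sketch below is an illustration under that assumption, not the dnx implementation. Whether an already loaded module satisfies a later import by name depends on the platform loader, and the deps mapping is something the caller would have to supply.

```python
import ctypes
from graphlib import TopologicalSorter  # Python 3.9+


def preload_order(deps):
    """Order library paths so that every dependency precedes its dependents.

    deps maps a library path to the paths it depends on (hypothetical input,
    e.g. gathered during package resolution).
    """
    return list(TopologicalSorter(deps).static_order())


def preload(deps):
    """Load every library by path in dependency order and keep the handles."""
    return [ctypes.CDLL(path) for path in preload_order(deps)]
```

For the foo/bar example in this thread, preload_order({"bar.dll": ["foo.dll"], "foo.dll": []}) puts foo.dll first.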

schellap commented 8 years ago

Cc @moozzyk

gkhanna79 commented 8 years ago

Thinking a bit more about this, this sounds more of a package composition implication. Shouldn't a native binary in a package be self-contained with its dependencies (unless they are installed in system paths) @davidfowl ?

schellap commented 8 years ago

@gkhanna79 even if it is self-contained within the package, for Windows this is okay but the unix dynamic loader is not going to know about the package paths and sub folders. It has a very specific probe order. See this:

http://man7.org/linux/man-pages/man3/dlopen.3.html

       o   (ELF only) If the executable file for the calling program
           contains a DT_RPATH tag, and does not contain a DT_RUNPATH tag,
           then the directories listed in the DT_RPATH tag are searched.

       o   If, at the time that the program was started, the environment
           variable LD_LIBRARY_PATH was defined to contain a colon-separated
           list of directories, then these are searched.  (As a security
           measure, this variable is ignored for set-user-ID and set-group-
           ID programs.)

       o   (ELF only) If the executable file for the calling program
           contains a DT_RUNPATH tag, then the directories listed in that
           tag are searched.

       o   The cache file /etc/ld.so.cache (maintained by ldconfig(8)) is
           checked to see whether it contains an entry for filename.

       o   The directories /lib and /usr/lib are searched (in that order).
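
Because LD_LIBRARY_PATH is only read when the program starts, the usual way to apply it is a launcher that sets the variable and then starts the real process. A minimal Python sketch of that pattern follows. The function names are hypothetical, and nothing here is claimed to be what the dotnet host does.

```python
import os
import subprocess


def child_env_with_library_path(native_dirs, base_env=None):
    """Return a copy of the environment with native_dirs prepended to
    LD_LIBRARY_PATH (colon-separated, as described in dlopen(3))."""
    env = dict(os.environ if base_env is None else base_env)
    parts = list(native_dirs)
    if env.get("LD_LIBRARY_PATH"):
        parts.append(env["LD_LIBRARY_PATH"])
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env


def run_with_library_path(argv, native_dirs):
    """Start argv as a child process that sees the extended LD_LIBRARY_PATH."""
    return subprocess.run(argv, env=child_env_with_library_path(native_dirs))
```

Setting the variable inside an already running process would not change how that process's loader probes, which is why the extra process is needed.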

moozzyk commented 8 years ago

@schellap - in dnx we have a LoadContext and CoreClr calls us (LoadUnmanagedLibrary) each time they need to load a native lib. We know exact paths because we store them when we do package resolution when starting the app. I wrote about native libraries in ASP.NET rc1/dnx here: http://blog.3d-logic.com/2015/11/10/using-native-libraries-in-asp-net-5/

gkhanna79 commented 8 years ago

@moozzyk Thanks. Finding the location of the native binary to pinvoke is supported by the host today. The key issue is that if the native binary being invoked depends upon another native binary, how does that dependency get looked up by the OS loader?

@davidfowl and I chatted about this as well. As your article calls out, there is no way to resolve that at runtime in a consistent manner for all platforms and requires the dependencies to be installed in shared locations of the OS where the OS loader can find them.

moozzyk commented 8 years ago

@gkhanna79 - I couldn't find a good way of loading dependencies that are not global and not explicitly called out in the code (DllImport) on non-windows.

joshfree commented 8 years ago

@schellap @gkhanna79 can this issue be moved to /dotnet/cli for rc2? Are you waiting for any further data from @saurabh500 (my assumption is "no").

gkhanna79 commented 8 years ago

This issue is the same as https://github.com/dotnet/cli/issues/2267. @schellap Should we close this as a dup?

gkhanna79 commented 8 years ago

Closing as dup of https://github.com/dotnet/cli/issues/2267

foxjazz commented 7 years ago

This issue shouldn't be closed. asp.net 2.0 now has the issue.

danmoseley commented 7 years ago

@foxjazz have you got repro and setup steps, etc? This may well be unrelated to this old issue above, except in symptoms.

borjasanes commented 7 years ago

@foxjazz I solved it by installing this NuGet package: runtime.native.System.Data.SqlClient.sni