Grantim closed this issue 1 month ago.
Can you upload the bad wheel here? I can take a look at it and see what the problem is.
For the bad wheel, I traced the failure to the DLL tbb12-58efd192759dacc003e42fc3899b5ee8.dll, and I ran some experiments to try to figure out what was wrong with it.
I noticed that if I use ctypes.windll.kernel32.LoadLibraryExW() to load tbb12-58efd192759dacc003e42fc3899b5ee8.dll directly, it fails. If I load msvcp140-f1c90b395d7d901af7e16ef487237278.dll first, then loading tbb12-58efd192759dacc003e42fc3899b5ee8.dll succeeds. My first suspicion was that a bug in the name-mangling mechanism was corrupting tbb12.dll and making it impossible to load directly. I analyzed tbb12-58efd192759dacc003e42fc3899b5ee8.dll but could not find anything wrong with it.
Then I wrote a program in C that uses LoadLibraryExW() to load tbb12-58efd192759dacc003e42fc3899b5ee8.dll, and I found that it sometimes failed and sometimes succeeded, depending on the location of the C program and the flags passed to LoadLibraryExW(). As a control, I tried loading msvcp140-f1c90b395d7d901af7e16ef487237278.dll in the same manner, which succeeded in every scenario. This told me that tbb12-58efd192759dacc003e42fc3899b5ee8.dll is directly loadable in certain scenarios.
I was unable to find an obvious rule for why loading tbb12-58efd192759dacc003e42fc3899b5ee8.dll would succeed in some cases and fail in others. The only explanation I have is that at least one of the Microsoft Visual C++ DLLs, such as vcruntime140.dll, relies on process-global state that can be broken if two versions of that DLL are loaded in the same process. That is, if vcruntime140-hash1.dll and vcruntime140-hash2.dll are loaded into the same process, then the DLL loader no longer functions correctly. The fact that no one else has reported a similar issue, and that past versions of your wheel worked fine, suggests that it is rare for the DLL loader to be broken in this way, and that you happened to stumble upon the very specific set of circumstances that triggers the issue. Although I don't know for certain that this is what's causing your issue, I think it's the most logical explanation.
I think, then, the solution would be for delvewheel to avoid name-mangling the Microsoft Visual C++ DLLs. I will work on making this change.
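As a rough illustration of that change, here is a C sketch of a name-mangling step that leaves the Visual C++ runtime DLLs untouched. delvewheel itself is written in Python, so this is not its code: the function name `mangle_dll_name`, the skip list, and the assumption that a mangled name has the form `base-hash.ext` (as in tbb12-58efd192759dacc003e42fc3899b5ee8.dll) are illustrative only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical skip list: Visual C++ runtime DLLs that keep their original names. */
static const char *const kNoMangle[] = {
    "msvcp140.dll",
    "vcruntime140.dll",
    "vcruntime140_1.dll",
};

/* ASCII case-insensitive string equality (Windows file names ignore case). */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        ++a;
        ++b;
    }
    return *a == *b;
}

/* Writes "base-hash.ext" for name into out, or copies name unchanged when it is
   on the skip list or has no extension. Returns 0 on success, -1 if out is too small. */
int mangle_dll_name(const char *name, const char *hash, char *out, size_t out_size)
{
    assert(name != NULL && hash != NULL && out != NULL && out_size > 0);
    const char *dot = strrchr(name, '.');
    int keep = (dot == NULL);
    for (size_t i = 0; i < sizeof kNoMangle / sizeof kNoMangle[0]; ++i) {
        if (equals_ignore_case(name, kNoMangle[i]))
            keep = 1;
    }
    int len = keep
        ? snprintf(out, out_size, "%s", name)
        : snprintf(out, out_size, "%.*s-%s%s", (int)(dot - name), name, hash, dot);
    return (len < 0 || (size_t)len >= out_size) ? -1 : 0;
}
```

With this rule, tbb12.dll still gets a hashed name while MSVCP140.dll is copied through as-is, so only one copy of each runtime DLL name can end up in a process.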
Thanks for the help! We will try to investigate further.
Thanks! I managed to reproduce the issue outside Python with the following C++ program:
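#include <windows.h>
#include <cstdio>
int main()
{
    // Add the Python install directory and the delvewheel-generated .libs directory as user DLL directories
    AddDllDirectory( LR"(C:\Program Files\Python311)" );
    AddDllDirectory( LR"(C:\Users\user4\AppData\Roaming\Python\Python311\site-packages\meshlib.libs)" );
    // Load the extension module, searching only the default directories (application dir, user dirs, System32)
    HMODULE h = LoadLibraryExA( R"(C:\Users\user4\AppData\Roaming\Python\Python311\site-packages\meshlib\mrmeshpy.pyd)", NULL, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS );
    if ( !h )
        std::printf( "LoadLibraryExA failed: %lu\n", GetLastError() );
    else
        FreeLibrary( h );
}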
It is 100% reproducible, with the same error:
344c:d070 @ 1248997843 - LdrpProcessWork - ERROR: Unable to load DLL: "msvcp140-f1c90b395d7d901af7e16ef487237278.dll", Parent Module: "C:\Users\user4\AppData\Roaming\Python\Python311\site-packages\meshlib.libs\tbb12-58efd192759dacc003e42fc3899b5ee8.dll", Status: 0xc0000135
An additional finding is that if one copies the files
msvcp140-f1c90b395d7d901af7e16ef487237278.dll
vcruntime140-26a92e4fb4b73ddc824fe6616b0ea281.dll
vcruntime140_1-23cea287d52749969d25554c25715a49.dll
from C:\Users\user4\AppData\Roaming\Python\Python311\site-packages\meshlib.libs\ to C:\Program Files\Python311\, then both importing the "bad" wheel (from meshlib import mrmeshpy as mm) and the above C++ program succeed.
So it looks like the problem is not in name mangling but in something else. What do you think?
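A rough model of the DLL search path helps explain the copy experiment. For python.exe, C:\Program Files\Python311 is the application directory, which is searched both by the standard search order and under LOAD_LIBRARY_SEARCH_DEFAULT_DIRS, whereas meshlib.libs is only reachable through AddDllDirectory or LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR. The C sketch below encodes a simplified reading of the LoadLibraryEx documentation (KnownDLLs, API sets, and the 16-bit system directory are ignored). The `Dirs` struct and `model_search_path` function are hypothetical; the flag values are the ones defined in the Windows SDK.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* LoadLibraryEx flag values, as defined in the Windows SDK. */
#define SEARCH_DLL_LOAD_DIR    0x0100u
#define SEARCH_APPLICATION_DIR 0x0200u
#define SEARCH_USER_DIRS       0x0400u
#define SEARCH_SYSTEM32        0x0800u
#define SEARCH_DEFAULT_DIRS    0x1000u
#define SEARCH_ANY_BITS        0x1F00u /* any of the five flags above */

/* Hypothetical bundle of the directories the loader may consult. */
typedef struct {
    const char *dll_dir;   /* directory containing the DLL being loaded */
    const char *app_dir;   /* directory containing the .exe */
    const char *user_dirs; /* directories added with AddDllDirectory, ';'-separated */
    const char *system32;
    const char *windows;
    const char *path_env;  /* value of the PATH environment variable */
} Dirs;

/* Appends a non-empty directory to out, separated by ';' (truncates if out is full). */
static void append_dir(char *out, size_t size, const char *dir)
{
    if (dir == NULL || *dir == '\0')
        return;
    size_t used = strlen(out);
    snprintf(out + used, size - used, "%s%s", used ? ";" : "", dir);
}

/* Writes a simplified search path for the given LoadLibraryEx flags into out. */
void model_search_path(unsigned flags, const Dirs *d, char *out, size_t size)
{
    assert(d != NULL && out != NULL && size > 0);
    out[0] = '\0';
    if (flags & SEARCH_ANY_BITS) {
        /* Only the locations selected by the LOAD_LIBRARY_SEARCH_* bits are searched. */
        int dflt = (flags & SEARCH_DEFAULT_DIRS) != 0;
        if (flags & SEARCH_DLL_LOAD_DIR)
            append_dir(out, size, d->dll_dir);
        if (dflt || (flags & SEARCH_APPLICATION_DIR))
            append_dir(out, size, d->app_dir);
        if (dflt || (flags & SEARCH_USER_DIRS))
            append_dir(out, size, d->user_dirs);
        if (dflt || (flags & SEARCH_SYSTEM32))
            append_dir(out, size, d->system32);
    } else {
        /* Standard order with SafeDllSearchMode on; AddDllDirectory entries are not used. */
        append_dir(out, size, d->app_dir);
        append_dir(out, size, d->system32);
        append_dir(out, size, d->windows);
        append_dir(out, size, ".");
        append_dir(out, size, d->path_env);
    }
}
```

Under this model, a DLL in meshlib.libs is found with the default-dirs flags but not under the standard order (unless meshlib.libs happens to be on PATH), while a DLL placed next to python.exe is found either way.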
I noticed a similar thing during my investigation, when I tried copying DLLs into different locations and observing the result. But I could not determine what was special about tbb12-58efd192759dacc003e42fc3899b5ee8.dll. There are other DLLs that depend on msvcp140-f1c90b395d7d901af7e16ef487237278.dll, but I was able to load them with no problem.
And when I compared the good wheel with the bad wheel, I could not find a clear explanation for why one worked and the other did not.
Thanks!
You wrote: "The only explanation I have is that at least one of the Microsoft Visual C++ DLLs such as vcruntime140.dll relies on process-global state that can be broken if two versions of that DLL are loaded in the same process."
To check this, I compiled the small C++ application above with the VS runtime linked statically, so no second vcruntime140.dll is loaded into the process before the error appears (and the error still appears): DebugLog-51420.txt. So I think we can rule out this hypothesis.
Just found one more thing: using this tool, I printed the PE headers for the tbb libraries: good_pe.txt, bad_pe.txt.
The bad one before delvewheel: bad_ref.txt.
This line seems strange:
That's an issue with the PE Explorer tool. Many PE file analysis tools expect the DLL names to be in the same PE section and fail to display the DLL information when that's not the case. DLL names are usually in the same PE section. However, the name-mangling mechanism can cause the DLL names to be placed into different PE sections. Many DLLs that have been name-mangled with delvewheel have this characteristic, and I have never seen it cause problems before. For instance, spdlog-9a52397535aaa1e58bd985a9de6110d0.dll loads fine, yet PE Explorer shows missing DLL names for it as well:
DLL NAME :
Characteristics : 0x42089508
OriginalFirstThunk : 0x42089508
TimeDateStamp : 0x4204E3D0
ForwarderChain : 0x4204E3D0
FirstThunk : 0x42074698
Imported Functions :
__C_specific_handler
__current_exception_context
__current_exception
memset
memmove
memcpy
memcmp
memchr
_CxxThrowException
__std_exception_destroy
__std_exception_copy
_purecall
__std_type_info_destroy_list
__std_terminate
DLL NAME :
Characteristics : 0x42089580
OriginalFirstThunk : 0x42089580
TimeDateStamp : 0x4204E3D0
ForwarderChain : 0x4204E3D0
FirstThunk : 0x42074710
Imported Functions :
__CxxFrameHandler4
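To make the point about sections concrete: each import descriptor stores the DLL name as an RVA, and a correct reader maps that RVA through whichever section actually contains it, rather than assuming the name shares a section with the import descriptors. A minimal C sketch, assuming the section table has already been parsed; `Section` is a hypothetical, trimmed-down stand-in for IMAGE_SECTION_HEADER, and `rva_to_file_offset` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed-down view of an IMAGE_SECTION_HEADER: only the fields needed here. */
typedef struct {
    uint32_t virtual_address; /* RVA where the section starts in memory */
    uint32_t virtual_size;    /* size of the section in memory */
    uint32_t pointer_to_raw;  /* file offset of the section's data */
    uint32_t size_of_raw;     /* size of the section's data in the file */
} Section;

/* Maps an RVA (e.g. an import descriptor's Name field) to a file offset by
   searching every section, not just the one holding the import directory.
   Returns -1 if no section contains the RVA or it falls in zero-filled data. */
int64_t rva_to_file_offset(const Section *sections, size_t count, uint32_t rva)
{
    assert(sections != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        const Section *s = &sections[i];
        /* Some linkers leave VirtualSize smaller than SizeOfRawData, so use the larger. */
        uint32_t span = s->virtual_size > s->size_of_raw ? s->virtual_size : s->size_of_raw;
        if (rva >= s->virtual_address && rva - s->virtual_address < span) {
            uint32_t delta = rva - s->virtual_address;
            if (delta >= s->size_of_raw)
                return -1; /* inside the section, but not backed by file data */
            return (int64_t)s->pointer_to_raw + delta;
        }
    }
    return -1;
}
```

A tool that instead reuses the import directory's section for every name RVA computes a wrong offset whenever the name lives elsewhere, which is consistent with names going missing in its output while the loader itself is unaffected.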
I have discovered something interesting, though, while debugging the following code:
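#include <stdio.h>
#include <Windows.h>
int main(void) {
    /* Register meshlib.libs as a user DLL directory (searched under LOAD_LIBRARY_SEARCH_DEFAULT_DIRS) */
    if (!AddDllDirectory(L"C:\\Users\\adang\\Downloads\\venv\\Lib\\site-packages\\meshlib.libs")) {
        printf("AddDllDirectory() failed\n");
    }
    SetLastError(0);
    /* Load tbb12 directly; its imports should resolve from its own directory or the user directories */
    HMODULE h = LoadLibraryExW(L"C:\\Users\\adang\\Downloads\\venv\\Lib\\site-packages\\meshlib.libs\\tbb12-58efd192759dacc003e42fc3899b5ee8.dll", NULL, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR);
    DWORD err = GetLastError();
    /* A NULL handle with error 126 (ERROR_MOD_NOT_FOUND) means the DLL or one of its dependencies was not found */
    printf("%p; %lu\n", (void *)h, err);
    if (h)
        FreeLibrary(h);
    return 0;
}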
When I try to load tbb12-58efd192759dacc003e42fc3899b5ee8.dll, the DLL search path is wrong and uses the PATH environment variable instead. It would seem that the LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR flags are being ignored for some reason.
0ff4:19e0 @ 04991312 - LdrLoadDll - ENTER: DLL name: C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs\tbb12-58efd192759dacc003e42fc3899b5ee8.dll
0ff4:19e0 @ 04991312 - LdrpLoadDllInternal - ENTER: DLL name: C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs\tbb12-58efd192759dacc003e42fc3899b5ee8.dll
0ff4:19e0 @ 04991312 - LdrpResolveDllName - ENTER: DLL name: C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs\tbb12-58efd192759dacc003e42fc3899b5ee8.dll
0ff4:19e0 @ 04991312 - LdrpResolveDllName - RETURN: Status: 0x00000000
0ff4:19e0 @ 04991312 - LdrpMinimalMapModule - ENTER: DLL name: C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs\tbb12-58efd192759dacc003e42fc3899b5ee8.dll
ModLoad: 00007ff8`55080000 00007ff8`550d7000 C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs\tbb12-58efd192759dacc003e42fc3899b5ee8.dll
0ff4:19e0 @ 04991312 - LdrpMinimalMapModule - RETURN: Status: 0x00000000
0ff4:19e0 @ 04991312 - LdrpFindKnownDll - ENTER: DLL name: msvcp140-f1c90b395d7d901af7e16ef487237278.dll
0ff4:19e0 @ 04991312 - LdrpFindKnownDll - RETURN: Status: 0xc0000135
0ff4:4e10 @ 04991312 - LdrpSearchPath - ENTER: DLL name: msvcp140-f1c90b395d7d901af7e16ef487237278.dll
0ff4:19e0 @ 04991312 - LdrpFindKnownDll - ENTER: DLL name: vcruntime140-26a92e4fb4b73ddc824fe6616b0ea281.dll
0ff4:19e0 @ 04991312 - LdrpFindKnownDll - RETURN: Status: 0xc0000135
0ff4:19e0 @ 04991312 - LdrpFindKnownDll - ENTER: DLL name: vcruntime140_1-23cea287d52749969d25554c25715a49.dll
0ff4:5284 @ 04991312 - LdrpSearchPath - ENTER: DLL name: vcruntime140-26a92e4fb4b73ddc824fe6616b0ea281.dll
0ff4:19e0 @ 04991328 - LdrpFindKnownDll - RETURN: Status: 0xc0000135
0ff4:25c4 @ 04991328 - LdrpSearchPath - ENTER: DLL name: vcruntime140_1-23cea287d52749969d25554c25715a49.dll
0ff4:19e0 @ 04991328 - LdrpPreprocessDllName - INFO: DLL api-ms-win-crt-heap-l1-1-0.dll was redirected to C:\WINDOWS\SYSTEM32\ucrtbase.dll by API set
0ff4:19e0 @ 04991328 - LdrpPreprocessDllName - INFO: DLL api-ms-win-crt-runtime-l1-1-0.dll was redirected to C:\WINDOWS\SYSTEM32\ucrtbase.dll by API set
0ff4:19e0 @ 04991328 - LdrpPreprocessDllName - INFO: DLL api-ms-win-crt-string-l1-1-0.dll was redirected to C:\WINDOWS\SYSTEM32\ucrtbase.dll by API set
0ff4:19e0 @ 04991328 - LdrpPreprocessDllName - INFO: DLL api-ms-win-crt-environment-l1-1-0.dll was redirected to C:\WINDOWS\SYSTEM32\ucrtbase.dll by API set
0ff4:19e0 @ 04991328 - LdrpPreprocessDllName - INFO: DLL api-ms-win-crt-stdio-l1-1-0.dll was redirected to C:\WINDOWS\SYSTEM32\ucrtbase.dll by API set
0ff4:4e10 @ 04991312 - LdrpComputeLazyDllPath - INFO: DLL search path computed: C:\Users\adang\source\repos\cppsandbox\x64\Release;C:\WINDOWS\SYSTEM32;C:\WINDOWS\system;C:\WINDOWS;.;C:\Program Files (x86)\Windows Kits\10\Debuggers\x64;C:\Program Files\Python312\Scripts\;C:\Program Files\Python312\;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files\Microsoft VS Code\bin;C:\MinGW\bin\;C:\MinGW\msys\1.0;C:\MinGW\ms
Compare this with the log for loading fmt-d8b924ba1577612827210a349cbf8c6e.dll, where the DLL search path is correct:
393c:5554 @ 06186234 - LdrLoadDll - ENTER: DLL name: C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs\fmt-d8b924ba1577612827210a349cbf8c6e.dll
393c:5554 @ 06186234 - LdrpLoadDllInternal - ENTER: DLL name: C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs\fmt-d8b924ba1577612827210a349cbf8c6e.dll
393c:5554 @ 06186234 - LdrpResolveDllName - ENTER: DLL name: C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs\fmt-d8b924ba1577612827210a349cbf8c6e.dll
393c:5554 @ 06186234 - LdrpResolveDllName - RETURN: Status: 0x00000000
393c:5554 @ 06186234 - LdrpMinimalMapModule - ENTER: DLL name: C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs\fmt-d8b924ba1577612827210a349cbf8c6e.dll
ModLoad: 00007ff8`c5a60000 00007ff8`c5a87000 C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs\fmt-d8b924ba1577612827210a349cbf8c6e.dll
393c:5554 @ 06186234 - LdrpMinimalMapModule - RETURN: Status: 0x00000000
393c:5554 @ 06186234 - LdrpFindKnownDll - ENTER: DLL name: msvcp140-f1c90b395d7d901af7e16ef487237278.dll
393c:5554 @ 06186234 - LdrpFindKnownDll - RETURN: Status: 0xc0000135
393c:1ea0 @ 06186234 - LdrpSearchPath - ENTER: DLL name: msvcp140-f1c90b395d7d901af7e16ef487237278.dll
393c:5554 @ 06186234 - LdrpFindKnownDll - ENTER: DLL name: vcruntime140-26a92e4fb4b73ddc824fe6616b0ea281.dll
393c:5554 @ 06186234 - LdrpFindKnownDll - RETURN: Status: 0xc0000135
393c:526c @ 06186234 - LdrpSearchPath - ENTER: DLL name: vcruntime140-26a92e4fb4b73ddc824fe6616b0ea281.dll
393c:5554 @ 06186234 - LdrpFindKnownDll - ENTER: DLL name: vcruntime140_1-23cea287d52749969d25554c25715a49.dll
393c:1ea0 @ 06186234 - LdrpComputeLazyDllPath - INFO: DLL search path computed: C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs;C:\Users\adang\source\repos\cppsandbox\x64\Release;C:\Users\adang\Downloads\venv\Lib\site-packages\meshlib.libs;C:\WINDOWS\SYSTEM32
I have no idea what would cause the DLL search path to be wrong for tbb12-58efd192759dacc003e42fc3899b5ee8.dll.
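The decisive difference between the two logs above is whether meshlib.libs appears in the "DLL search path computed" line. A small C sketch for checking that mechanically from a saved loader-snaps line; the marker text is copied from the log lines above, and the function name `search_path_contains` is hypothetical. Entries are compared case-insensitively but are not otherwise normalized (a trailing backslash counts as a different entry).

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Case-insensitive comparison of the first n characters (Windows paths ignore case). */
static int equal_ci(const char *a, const char *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Looks for dir among the ';'-separated entries that follow
   "DLL search path computed: " in a loader-snaps log line.
   Returns 1 if found, 0 if not, -1 if the line has no such marker. */
int search_path_contains(const char *log_line, const char *dir)
{
    static const char marker[] = "DLL search path computed: ";
    assert(log_line != NULL && dir != NULL);
    const char *p = strstr(log_line, marker);
    if (p == NULL)
        return -1;
    p += sizeof marker - 1;
    size_t dir_len = strlen(dir);
    for (;;) {
        const char *end = strchr(p, ';');
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len == dir_len && equal_ci(p, dir, len))
            return 1;
        if (end == NULL)
            return 0;
        p = end + 1;
    }
}
```

Run against the fmt line, this reports meshlib.libs as present; against the tbb12 line, which lists the application directory, the Windows directories, "." and PATH entries, it does not.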
Looks like we found the reason. In these oneTBB changes: https://github.com/oneapi-src/oneTBB/compare/v2021.11.0...v2021.13.0#diff-01fb64c473f1e70ed71d0d37e62daa12d13f2bd487df508e550a80f8ead82f9bR38
they added /DEPENDENTLOADFLAG:0x2000, which means LOAD_LIBRARY_SAFE_CURRENT_DIRS:
"If this value is used, loading a DLL for execution from the current directory is only allowed if it is under a directory in the Safe load list."
And based on this, from the /DEPENDENTLOADFLAG documentation:
"When the operating system resolves the statically linked imports of a module, it uses the default search order. Use the /DEPENDENTLOADFLAG option to specify a load_flags value that changes the search path used to resolve these imports. On supported operating systems, it changes the static import resolution search order, similar to what LoadLibraryEx does when using LOAD_LIBRARY_SEARCH parameters. For information on the search order set by load_flags, see Search order using LOAD_LIBRARY_SEARCH flags."
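Put together with the logs above, the quoted documentation suggests a simplified model: when the loader resolves a module's static imports, a nonzero DependentLoadFlags value in that module takes the place of the flags the caller passed to LoadLibraryEx. Since 0x2000 (LOAD_LIBRARY_SAFE_CURRENT_DIRS) contains none of the LOAD_LIBRARY_SEARCH_* bits, tbb12's imports would then fall back to the standard search order, which includes the application directory and PATH but not directories added with AddDllDirectory. That would also fit why copying the runtime DLLs next to python.exe, or preloading msvcp140-f1c90b395d7d901af7e16ef487237278.dll, made the error disappear. This is an inference from the observed behaviour, not something confirmed from loader internals; the C sketch below only encodes that assumption, and its function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* LoadLibraryEx flag values, as defined in the Windows SDK. */
#define SEARCH_USER_DIRS    0x0400u /* LOAD_LIBRARY_SEARCH_USER_DIRS */
#define SEARCH_DEFAULT_DIRS 0x1000u /* LOAD_LIBRARY_SEARCH_DEFAULT_DIRS */
#define SEARCH_ANY_BITS     0x1F00u /* all LOAD_LIBRARY_SEARCH_* bits, 0x0100..0x1000 */
#define SAFE_CURRENT_DIRS   0x2000u /* LOAD_LIBRARY_SAFE_CURRENT_DIRS */

/* Assumed rule: a nonzero DependentLoadFlags in the importing module replaces
   the flags the caller passed to LoadLibraryEx for that module's static imports. */
uint32_t effective_import_flags(uint32_t caller_flags, uint16_t dependent_load_flags)
{
    return dependent_load_flags != 0 ? dependent_load_flags : caller_flags;
}

/* 1 if no LOAD_LIBRARY_SEARCH_* bit is set, i.e. the standard search order applies. */
int uses_standard_search_order(uint32_t flags)
{
    return (flags & SEARCH_ANY_BITS) == 0;
}

/* 1 if directories added with AddDllDirectory are part of the search. */
int searches_user_dirs(uint32_t flags)
{
    return (flags & (SEARCH_USER_DIRS | SEARCH_DEFAULT_DIRS)) != 0;
}
```

With the flags from the C program above (LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR) and tbb12's 0x2000, the model gives a standard-order search without user directories, matching the tbb12 log; the fmt log, where the caller's flags stay in effect, corresponds to a DependentLoadFlags of 0 in this model.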
Thanks! I've learned something new today; this is the first time I've heard of the /DEPENDENTLOADFLAG option. Let me see whether this situation can be better handled in delvewheel, either by editing tbb12.dll to override the /DEPENDENTLOADFLAG option or by outputting a warning if a problematic /DEPENDENTLOADFLAG option is detected.
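For the warning option, the value set by /DEPENDENTLOADFLAG is stored in the DependentLoadFlags field of the PE load configuration directory (IMAGE_LOAD_CONFIG_DIRECTORY). A minimal C sketch, assuming the caller has already located that directory's bytes in a 64-bit (PE32+) image; 0x4E is the field's offset in IMAGE_LOAD_CONFIG_DIRECTORY64. The function names are hypothetical, and the "problematic" rule (nonzero flags that search neither the DLL's own directory nor the AddDllDirectory directories) is a heuristic chosen for illustration, not delvewheel's actual behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of DependentLoadFlags (a WORD) in IMAGE_LOAD_CONFIG_DIRECTORY64. */
#define DEPENDENT_LOAD_FLAGS_OFFSET64 0x4E

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns DependentLoadFlags from a 64-bit load config directory, or 0 if the
   buffer or the structure's own Size field (its first DWORD) does not cover it. */
uint16_t dependent_load_flags64(const uint8_t *load_config, size_t size)
{
    const size_t needed = DEPENDENT_LOAD_FLAGS_OFFSET64 + 2;
    assert(load_config != NULL || size == 0);
    if (size < needed || read_le32(load_config) < needed)
        return 0;
    return read_le16(load_config + DEPENDENT_LOAD_FLAGS_OFFSET64);
}

/* Heuristic: nonzero flags with none of LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR (0x100),
   LOAD_LIBRARY_SEARCH_USER_DIRS (0x400) or LOAD_LIBRARY_SEARCH_DEFAULT_DIRS (0x1000). */
int dependent_load_flags_problematic(uint16_t flags)
{
    const uint16_t ok_bits = 0x0100 | 0x0400 | 0x1000;
    return flags != 0 && (flags & ok_bits) == 0;
}
```

Under this heuristic, tbb12's 0x2000 would trigger a warning, while a DLL built without /DEPENDENTLOADFLAG (value 0) would not.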
Thanks!
I am building a wheel package for PyPI. That usually works fine, but after updating some third-party dependencies I got an error (I am using Python 3.11.9):
Let's say I had a good whl and now I have a bad whl.
Both were repaired with delvewheel repair, and both contain the same DLLs, including msvcp140-hash.dll, vcruntime140-hash.dll, and vcruntime140_1-hash.dll (where hash stands for a hash as described in the Name Mangling section of the delvewheel documentation). The good whl works fine and the bad one does not. After hours of investigation I found that using either of these options makes the bad one work:
--no-dll "msvcp140.dll;vcruntime140_1.dll;vcruntime140.dll"
or
--no-mangle "msvcp140.dll;vcruntime140_1.dll;vcruntime140.dll"
It seems to me that some conflict with the original DLLs in the system produces this error:
1) --no-dll excludes msvcp140-hash.dll, vcruntime140-hash.dll, and vcruntime140_1-hash.dll from my package, so it just uses the system DLLs.
2) --no-mangle includes msvcp140.dll, vcruntime140.dll, and vcruntime140_1.dll in my package without the hash, so only one copy of each DLL ends up loaded, which also prevents the conflict.
Interestingly, the good whl does not have this problem (it has the same dependencies, but some of them are older versions).
Has anyone faced something like this, and what is the best way to fix it? (It would be nice if someone could explain why this issue happened; "DLL conflicts" is just a theory based on the symptoms.)
Thanks!
Link to SO question