JuliaPackaging / BinaryBuilder.jl

Binary Dependency Builder for Julia
https://binarybuilder.org

Build LibraryProduct DAG #669

Open staticfloat opened 4 years ago

staticfloat commented 4 years ago

We need to inspect all LibraryProducts within a build, look at their dependencies, then do a breadth-first walk of those LibraryProducts when dlopen()'ing them in the __init__() method of JLL packages. This is important because on platforms such as Windows, where we don't have RPATHs, we may accidentally attempt to open a library that needs another library from its own directory, which the loader then can't find. :/
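To make the ordering idea concrete, here is a minimal sketch of one way to read "breadth-first walk": Kahn's algorithm, a queue-based topological sort that emits a library only once all of its in-build dependencies have been emitted. It is written in Python purely for illustration; it is not BinaryBuilder's implementation, and the function name `dlopen_order` and the dependency map are hypothetical.

```python
from collections import deque

def dlopen_order(deps):
    """Return library names so that every library comes after its dependencies.

    `deps` maps each library to the names it links against. Names that are
    not keys of `deps` (system libraries, for example) are ignored.
    """
    # Keep only edges between libraries that belong to this build.
    internal = {lib: [d for d in needed if d in deps] for lib, needed in deps.items()}
    # Number of dependencies that have not been emitted yet, per library.
    pending = {lib: len(needed) for lib, needed in internal.items()}
    # Reverse edges: which libraries are waiting on each library.
    dependents = {lib: [] for lib in internal}
    for lib, needed in internal.items():
        for d in needed:
            dependents[d].append(lib)

    # Start from libraries with no in-build dependencies (sorted for determinism).
    queue = deque(sorted(lib for lib, n in pending.items() if n == 0))
    order = []
    while queue:
        lib = queue.popleft()
        order.append(lib)
        for user in sorted(dependents[lib]):
            pending[user] -= 1
            if pending[user] == 0:
                queue.append(user)
    if len(order) != len(internal):
        raise ValueError("dependency cycle among libraries")
    return order

# Hypothetical dependency map, loosely modelled on SuiteSparse.
deps = {
    "libspqr": ["libcholmod", "libsuitesparseconfig", "KERNEL32"],
    "libcholmod": ["libsuitesparseconfig", "libamd"],
    "libamd": ["libsuitesparseconfig"],
    "libsuitesparseconfig": [],
}
order = dlopen_order(deps)
print(order)
```

In a JLL package the resulting order would presumably drive the sequence of dlopen() calls in __init__(); how the dependency map is obtained from the LibraryProducts is left open here.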

Alternative solutions are to push that directory onto the PATH or cd() to the lib directory before dlopen()ing (unsatisfactory, as some libraries may not be in that same directory), or to embed XML manifests into the .dlls as part of the BB audit process (still not sure how to do this properly).

giordano commented 4 years ago

I fear that opening in the right order may not be sufficient.

This problem first popped up in SuiteSparse_jll, whose latest releases fail to load on Windows with an error like the following:

ERROR: LoadError: InitError: could not load library "C:\Users\appveyor\.julia\artifacts\bc235f981126068329fbf3f9f58f57ae63827269\bin\libspqr.dll"
The specified procedure could not be found.

This likely happens because libspqr depends on:

% objdump -p libspqr.dll|grep "DLL Name"
        DLL Name: libgcc_s_seh-1.dll
        DLL Name: KERNEL32.dll
        DLL Name: msvcrt.dll
        DLL Name: libopenblas64_.dll
        DLL Name: libsuitesparseconfig.dll
        DLL Name: libcholmod.dll

and the last two dependencies can't be found automatically by the linker.
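For anyone who wants to collect these dependency lists programmatically, a rough sketch follows. It parses the "DLL Name" lines of `objdump -p` output and drops names assumed to be provided by Windows itself. The helper names and the `SYSTEM_DLLS` set are made up for illustration, and the set is not exhaustive.

```python
import re

def dll_imports(objdump_text):
    """Return the DLL names listed in `objdump -p` output, in order of appearance."""
    return re.findall(r"^[ \t]*DLL Name:[ \t]*(\S+)", objdump_text, flags=re.MULTILINE)

# Assumption: these are supplied by the operating system. Illustrative only.
SYSTEM_DLLS = {"kernel32.dll", "msvcrt.dll"}

def non_system_imports(objdump_text):
    """Imports that must be found next to the library or be loaded already."""
    # Windows file names are case-insensitive, so compare in lower case.
    return [name for name in dll_imports(objdump_text) if name.lower() not in SYSTEM_DLLS]

# The objdump output quoted above for libspqr.dll.
sample = """\
        DLL Name: libgcc_s_seh-1.dll
        DLL Name: KERNEL32.dll
        DLL Name: msvcrt.dll
        DLL Name: libopenblas64_.dll
        DLL Name: libsuitesparseconfig.dll
        DLL Name: libcholmod.dll
"""
needed = non_system_imports(sample)
print(needed)
```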

In these commits I moved the opening of libspqr to the end of the init, but it is still failing. However, I'm also printing the value of dllist() before trying to dlopen the library:

dllist() = ["C:\\julia\\bin\\julia.exe", "C:\\windows\\SYSTEM32\\ntdll.dll", "C:\\windows\\system32\\KERNEL32.DLL", "C:\\windows\\system32\\KERNELBASE.dll", "C:\\julia\\bin\\libjulia.dll", "C:\\julia\\bin\\libgcc_s_seh-1.dll", "C:\\windows\\system32\\msvcrt.dll", "C:\\julia\\bin\\libssp-0.dll", "C:\\julia\\bin\\libstdc++-6.dll", "C:\\windows\\system32\\ADVAPI32.dll", "C:\\windows\\SYSTEM32\\dbghelp.dll", "C:\\windows\\SYSTEM32\\IPHLPAPI.DLL", "C:\\windows\\system32\\PSAPI.DLL", "C:\\windows\\SYSTEM32\\Secur32.dll", "C:\\windows\\system32\\USER32.dll", "C:\\windows\\SYSTEM32\\USERENV.dll", "C:\\windows\\SYSTEM32\\WINMM.dll", "C:\\windows\\system32\\WS2_32.dll", "C:\\julia\\bin\\LLVM.dll", "C:\\julia\\bin\\libwinpthread-1.dll", "C:\\windows\\SYSTEM32\\sechost.dll", "C:\\windows\\system32\\RPCRT4.dll", "C:\\windows\\system32\\NSI.dll", "C:\\windows\\SYSTEM32\\WINNSI.DLL", "C:\\windows\\system32\\GDI32.dll", "C:\\windows\\SYSTEM32\\profapi.dll", "C:\\windows\\SYSTEM32\\WINMMBASE.dll", "C:\\windows\\system32\\ole32.dll", "C:\\windows\\system32\\SHELL32.dll", "C:\\windows\\system32\\SspiCli.dll", "C:\\windows\\SYSTEM32\\cfgmgr32.dll", "C:\\windows\\SYSTEM32\\DEVOBJ.dll", "C:\\windows\\SYSTEM32\\combase.dll", "C:\\windows\\system32\\SHLWAPI.dll", "C:\\windows\\SYSTEM32\\CRYPTSP.dll", "C:\\windows\\system32\\rsaenh.dll", "C:\\windows\\SYSTEM32\\bcrypt.dll", "C:\\windows\\SYSTEM32\\CRYPTBASE.dll", "C:\\windows\\SYSTEM32\\bcryptPrimitives.dll", "C:\\windows\\system32\\IMM32.DLL", "C:\\windows\\system32\\MSCTF.dll", "C:\\windows\\SYSTEM32\\powrprof.dll", "C:\\windows\\system32\\uxtheme.dll", "C:\\windows\\system32\\mswsock.dll", "C:\\julia\\lib\\julia\\sys.dll", "C:\\julia\\bin\\libpcre2-8.DLL", "C:\\julia\\bin\\libgmp.DLL", "C:\\julia\\bin\\libmpfr.DLL", "C:\\julia\\bin\\libgmp-10.dll", "C:\\julia\\bin\\libopenblas64_.DLL", "C:\\julia\\bin\\libgfortran-4.dll", "C:\\julia\\bin\\libquadmath-0.dll", "C:\\julia\\bin\\libcholmod.DLL", "C:\\julia\\bin\\libcamd.dll", 
"C:\\julia\\bin\\libccolamd.dll", "C:\\julia\\bin\\libsuitesparseconfig.dll", "C:\\julia\\bin\\libcolamd.dll", "C:\\julia\\bin\\libamd.dll", "C:\\julia\\bin\\libsuitesparse_wrapper.DLL", "C:\\Users\\appveyor\\.julia\\artifacts\\3bc52a8ecc2836c9a93eb0a83425d2cb3871b08b\\bin\\libmetis.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\87f297367bb2527a7dc3df599e3cb5ffd459a59f\\bin\\libopenblas64_.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\libklu.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\libbtf.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\libumfpack.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\libamd.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\libldl.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\libcolamd.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\libccolamd.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\libsuitesparseconfig.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\librbio.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\libcamd.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\libsuitesparse_wrapper.dll", "C:\\Users\\appveyor\\.julia\\artifacts\\bc235f981126068329fbf3f9f58f57ae63827269\\bin\\libcholmod.dll"]

All the needed libraries are already there :confused:

giordano commented 4 years ago

As far as I understand, the problem is that C:\julia\bin\libcholmod.DLL is normally shadowing C:\Users\appveyor\.julia\artifacts\bc235f981126068329fbf3f9f58f57ae63827269\bin\libcholmod.dll, unless we cd to C:\Users\appveyor\.julia\artifacts\bc235f981126068329fbf3f9f58f57ae63827269\bin to open the library, as pointed out on Slack by @KristofferC.
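A quick way to spot candidates for this kind of shadowing is to group the dllist() entries by file name, ignoring case, and report any name that is loaded from more than one directory. The sketch below does that in Python for illustration; `shadowed_libraries` is a hypothetical helper, and the input is a small excerpt of the dllist() output above.

```python
import ntpath
from collections import defaultdict

def shadowed_libraries(loaded_paths):
    """Group loaded library paths by case-insensitive file name and
    return only the names that were loaded from more than one location."""
    by_name = defaultdict(list)
    for path in loaded_paths:
        # ntpath handles Windows-style paths on any host platform.
        by_name[ntpath.basename(path).lower()].append(path)
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}

# Excerpt of the dllist() output quoted earlier in this thread.
artifact = r"C:\Users\appveyor\.julia\artifacts\bc235f981126068329fbf3f9f58f57ae63827269\bin"
loaded = [
    r"C:\julia\bin\libgfortran-4.dll",
    r"C:\julia\bin\libcholmod.DLL",
    artifact + r"\libklu.dll",
    artifact + r"\libcholmod.dll",
]
duplicates = shadowed_libraries(loaded)
for name, paths in duplicates.items():
    print(name, "->", paths)
```

This only flags duplicates; which copy the loader actually resolves an import against still has to be checked separately.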

The latest libspqr.dll in Yggdrasil, built with METIS support, expects the following symbols in libcholmod.dll:

% objdump -p libspqr-new.dll
        [...]
        DLL Name: libcholmod.dll
        vma:  Hint/Ord Member-Name Bound-To
        34938      65  cholmod_l_allocate_dense
        34954      67  cholmod_l_allocate_sparse
        34970      69  cholmod_l_allocate_work
        3498c      70  cholmod_l_amd
        3499c      74  cholmod_l_analyze_p2
        349b4      78  cholmod_l_calloc
        349c8      91  cholmod_l_colamd
        349dc     102  cholmod_l_dense_to_sparse
        349f8     107  cholmod_l_error
        34a0c     115  cholmod_l_free
        34a20     116  cholmod_l_free_dense
        34a38     117  cholmod_l_free_factor
        34a50     118  cholmod_l_free_sparse
        34a68     125  cholmod_l_malloc
        34a7c     127  cholmod_l_metis
        34a90     131  cholmod_l_nnz
        34aa0     136  cholmod_l_postorder
        34ab8     151  cholmod_l_realloc
        34acc     155  cholmod_l_reallocate_sparse
        34aec     177  cholmod_l_sparse_to_dense
        34b08     180  cholmod_l_speye
        34b1c     192  cholmod_l_transpose
        [...]

For comparison, libspqr.dll shipped with Julia and built without METIS support expects the following symbols:

% objdump -p libspqr-old.dll
        [...]
        DLL Name: libcholmod.dll
        vma:  Hint/Ord Member-Name Bound-To
        34928      63  cholmod_l_allocate_dense
        34944      65  cholmod_l_allocate_sparse
        34960      67  cholmod_l_allocate_work
        3497c      68  cholmod_l_amd
        3498c      72  cholmod_l_analyze_p2
        349a4      75  cholmod_l_calloc
        349b8      88  cholmod_l_colamd
        349cc      98  cholmod_l_dense_to_sparse
        349e8     103  cholmod_l_error
        349fc     111  cholmod_l_free
        34a10     112  cholmod_l_free_dense
        34a28     113  cholmod_l_free_factor
        34a40     114  cholmod_l_free_sparse
        34a58     121  cholmod_l_malloc
        34a6c     124  cholmod_l_nnz
        34a7c     129  cholmod_l_postorder
        34a94     144  cholmod_l_realloc
        34aa8     148  cholmod_l_reallocate_sparse
        34ac8     170  cholmod_l_sparse_to_dense
        34ae4     173  cholmod_l_speye
        34af8     185  cholmod_l_transpose
        [...]

We can check if the symbols expected by the latest libspqr.dll are present in libcholmod.dll shipped with Julia (with the METIS-related symbol as a potential culprit):

% for symbol in $(objdump -p libspqr-new.dll|grep cholmod_l|awk '{print $3}'); do nm libcholmod-old.dll|grep -w "${symbol}" || echo "---> ${symbol} not found"; done
000000006ba71130 T cholmod_l_allocate_dense
000000006ba76240 T cholmod_l_allocate_sparse
000000006ba6e8f0 T cholmod_l_allocate_work
000000006ba8b570 T cholmod_l_amd
000000006ba8cd90 T cholmod_l_analyze_p2
000000006ba75750 T cholmod_l_calloc
000000006ba8dc10 T cholmod_l_colamd
000000006ba72cc0 T cholmod_l_dense_to_sparse
000000006ba73840 T cholmod_l_error
000000006ba756e0 T cholmod_l_free
000000006ba71010 T cholmod_l_free_dense
000000006ba73a90 T cholmod_l_free_factor
000000006ba760d0 T cholmod_l_free_sparse
000000006ba755a0 T cholmod_l_malloc
---> cholmod_l_metis not found
000000006ba773b0 T cholmod_l_nnz
000000006ba8f460 T cholmod_l_postorder
000000006ba75890 T cholmod_l_realloc
000000006ba76540 T cholmod_l_reallocate_sparse
000000006ba722f0 T cholmod_l_sparse_to_dense
000000006ba766a0 T cholmod_l_speye
000000006ba7a5a0 T cholmod_l_transpose

So libcholmod.dll shipped with Julia doesn't provide cholmod_l_metis as expected, which also explains the error message "The specified procedure could not be found."
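As a quicker first look, the two import tables can also be diffed directly: any symbol that only the new libspqr.dll imports is a candidate for the missing procedure. This is a shortcut, not a replacement for the nm check above, which is what confirms the symbol is absent from the old libcholmod.dll. The sketch is in Python for illustration, `imported_symbols` is a hypothetical helper, and the inputs are short excerpts of the two objdump listings.

```python
import re

def imported_symbols(import_table_text):
    """Extract member names from the import-table rows of `objdump -p` output."""
    # Each row looks like: <hex vma> <decimal hint/ordinal> <symbol name>
    row = re.compile(r"^[ \t]*[0-9a-fA-F]+[ \t]+\d+[ \t]+(\S+)[ \t]*$", re.MULTILINE)
    return set(row.findall(import_table_text))

# Excerpt of the libspqr-new.dll listing.
new_excerpt = """\
        vma:  Hint/Ord Member-Name Bound-To
        34a68     125  cholmod_l_malloc
        34a7c     127  cholmod_l_metis
        34a90     131  cholmod_l_nnz
"""
# Excerpt of the libspqr-old.dll listing.
old_excerpt = """\
        vma:  Hint/Ord Member-Name Bound-To
        34a58     121  cholmod_l_malloc
        34a6c     124  cholmod_l_nnz
"""
only_in_new = imported_symbols(new_excerpt) - imported_symbols(old_excerpt)
print(sorted(only_in_new))
```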

ViralBShah commented 4 years ago

The issue is that in Julia base, we ship a SuiteSparse that builds CHOLMOD without METIS support, in order to avoid an extra dependency. The proposed solution was to build SuiteSparse_jll with METIS support, which we could then use in the package ecosystem.

Clearly, these two are clashing. It would be much simpler to have one SuiteSparse - and just ship METIS in base Julia until a point where we can move SuiteSparse out altogether.

staticfloat commented 4 years ago

My branch for https://github.com/JuliaLang/julia/issues/33973 will solve this particular issue.

giordano commented 4 years ago

For the record, a couple of days ago a case was reported where the order of dlopening seems to matter: https://discourse.julialang.org/t/http-get-crashes-julia-completely/43506/17.