IQVIA-ML / LightGBM.jl

Julia FFI interface to Microsoft's LightGBM package

Fix lgbm lib not found #125

Closed FatemehTahavori closed 1 year ago

yaxxie commented 2 years ago

It would be good if you can explain how the fix works, or what was going wrong before that this changes.

yaxxie commented 2 years ago

@FatemehTahavori this PR does not fix anything. I got an M1 Mac to test with.

Let me first start by stating what the issue is. The linker searches for libraries to load, but there is a constraint: all libraries loaded need to be for the same architecture as the running host program.

[Screenshot 2022-08-11 at 11 24 28]

You'll notice here that it says this julia binary is Intel architecture. That means the following:

Next, I note that brew installs packages in ARM mode. Finally, I note that the upcoming julia 1.8 comes with an (experimental) natively compiled ARM binary.

So now I can test a hypothesis. If what I stated above is true, then when I have ARM libomp and ARM lightgbm, loading will fail with Intel julia and succeed with ARM julia.

claire.watson@GBY0JFYDQ6Q6 local % DYLD_LIBRARY_PATH=/opt/homebrew/lib/ /Applications/Julia-1.8\ 2.app/Contents/Resources/julia/bin/julia -e "import Pkg; Pkg.status();import LightGBM" 
Status `~/.julia/environments/v1.8/Project.toml`
  [7acf609c] LightGBM v0.5.2
[ Info: lib_lightgbm found in system dirs!
claire.watson@GBY0JFYDQ6Q6 local % 
claire.watson@GBY0JFYDQ6Q6 local % DYLD_LIBRARY_PATH=/opt/homebrew/lib/ /Applications/Julia-1.8.app/Contents/Resources/julia/bin/julia -e "import Pkg; Pkg.status();import LightGBM" 
Status `~/.julia/environments/v1.8/Project.toml`
  [7acf609c] LightGBM v0.5.2
[ Info: lib_lightgbm not found in system dirs, trying fallback
ERROR: InitError: LightGBM.LibraryNotFoundError("lib_lightgbm not found. Please ensure this library is either in system dirs or the dedicated paths: [\"/Users/claire.watson/.julia/packages/LightGBM/A7zVd/src\"]")
Stacktrace:
 [1] find_library(library_name::String, custom_paths::Vector{String})
   @ LightGBM ~/.julia/packages/LightGBM/A7zVd/src/LightGBM.jl:32
 [2] __init__()
   @ LightGBM ~/.julia/packages/LightGBM/A7zVd/src/LightGBM.jl:41
 [3] _include_from_serialized(pkg::Base.PkgId, path::String, depmods::Vector{Any})
   @ Base ./loading.jl:831
 [4] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt64)
   @ Base ./loading.jl:1039
 [5] _require(pkg::Base.PkgId)
   @ Base ./loading.jl:1315
 [6] _require_prelocked(uuidkey::Base.PkgId)
   @ Base ./loading.jl:1200
 [7] macro expansion
   @ ./loading.jl:1180 [inlined]
 [8] macro expansion
   @ ./lock.jl:223 [inlined]
 [9] require(into::Module, mod::Symbol)
   @ Base ./loading.jl:1144
during initialization of module LightGBM
claire.watson@GBY0JFYDQ6Q6 local % 

The julia install at the path with the 2 (Julia-1.8 2.app, the first command) is the ARM one, and the julia install without the 2 in the path (the second command) is the Intel one. I also printed the LightGBM package status to show that this does in fact work correctly with a released version of LightGBM.jl (i.e. without this change).
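The architecture comparison behind this experiment can be scripted. The following is a minimal sketch, assuming the `file` utility is available; `arch_from_file_output` and `report_arch` are hypothetical helper names, and the label strings they match are an assumption about typical `file` output on macOS and Linux.

```shell
# Map a line of `file` output to a coarse architecture label.
# Assumption: these substrings cover the cases discussed in this thread.
arch_from_file_output() {
  case "$1" in
    *arm64*x86_64*|*x86_64*arm64*) echo universal ;;  # fat binary with both slices
    *arm64*|*aarch64*) echo arm64 ;;
    *x86_64*|*x86-64*) echo x86_64 ;;
    *) echo unknown ;;
  esac
}

# Print "<path>: <label>" for a binary or shared library on disk.
report_arch() {
  printf '%s: %s\n' "$1" "$(arch_from_file_output "$(file -b "$1")")"
}

# Example usage (paths are placeholders, adjust to your machine):
#   report_arch /path/to/julia
#   report_arch /opt/homebrew/lib/lib_lightgbm.dylib
```

If the two labels differ, the library cannot be loaded into that julia process, regardless of what the package's search code does.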

Now, to explain how I made this work

In light of this I have to recommend this PR be closed, since it doesn't actually do anything or fix the issue (this PR can't possibly fix cross-architecture linking, nor is it the place of this specific project to fix that problem).

yaxxie commented 2 years ago

@danielsoutar please see the above. I recommend the closure of this PR

Also @FatemehTahavori, in light of the above it seems the advice given in #122 is incorrect. The correct advice should be to install the ARM build of julia 1.8 on an M1 Mac, and either to compile your own lightgbm binary or to use the brew binary, setting DYLD_LIBRARY_PATH to the correct locations.
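As a rough illustration of the DYLD_LIBRARY_PATH part of that advice, here is a minimal sketch. `prepend_path` is a hypothetical helper, and /opt/homebrew/lib is simply the directory used in the commands earlier in this thread; your brew prefix may differ.

```shell
# Prepend a directory to a colon-separated search path, unless it is
# already present. $1 = directory, $2 = existing path (may be empty).
prepend_path() {
  case ":$2:" in
    *":$1:"*) printf '%s\n' "$2" ;;             # already listed, keep as-is
    *)        printf '%s\n' "$1${2:+:$2}" ;;    # add in front
  esac
}

# Make the brew library directory visible to the dynamic loader
# for programs started from this shell session.
DYLD_LIBRARY_PATH=$(prepend_path /opt/homebrew/lib "${DYLD_LIBRARY_PATH:-}")
export DYLD_LIBRARY_PATH
```

Setting the variable inline for a single command, as in the transcripts above, has the same effect without changing the rest of the session.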

FatemehTahavori commented 2 years ago

@yaxxie we could reproduce the issue in docker:

Julia development

docker run -d -it --platform=linux/x86_64 --name lgbm -v "$(pwd)":/home/julia/ julia:1.6.3

Connect to Container using VSCode

Inside Container

cd ~
wget -O /tmp/lgbm.tar https://github.com/microsoft/LightGBM/archive/v3.2.0.tar.gz
tar -xf /tmp/lgbm.tar -C /tmp/
export LIGHTGBM_EXAMPLES_PATH=/tmp/LightGBM-3.2.0

To add the package to Julia:

import Pkg
Pkg.add("LightGBM")

If you run the tests, they fail:

Pkg.test("LightGBM")

Add this branch to Julia:

(@v1.6) pkg> add LightGBM#Fix_lgbm_libNotFound

With this branch, the tests are passing.

yaxxie commented 2 years ago

[ Info: ["libcrypt"] not found in `DL_LOAD_PATH`, or system library paths, trying fallback
find_library finds system lib: Error During Test at /root/.julia/packages/LightGBM/fXI7r/test/basic/test_lightgbm.jl:77
  Got exception outside of a @test
  LightGBM.LibraryNotFoundError("[\"libcrypt\"] not found. Please check this library using Libdl.dlopen(l; throw_error=true) where l = joinpath(custom_paths, lib)")
  Stacktrace:
    [1] find_library(library_names::Vector{String}, custom_paths::Vector{String})
      @ LightGBM ~/.julia/packages/LightGBM/fXI7r/src/LightGBM.jl:36
    [2] macro expansion
      @ ~/.julia/packages/LightGBM/fXI7r/test/basic/test_lightgbm.jl:84 [inlined]
    [3] macro expansion
      @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1151 [inlined]
    [4] macro expansion
      @ ~/.julia/packages/LightGBM/fXI7r/test/basic/test_lightgbm.jl:80 [inlined]
    [5] macro expansion
      @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1151 [inlined]
    [6] top-level scope
      @ ~/.julia/packages/LightGBM/fXI7r/test/basic/test_lightgbm.jl:44
    [7] include(fname::String)
      @ Base.MainInclude ./client.jl:444
    [8] macro expansion
      @ ~/.julia/packages/LightGBM/fXI7r/test/runtests.jl:84 [inlined]
    [9] macro expansion
      @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1151 [inlined]
   [10] macro expansion
      @ ~/.julia/packages/LightGBM/fXI7r/test/runtests.jl:84 [inlined]
   [11] macro expansion
      @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Test/src/Test.jl:1151 [inlined]
   [12] top-level scope
      @ ~/.julia/packages/LightGBM/fXI7r/test/runtests.jl:59
   [13] include(fname::String)
      @ Base.MainInclude ./client.jl:444
   [14] top-level scope
      @ none:6
   [15] eval
      @ ./boot.jl:360 [inlined]
   [16] exec_options(opts::Base.JLOptions)
      @ Base ./client.jl:261
   [17] _start()
      @ Base ./client.jl:485
[ Info: ["lib_that_simply_doesnt_exist"] not found in `DL_LOAD_PATH`, or system library paths, trying fallback
Test Summary:                                   | Pass  Error  Broken  Total
Basic tests                                     |  111      1       2    114
  Estimator parameters                          |   20              2     22
  Estimator parameters                          |   15                    15
  Utils                                         |    5                     5
  Fit                                           |   52                    52
  CV                                            |    6                     6
  Search CV                                     |   10                    10
  LightGBM                                      |    3      1              4
    find_library                                |    3      1              4
      find_library works with no system lib     |    1                     1
      find_library finds system lib first       |    1                     1
      find_library finds system lib             |           1              1
      find_library returns empty and logs error |    1                     1
ERROR: LoadError: Some tests did not pass: 111 passed, 0 failed, 1 errored, 2 broken.
in expression starting at /root/.julia/packages/LightGBM/fXI7r/test/runtests.jl:57
ERROR: Package LightGBM errored during testing

(@v1.6) pkg> st
      Status `~/.julia/environments/v1.6/Project.toml`
  [7acf609c] LightGBM v0.5.2 `https://github.com/IQVIA-ML/LightGBM.jl.git#Fix_lgbm_libNotFound`

(@v1.6) pkg> 

This branch still fails that particular test when I check it. Furthermore, I suspect there is something not quite right about that docker image, because:

julia> import Libdl

julia> Libdl.find_library("libcrypt")
""

julia> Libdl.find_library("libpcprofile")
"libpcprofile"

julia> 

and when I check the system linker paths (in that docker image) I find this:

root@5e2eaf1c5f7f:~# cat /etc/ld.so.conf.d/* | grep -v '#' | xargs find | grep libcrypt
find: '/usr/local/lib/x86_64-linux-gnu': No such file or directory
/lib/x86_64-linux-gnu/libcrypt-2.28.so
/lib/x86_64-linux-gnu/libcrypt.so.1
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
root@5e2eaf1c5f7f:~# cat /etc/ld.so.conf.d/* | grep -v '#' | xargs find | grep libpcprofile
find: '/usr/local/lib/x86_64-linux-gnu': No such file or directory
/lib/x86_64-linux-gnu/libpcprofile.so

You can see that for libpcprofile, which was found, there is a file ending in plain .so. For libcrypt, which was not found, there is no file named exactly libcrypt.so, only versioned names such as libcrypt.so.1. This is normal, as libs can be versioned. But systems usually ship with short-form symlinks to the longer names:

[yaxattax@fedora ~]$ find / 2>/dev/null| grep libcrypt.so$ 
/home/yaxattax/.local/share/containers/storage/overlay/4723f6643c4df3f617b017f7063d23aa50c7562bed4dc3074578ce896a385972/diff/usr/lib/x86_64-linux-gnu/libcrypt.so
/home/yaxattax/.local/share/containers/storage/overlay/1d5b529db9abb046582d40bfac011d80fc021907f3bfc31dda6870a52ee5e3da/diff/usr/lib/x86_64-linux-gnu/libcrypt.so
/usr/lib64/libcrypt.so
[yaxattax@fedora ~]$ ls -l /usr/lib64/libcrypt.so
lrwxrwxrwx. 1 root root 17 Feb  1  2022 /usr/lib64/libcrypt.so -> libcrypt.so.2.0.0
[yaxattax@fedora ~]$ 

and then if you launch julia on this machine and try to load libcrypt:

julia> import Libdl

julia> Libdl.find_library("libcrypt")
"libcrypt"

julia> 

You can see it works.

As a bonus: inside the docker image, I draw your attention to this:

julia> Libdl.find_library("libcrypt-2.28")
"libcrypt-2.28"

julia> 

and remind ourselves what we found when looking for libcrypt:

root@5e2eaf1c5f7f:~# cat /etc/ld.so.conf.d/* | grep -v '#' | xargs find | grep libcrypt
find: '/usr/local/lib/x86_64-linux-gnu': No such file or directory
/lib/x86_64-linux-gnu/libcrypt-2.28.so
/lib/x86_64-linux-gnu/libcrypt.so.1
/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
root@5e2eaf1c5f7f:~#

Notice this one: /lib/x86_64-linux-gnu/libcrypt-2.28.so ends in plain .so, and it was found when we asked for libcrypt-2.28.
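The pattern in these listings can be checked with a small script. This is a minimal sketch, assuming the lookup only tries the requested name with a plain .so suffix (which is what the outputs above suggest, not something verified against Julia's source here); `check_soname` is a hypothetical helper name.

```shell
# For a library base name, report whether DIR holds an unversioned
# NAME.so, only versioned NAME.so.N files, or nothing at all.
# $1 = directory to inspect, $2 = base name such as libcrypt
check_soname() {
  dir=$1
  name=$2
  if [ -e "$dir/$name.so" ]; then
    echo "unversioned: $dir/$name.so"
    return 0
  fi
  found=no
  for f in "$dir/$name".so.*; do
    # An unmatched glob stays literal, so confirm the file exists.
    if [ -e "$f" ]; then
      echo "versioned only: $f"
      found=yes
    fi
  done
  if [ "$found" = no ]; then
    echo "none"
  fi
}

# Example usage inside the container:
#   check_soname /lib/x86_64-linux-gnu libcrypt
#   check_soname /lib/x86_64-linux-gnu libpcprofile
```

A "versioned only" result is the case where a short-form symlink (or a development package that ships one) is missing from the image.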

So once again, I put it to you that this PR fixes nothing.

yaxxie commented 2 years ago

Since

I think that this PR needs to be closed. Please close it @FatemehTahavori

yaxxie commented 1 year ago

@FatemehTahavori would you mind closing this PR? Unless I am mistaken, I don't believe we require it.