IQVIA-ML / LightGBM.jl

Julia FFI interface to Microsoft's LightGBM package

cannot load on macos #138

Closed adienes closed 4 months ago

adienes commented 1 year ago
julia> using LightGBM
[ Info: lib_lightgbm not found in system dirs, trying fallback
ERROR: InitError: LightGBM.LibraryNotFoundError("lib_lightgbm not found. Please ensure this library is either in system dirs or the dedicated paths: [\"/Users/andy/.julia/packages/LightGBM/mUQGf/src\"]")

julia> versioninfo()
Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin22.4.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
  Threads: 1 on 8 virtual cores

(@v1.9) pkg> st LightGBM
Status `~/.julia/environments/v1.9/Project.toml`
  [7acf609c] LightGBM v0.6.1

➜  ~ which lightgbm 
/opt/homebrew/bin/lightgbm

FatemehTahavori commented 1 year ago

Hi, can you try this and let us know if it fixes your issue: https://github.com/IQVIA-ML/LightGBM.jl/issues/122#issuecomment-1217025219

adienes commented 1 year ago

N.B. the bundled binary is Intel.

My Julia is ARM, so won't that be the problem? Adding the dyld path didn't help.

rikhuijzer commented 1 year ago

I have the same on a regular Intel chip. It started last week and persisted after completely wiping my Julia installation and re-installing all packages.

[ Info: lib_lightgbm not found in system dirs, trying fallback
ERROR: LoadError: InitError: LightGBM.LibraryNotFoundError("lib_lightgbm not found. Please ensure this library is either in system dirs or the dedicated paths: [\"/home/rik/.julia/packages/LightGBM/mUQGf/src\"]")
Stacktrace:
  [1] find_library(library_name::String, custom_paths::Vector{String})
    @ LightGBM ~/.julia/packages/LightGBM/mUQGf/src/LightGBM.jl:33
  [2] __init__()
    @ LightGBM ~/.julia/packages/LightGBM/mUQGf/src/LightGBM.jl:42
  [3] register_restored_modules(sv::Core.SimpleVector, pkg::Base.PkgId, path::String)
    @ Base ./loading.jl:1115
  [4] _include_from_serialized(pkg::Base.PkgId, path::String, ocachepath::String, depmods::Vector{Any})
    @ Base ./loading.jl:1061
  [5] _require_search_from_serialized(pkg::Base.PkgId, sourcepath::String, build_id::UInt128)
    @ Base ./loading.jl:1506
  [6] _require(pkg::Base.PkgId, env::String)
    @ Base ./loading.jl:1783
  [7] _require_prelocked(uuidkey::Base.PkgId, env::String)
    @ Base ./loading.jl:1660
  [8] macro expansion
    @ ./loading.jl:1648 [inlined]
  [9] macro expansion
    @ ./lock.jl:267 [inlined]
 [10] require(into::Module, mod::Symbol)
    @ Base ./loading.jl:1611
 [11] include(fname::String)
    @ Base.MainInclude ./client.jl:478
 [12] top-level scope
    @ ~/git/SIRUS.jl/test/runtests.jl:1
 [13] include
    @ ./client.jl:478 [inlined]
 [14] rt()
    @ Main ~/.julia/config/startup.jl:15
 [15] top-level scope
    @ REPL[2]:1
during initialization of module LightGBM
in expression starting at [...]
in expression starting at [...]

julia> versioninfo()
Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 12 × 12th Gen Intel(R) Core(TM) i7-1255U
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, alderlake)
  Threads: 7 on 12 virtual cores
Environment:
  JULIA_NUM_THREADS = 7

rikhuijzer commented 1 year ago

Fixed it:

julia> ]

pkg> build
    Building HDF5 ────→ `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/c73fdc3d9da7700691848b78c61841274076932a/build.log`
    Building LightGBM → `~/.julia/scratchspaces/44cfe95a-1eb2-52ea-b672-e2afdf69b78f/ce5f0bbb93610549e94dc1b1d6a1e238ae021d7d/build.log`

and it worked.

Maybe 1.9 is not automatically triggering build when adding packages?

rikhuijzer commented 1 year ago

Oh sorry, it didn't fix it. Now I have:

mlj: Error During Test at [...]
  Got exception outside of a @test
  LoadError: TaskFailedException

      nested task error: ArgumentError: NULL library handle
      Stacktrace:
        [1] #dlsym#1
          @ ./libdl.jl:57 [inlined]
        [2] dlsym
          @ ./libdl.jl:56 [inlined]
        [3] LGBM_DatasetCreateFromMat(data::Matrix{Float64}, parameters::String, is_row_major::Bool)
          @ LightGBM ~/.julia/packages/LightGBM/mUQGf/src/wrapper.jl:121
        [4] dataset_constructor
          @ ~/.julia/packages/LightGBM/mUQGf/src/fit.jl:99 [inlined]

So that's a duplicate of https://github.com/IQVIA-ML/LightGBM.jl/issues/122.

kainkad commented 1 year ago

Thanks for your comments on this, @rikhuijzer. This particular test error is separate from #122: if you were able to run the tests, then after building from the pre-compiled binaries, lib_lightgbm was found in the system directories and the tests were triggered. The stack trace you provided is from the mlj tests, so we'd need to look into this separately. I can see you are on Julia 1.9, while our CI only runs up to 1.8, where these tests don't fail, so we might look at extending CI to include 1.9 and see whether these tests need to be updated to support it.

@adienes Can you find where lib_lightgbm is on your system? You might need to use the pre-compiled binaries to build lightgbm first if you haven't done so. If you don't have it at all, can you go to the package manager and run ] build LightGBM? I assume you have also already run brew install libomp for your ARM-based Julia, as libomp is the only dependency for lightgbm that needs to be installed separately.
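
One way to answer the "where is lib_lightgbm" question from inside Julia is sketched below. It is a minimal illustration using only the Libdl standard library, not the lookup LightGBM.jl itself performs. The helper name locate_lightgbm and the /opt/homebrew/lib directory are assumptions made for this example. Note that the search works by trying to open the library, so a copy built for a different architecture than the running Julia will most likely be reported as not found.

```julia
using Libdl

# Hypothetical helper (not part of LightGBM.jl): search the system library
# path plus any extra directories for lib_lightgbm and return the resolved
# file path, or `nothing` if no loadable copy is found.
function locate_lightgbm(extra_dirs::Vector{String}=String[])
    # find_library returns "" when none of the candidates can be opened.
    name = Libdl.find_library(["lib_lightgbm"], extra_dirs)
    isempty(name) && return nothing
    handle = Libdl.dlopen(name)
    try
        return Libdl.dlpath(handle)
    finally
        Libdl.dlclose(handle)
    end
end

# Assumption: a Homebrew install on Apple silicon keeps libraries here.
path = locate_lightgbm(["/opt/homebrew/lib"])
println(path === nothing ? "lib_lightgbm not found" : "lib_lightgbm found at $path")
```

If this prints a path, that directory is the one to check against the paths listed in the LibraryNotFoundError message.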

kainkad commented 11 months ago

@adienes Can you confirm that this is no longer an issue? Based on your previous comments, you're on an M1 (arm64) and your Julia is also built for the ARM architecture, so you should be able to just brew install libomp and brew install lightgbm. If this has not been working for you, could you try checking this PR to see whether using LightGBM_jll from BinaryBuilder sorts this out for you, and let us know? It probably won't sort out the linking issues across architectures raised in #122, but in your case the builds seem to be consistent for the same arm64 architecture.

rikhuijzer commented 11 months ago

I can confirm that this is still an issue, which would probably best be solved by https://github.com/IQVIA-ML/LightGBM.jl/pull/134.

kainkad commented 11 months ago

Thank you for letting me know, @rikhuijzer. I've realised your example was with Julia 1.9.1 on a Linux build, not Mac. I pulled the latest released LightGBM v0.6.1 and installed the same Julia version as yours, just in case there was something odd with this setup:

julia> versioninfo()
Julia Version 1.9.1
Commit 147bdf428cd (2023-06-07 08:27 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)

I exported LIGHTGBM_EXAMPLES_PATH, removed the Manifest.toml, ran ] build and ] test, and it works without any problems. I also added 1.9 CI support for all platforms to this repo, so it's quite surprising to have issues on Linux: the cases raised so far were M1 ARM-based Macs, where there was a cross-architecture library conflict.

There are some additional considerations regarding BinaryBuilder which make it less than a straightforward switch. I tested LightGBM_jll (including on Windows, on actual machines and not just via Docker images), and while it works fine on Linux, it unfortunately doesn't work on Windows with Julia 1.8 or the most recent 1.9. The current setup does work on all of them (including Intel and ARM-based Macs, with some tweaks to make sure the library linking is correct). So it's not an ideal scenario either, given that 1.9.3 is the latest stable release. Probably the best solution would be to make BinaryBuilder optional, or a fallback, but that would require some sort of conditional dependency loading which, unless I'm mistaken, doesn't seem to be well supported in Julia.

rikhuijzer commented 11 months ago

Ah, thank you for following up, @kainkad. It seems there was some confusion. I thought this issue was about the M1 and not about x86 Intel. I said I could confirm that it's still broken on Intel, but that might not be true. I only ran into this exact issue on yet another ARM machine (not Apple ARM, just regular ARM) this morning, but I don't know about Intel since I sold that laptop recently.

> It won't probably sort the linking issues across architectures as raised on the https://github.com/IQVIA-ML/LightGBM.jl/issues/122 but in your case it seems the builds are consistent for the same arm-64.

Hmm, maybe there is some confusion here, but the nice thing about BinaryBuilder is that it does handle multiple architectures. In LightGBM, you just specify which binary you want and BinaryBuilder manages the different binaries for different architectures. They have a whole infrastructure for managing multiple binaries. For LightGBM, they support aarch64 (64-bit ARM, which includes Apple silicon), armv6l, armv7l, i686, powerpc64le, and last but not least x86_64; see https://github.com/JuliaBinaryWrappers/LightGBM_jll.jl/blob/main/Artifacts.toml. That Artifacts.toml, by the way, was generated by Yggdrasil; see https://github.com/JuliaPackaging/Yggdrasil/pull/6714/.
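
The per-architecture selection described above is keyed on the platform of the running Julia process. As a rough illustration, the sketch below prints that platform using Base.BinaryPlatforms, which is available from Julia 1.6 onwards. It only inspects the host and does not load LightGBM_jll. That the triplet printed here is what gets matched against the entries in Artifacts.toml is stated as a general description of how JLL packages work, not as something verified against this package.

```julia
# Inspect the platform Julia believes it is running on. An ARM build of
# Julia on an M1 Mac and an Intel build running under Rosetta report
# different architectures here, which is why they need different binaries.
host = Base.BinaryPlatforms.HostPlatform()

println("Sys.ARCH     = ", Sys.ARCH)
println("architecture = ", Base.BinaryPlatforms.arch(host))
println("OS           = ", Base.BinaryPlatforms.os(host))
println("triplet      = ", Base.BinaryPlatforms.triplet(host))
```

Comparing this output with the architecture of the lib_lightgbm file on disk is a quick way to spot the cross-architecture mismatch discussed in this thread.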

kainkad commented 11 months ago

Thank you, @rikhuijzer, for the follow-up. Yes, it does look like there is no issue on Intel processors. The pre-compiled Microsoft lightgbm binaries are Intel-based, and if they are run with an Intel Julia there doesn't seem to be any issue.

I have created a PR to allow macOS users to get LightGBM_jll while leaving the option of using custom binaries or the provided precompiled binaries for other operating systems, given that the majority of reported issues occurred on macOS/M1/aarch64-apple-darwin (with the exception of the non-macOS ARM case you raised). Because LightGBM_jll supports Julia versions from 1.6 onwards, bringing it in as a dependency means we need to drop the compatibility we had with Julia versions from 1.0 (which I agree not many would still be using). Additionally, it looks like the macOS GitHub runners use Intel processors, and the issue to get macOS Arm64 runners is still open, so we won't really be able to add a runner to test that LightGBM_jll works correctly in case anything goes wrong.

Please have a look at this PR. Since it's a small change to https://github.com/IQVIA-ML/LightGBM.jl/pull/134, @characat0, let me know if we can merge these two, or whether you'd want to update yours so it still allows the use of custom binaries or the precompiled ones that are provided.

characat0 commented 11 months ago

> Since it’s a small change to #134, @characat0 , let me know if we can merge these two, or you’d want to update yours so it still allows to use custom binaries or the precompiled ones that are provided.

Sure, feel free to merge these changes with mine

ablaom commented 5 months ago

@kainkad Any progress here?

We're working on updating this LGBM tutorial and hitting this issue.

kainkad commented 4 months ago

Thanks, @ablaom, for following up on this and letting me know you're still facing this issue. It's been quite tricky to resolve it in an "automated" way, without requiring users to do a local build themselves depending on their hardware architecture and Julia version (as mentioned here: https://github.com/IQVIA-ML/LightGBM.jl/issues/122#issuecomment-1217025219).

There was an amazing solution provided by @characat0 to use BinaryBuilder.jl instead of the lightgbm binaries this Julia package ported from Microsoft's lightgbm, but at the time there were some errors with building binaries on other platforms (e.g. Windows) and across various Julia versions (e.g. 1.8). As it stands at the moment, BinaryBuilder has restricted its Julia compatibility to only support version 1.7, as mentioned here, which I believe is quite restrictive even for the MLJ packages (correct me if I'm wrong), as we would run into compatibility issues if we decided to go ahead with this package instead of the currently provided binaries. The solution to this has a separate open issue, https://github.com/JuliaPackaging/JLLPrefixes.jl/issues/6, which requires some contributions outside of the LightGBM.jl package.

I have followed the discussions on the Microsoft side as well, as this seems to be the same issue for the Python equivalent: they don't provide binaries for ARM architectures, so there are workarounds in place there too. There is a chance they will provide ARM binaries in future lightgbm v4 releases, but this hasn't been done yet.

Does following this: https://github.com/IQVIA-ML/LightGBM.jl/issues/122#issuecomment-1217025219 not resolve the issue you're facing? It needs to be done only once, the first time lightgbm is built.

characat0 commented 4 months ago

I think the BinaryBuilder compatibility restriction only applies to the build systems used to generate JLL packages, not to the usage of the JLL itself, so it shouldn't be a problem here. At the time I made the PR I was at a company working a lot with LightGBM models, so I used my changes there and didn't bother working on backwards or forwards compatibility, but I remember struggling to make this work in a way that covers all Julia >= 1.0. I will check whether the situation has changed now, as I see this is still an issue.

characat0 commented 4 months ago

After spending some time reviewing the many changes of lightgbm 4.0.0, I think we can ship a new version of this library that drops support for Julia < 1.6 but includes newer versions of lightgbm via LightGBM_jll, does that sound like a good idea?

ablaom commented 4 months ago

As far as MLJ support is concerned, Julia >= 1.6 is sufficient.

kainkad commented 4 months ago

> After spending some time reviewing the many changes of lightgbm 4.0.0, I think we can ship a new version of this library that drops support for Julia < 1.6 but includes newer versions of lightgbm via LightGBM_jll, does that sound like a good idea?

Your PR with lightgbm v3 and BinaryBuilder for Julia >= 1.6 seems to be running fine (except for crashing on Windows with Julia 1.8, but we could possibly live with that if everything else works). I'm looking at extending the CI matrix to also run on macos-14, which should run on the ARM architecture on macOS, so hopefully it will work there too. Can you confirm your PR works on arm mac os (assuming we drop support for any <1.6 julia versions)? In regards to shipping lightgbm v4 would you still be able to support v3, or not? There have been some breaking changes in v4, so LightGBM.jl would require changes to support it and, as was done with the v2 to v3 move, there would be only one supported version.

characat0 commented 4 months ago

> Can you confirm your PR works on arm mac os (assuming we drop support for any <1.6 julia versions)?

Yes, I ran the tests locally (including MLJ ones) and they work fine.

> In regards to shipping lightgbm v4 would you still be able to support v3, or not?

There is now a v4 release for LightGBM_jll, and switching when we are ready should be as easy as changing the compat section to something like:

[compat]
LightGBM_jll = "4"

kainkad commented 4 months ago

Thanks to everyone for their contributions, particularly to @characat0 for the PR adding the _jll. LightGBM v0.7.0, which addresses the issue raised here and in #122, has been released.

kainkad commented 4 months ago

Closing as completed in v0.7.0