Evovest / EvoTrees.jl

Boosted trees in Julia
https://evovest.github.io/EvoTrees.jl/dev/
Apache License 2.0

Create CUDA extension #259

Closed devmotion closed 8 months ago

devmotion commented 9 months ago

I moved the CUDA-specific code to an extension. On Julia < 1.9, CUDA will continue to be a dependency and CUDA is supported automatically; on Julia >= 1.9, the package will not install CUDA by default but CUDA support can be enabled by loading CUDA (explicitly with e.g. using CUDA or when loaded by another dependency).
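A minimal sketch of the usual fallback for this split, assuming it sits at the end of `src/EvoTrees.jl`. The extension path is taken from the file names in the coverage report. The constant names `HAS_EXTENSIONS` and `CUDA_EXT_FILE` are hypothetical, and the PR's actual code may differ.

```julia
# True on Julia versions that support package extensions natively (1.9 and later).
const HAS_EXTENSIONS = isdefined(Base, :get_extension)

# Hypothetical constant: extension entry point, relative to the including file in src/.
const CUDA_EXT_FILE = joinpath("..", "ext", "EvoTreesCUDAExt", "EvoTreesCUDAExt.jl")

# On older Julia versions CUDA is still a regular dependency, so the extension
# code is included eagerly. On newer versions this branch is skipped and Julia
# loads the extension by itself once both EvoTrees and CUDA are imported.
if !HAS_EXTENSIONS
    include(CUDA_EXT_FILE)
end
```

On Julia >= 1.9, after `using EvoTrees, CUDA`, calling `Base.get_extension(EvoTrees, :EvoTreesCUDAExt)` returns the extension module if it was loaded and `nothing` otherwise, which gives a quick way to check that CUDA support is active.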

It was reasonably straightforward.

The main problem is that I don't have access to an NVIDIA GPU, and hence can't verify that I did not break CUDA support 😅 I would recommend adding CUDA-specific tests with buildkite to the repo (see https://github.com/JuliaGPU/buildkite#adding-a-repository). The setup is quite straightforward, and I have added it to a few repos successfully.

Fixes #226.

codecov[bot] commented 9 months ago

Codecov Report

Attention: 76 lines in your changes are missing coverage. Please review.

Comparison is base (cf6e3c0) 50.80% compared to head (6d33c71) 51.08%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #259      +/-   ##
==========================================
+ Coverage   50.80%   51.08%   +0.27%
==========================================
  Files          21       22       +1
  Lines        1986     1985       -1
==========================================
+ Hits         1009     1014       +5
+ Misses        977      971       -6
```

| Files | Coverage Δ | |
|---|---|---|
| src/EvoTrees.jl | `16.66% <ø> (ø)` | |
| src/callback.jl | `70.12% <100.00%> (+1.77%)` | :arrow_up: |
| src/fit.jl | `96.36% <100.00%> (+1.74%)` | :arrow_up: |
| src/init.jl | `94.64% <100.00%> (+3.41%)` | :arrow_up: |
| ext/EvoTreesCUDAExt/subsample.jl | `0.00% <0.00%> (ø)` | |
| ext/EvoTreesCUDAExt/fit-utils.jl | `0.00% <0.00%> (ø)` | |
| ext/EvoTreesCUDAExt/EvoTreesCUDAExt.jl | `33.33% <33.33%> (ø)` | |
| ext/EvoTreesCUDAExt/loss.jl | `0.00% <0.00%> (ø)` | |
| ext/EvoTreesCUDAExt/eval.jl | `0.00% <0.00%> (ø)` | |
| ext/EvoTreesCUDAExt/predict.jl | `0.00% <0.00%> (ø)` | |
| ... and 2 more | | |


jeremiedb commented 9 months ago

Thanks for the PR!

  1. I'm not clear whether CUDA should be kept in deps. It went fine on Julia 1.8, but I encountered an issue on 1.10 that I haven't had time to investigate.
  2. Regarding https://github.com/TuringLang/MCMCDiagnosticTools.jl/pull/93, I'm not clear why the CI test failed on nightly. My understanding is that CUDA should load fine even if the machine isn't NVIDIA compatible. Has that changed in Julia 1.10?
  3. Is the fit from EvoTrees poorer than for other classifiers in the rstar test? The need to pass 1_000 rather than the default 100 looks suspicious. I wouldn't expect a significant discrepancy compared to XGBoost (with its defaults of max_round=10 and learning_rate=0.3, I'd actually expect its fit to be inferior). If you have data to share, I'd be curious to see whether there might be an issue in the classifier routine.

devmotion commented 9 months ago

  1. I'm not clear whether CUDA should be kept in deps. It went fine on Julia 1.8, but I encountered an issue on 1.10 that I haven't had time to investigate.

Yes, CUDA should be kept in the deps section for backwards compatibility with Julia < 1.9. https://pkgdocs.julialang.org/v1/creating-packages/#Transition-from-normal-dependency-to-extension states explicitly:

Make sure that the package is both in the [deps] and [weakdeps] section. Newer Julia versions will ignore dependencies in [deps] that are also in [weakdeps].

Problems with 1.10 should be caused by something else.
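
The rule quoted from the Pkg docs can be checked mechanically. The sketch below parses a hypothetical `Project.toml` fragment with the `TOML` standard library and verifies that every weak dependency is also listed in `[deps]`. The UUID is a placeholder rather than CUDA's registered one, the variable names are made up for illustration, and the extension name is taken from the files in the coverage report.

```julia
using TOML

# Hypothetical Project.toml fragment. The UUID is a PLACEHOLDER, not CUDA's real UUID.
project = TOML.parse("""
[deps]
CUDA = "00000000-0000-0000-0000-000000000000"

[weakdeps]
CUDA = "00000000-0000-0000-0000-000000000000"

[extensions]
EvoTreesCUDAExt = "CUDA"
""")

deps = project["deps"]
weakdeps = project["weakdeps"]

# A weak dependency that is absent from [deps] would not be installed on
# Julia < 1.9, where [weakdeps] is ignored, so collect any such names.
missing_from_deps = [name for name in keys(weakdeps) if !haskey(deps, name)]

isempty(missing_from_deps) ||
    error("listed in [weakdeps] but not in [deps]: ", join(missing_from_deps, ", "))
```

With this layout, Julia < 1.9 reads only `[deps]` and installs CUDA as before, while newer versions treat CUDA as a weak dependency and load `EvoTreesCUDAExt` only when CUDA is present in the session.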

devmotion commented 9 months ago

2. I'm not clear why the CI test failed on nightly. My understanding is that CUDA should load fine even if the machine isn't NVIDIA compatible. Has that changed in Julia 1.10?

No, it should work fine (the main problem with CUDA is that it pulls in many dependencies and in particular CUDA jlls). The problem with the current nightlies (https://github.com/TuringLang/MCMCDiagnosticTools.jl/actions/runs/6428712477/job/17456380934#step:6:335) is that CUDA defines GPU overrides for Random.make_seed (https://github.com/JuliaGPU/CUDA.jl/blob/da140359fc68f453e511608d9af552c90d699d9a/src/device/random.jl#L53) but Random.make_seed was removed last week (https://github.com/JuliaLang/julia/pull/51436).

jeremiedb commented 8 months ago

Thanks for the clarifications. Regarding the order of the Project.toml keys, I'll likely move towards the Julia v1.6 order in a subsequent release. Let me know if that's an issue for you, or feel free to make the adjustment in this PR. Otherwise, good to go!

devmotion commented 8 months ago

I updated the order of the sections in the Project.toml file to the convention in Julia > 1.6.

jeremiedb commented 8 months ago

Registration of new release v0.16.3 is underway. Thanks for your patience!

devmotion commented 8 months ago

Thank you!