Can you elaborate on your use case? Are you running multiple Julia processes on one machine and want the XGBoost training to not use all available cores? What is the downside to using nthreads? Not sure I understand exactly what you mean, sorry.
There isn't a downside to using nthreads; it's just that it has to be set manually. When I set Julia's number of processes with -p, other libraries such as Distributed only use that many processes. However, XGBoost does not consider that value and uses all available cores on the machine (more than the number of processes I specified with -p). What I meant to suggest is that XGBoost also limit itself when Julia is run with -p. This could probably be done by internally passing the value supplied for -p, if any, as nthreads to XGBoost.
I hope that helps clarify; please let me know if I can explain further.
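To illustrate the suggestion, a minimal sketch follows. It assumes the 2.x-style xgboost((X, y); ...) interface and that nworkers() from Distributed reflects the value passed to -p; the data and the idea of wiring nworkers() into nthread are illustrative, not current package behavior.

using Distributed, XGBoost

# Julia started with e.g. `julia -p 4`; nworkers() then returns 4.
X, y = randn(1_000, 10), randn(1_000)

# Illustrative idea only: cap libxgboost's thread pool at the number of
# Julia worker processes instead of letting it claim every core.
bst = xgboost((X, y); num_round = 10, nthread = nworkers())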
You may also want to consider the pure-Julia tree-boosting algorithm in EvoTrees.jl. Moving forward, it may be easier to get them to add missing features than to expose what you want from this wrapped version. Development is still fairly active there.
Just a suggestion.
Thank you for the suggestion! I had not considered EvoTrees.jl. It is great that they are active; I will check that implementation as well.
Hello, I'm seeing the opposite problem. I think XGBoost only uses 1 core on my macOS machine (10.15, Core i5): CPU usage is ~90%, whereas the Python XGBoost package can exploit ~360% at the same time.
Passing nthread=4 or not doesn't change the situation. My parameters look like this:
(
nrounds=1000,
nthread=4,
# verbosity=3,
# silent=false,
tree_method="hist",
eta=0.1,
subsample=0.8,
max_bin=256,
max_depth=7,
reg_lambda=1,
reg_alpha=1,
gamma=1,
min_child_weight=500,
),
version info
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
OS: macOS (x86_64-apple-darwin18.7.0)
CPU: Intel(R) Core(TM) i5-6267U CPU @ 2.90GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
[009559a3] XGBoost v1.1.1
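For what it's worth, the thread counts visible from the Julia side can be checked as below; libxgboost configures its own OpenMP thread pool independently of these values, so this only rules out the Julia side.

# What the OS reports vs. how many threads the Julia runtime was started with.
println("CPU threads available: ", Sys.CPU_THREADS)
println("Julia threads:         ", Threads.nthreads())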
I have the same issue as @clouds56, also on a MacBook Pro (Intel).
This is an old thread, but since it is active I'll add my two cents for others who may have questions about multi-threading on macOS with Apple Silicon. By default, libxgboost appears to claim threads based on the machine's cores and operating system. Multi-threading on Mac requires the OpenMP library to be used when compiling libxgboost.dylib, and it was not until libxgboost version 1.6.0 that an OpenMP-enabled build for Apple Silicon became available in the supplied binaries.
If the booster parameter 'nthread' is not specified, the full number of 'available' threads is used; setting the parameter serves only to restrict the number of threads below what libxgboost would select on its own. It looks like multi-threading in libxgboost is used mostly for DMatrix operations and predict, but it does give some improvement in training. I can confirm that on my Apple M1 Pro multi-threading occurs by default. Going from nthread 1 to 4 reduces training time by about 50-60%; above 4 there is no further improvement. I cannot figure out how to query how many threads libxgboost actually uses by default, but performance is consistent with at least 4. Updating my version of XGBoost.jl also updated me to the then most current version of libxgboost, which is 1.7.3.
Therefore, on Apple Silicon, if your XGBoost.jl is up to date then you are getting the benefits of multi-threading. GPU use on Apple Silicon is not available; only time will tell if/when it gets here, as this depends on issues outside of XGBoost.jl.
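As a rough illustration of that nthread comparison, a timing loop along these lines can be used. This is only a sketch, assuming the current xgboost((X, y); ...) keyword interface; the data and any timings it prints are not from the comment above.

using XGBoost

X, y = randn(100_000, 20), randn(100_000)

# Train with increasing Booster nthread values and compare wall-clock time.
for n in (1, 2, 4, 8)
    t = @elapsed xgboost((X, y); num_round = 50, nthread = n, verbosity = 0)
    println("nthread = ", n, ": ", round(t; digits = 2), " s")
end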
If you wish to confirm the libxgboost version that XGBoost.jl is using, my code for this task is below:
function libxgboost_version()
    # Refs to receive the major, minor, and patch components from the C API
    mj = Ref{Cint}(0)
    mn = Ref{Cint}(0)
    pt = Ref{Cint}(0)
    XGBoost.XGBoostVersion(mj, mn, pt)
    return string(mj[], ".", mn[], ".", pt[])
end
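Example usage (the version string shown is the one reported above; yours will reflect your installed binary):

julia> libxgboost_version()
"1.7.3"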
Running into this problem again. This package ignores the julia -t 1 setting, which causes some kind of soft lock on the computing node I'm on.
Currently, if the number of threads is not explicitly set, we do not pass any argument to either DMatrix or Booster, so I guess whatever the library is doing is not an appropriate default. Can you confirm that passing the keyword args nthreads=1 to DMatrix and nthread=1 (yes, they are different) to Booster solves your issue? If so, we can replace their defaults with Threads.nthreads().
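A quick way to try that locally might look like the sketch below, assuming the 2.x-style constructors; the keyword names are the asymmetric ones mentioned above, and the data is purely illustrative.

using XGBoost

X, y = randn(10_000, 20), randn(10_000)

# `nthreads` for the DMatrix construction, `nthread` for the Booster parameters.
dm  = DMatrix((X, y); nthreads = 1)
bst = xgboost(dm; num_round = 10, nthread = 1)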
nthread=1 (yes, they are different) to Booster solves your issue?
Works! Should I make a PR?
Sure, thanks.
In its current implementation, the interface does not acknowledge the number of processes that Julia is being run with; by default, it uses all available CPU cores. This can be limited with the "nthread" parameter during training. However, many Julia methods are mindful of the number of processes and limit themselves to it.
Perhaps the number of Julia processes could be passed as the default value of "nthread" during XGBoost model training.