AlfredSAM closed this issue 2 years ago.
Here are the complete installation steps with the revised FindLibR.cmake:
cd xgboost
git submodule init
git submodule update
mkdir build
cd build
cmake .. -DR_LIB=ON -DLIBR_EXECUTABLE=/Users/alfredfaisam/opt/miniconda3/envs/R_4.0_mkl/bin/R
make
make install
Sorry for the incomplete information.
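The R path given to `-DLIBR_EXECUTABLE` above is simply the `R` that `which R` finds in the active conda environment. Below is a minimal Python sketch that assembles the same configure command. It assumes `LIBR_EXECUTABLE` is the variable added by the revised FindLibR.cmake in this thread (not an upstream xgboost option). `cmake_configure_args` is a hypothetical helper name.

```python
import shlex
import shutil
from typing import List, Optional


def cmake_configure_args(r_executable: Optional[str] = None) -> List[str]:
    """Build the cmake configure command for the xgboost R package.

    LIBR_EXECUTABLE is assumed to be the variable introduced by the
    revised FindLibR.cmake in this thread, not an upstream option.
    """
    if r_executable is None:
        # Same lookup as `which R` in the currently active environment.
        r_executable = shutil.which("R")
    if r_executable is None:
        raise FileNotFoundError("no R executable found on PATH")
    return ["cmake", "..", "-DR_LIB=ON", f"-DLIBR_EXECUTABLE={r_executable}"]


if __name__ == "__main__":
    try:
        # Print a shell-ready command line to run from xgboost/build.
        print(shlex.join(cmake_configure_args()))
    except FileNotFoundError as err:
        print(err)
```

Running it with the conda environment activated should print a command equivalent to the `cmake ..` line above.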
Conda-forge provides the XGBoost R package: https://anaconda.org/conda-forge/r-xgboost-cpu. Can you try installing it with conda install -c conda-forge r-xgboost-cpu?
Also, according to https://xgboost.readthedocs.io/en/latest/install.html#r, it should be sufficient to run install.packages("xgboost"). It should make use of libomp automatically.
Thanks @hcho3! Actually, I had tried both methods before, but neither of them made use of multiple threads. Let's first check the benchmarks using the system-wide R with the xgboost R package built from source. As expected, multithreading works:
r$> require(xgboost)
x <- matrix(rnorm(100 * 10000), 10000, 100)
y <- x %*% rnorm(100) + rnorm(1000)
system.time({
bst <- xgboost(data = x, label = y, nthread = 1, nround = 100, verbose = F)
})
Loading required package: xgboost
user system elapsed
19.075 0.098 16.838
r$> system.time({
bst <- xgboost(data = x, label = y, nthread = 4, nround = 100, verbose = F)
})
user system elapsed
17.640 0.081 4.486
On the other hand, consider a conda environment built from the following .yml file:
name: R_4.0_mkl
channels:
- conda-forge
- defaults
dependencies:
- python=3.8
- conda-forge::r-base=4.1.0
- conda-forge::libblas=3.9.0=9_mkl
In this environment, installing xgboost using conda install -c conda-forge r-xgboost-cpu does NOT enable multithreading:
r$> require(xgboost)
x <- matrix(rnorm(100 * 10000), 10000, 100)
y <- x %*% rnorm(100) + rnorm(1000)
system.time({
bst <- xgboost(data = x, label = y, nthread = 1, nround = 100, verbose = F)
})
Loading required package: xgboost
user system elapsed
17.161 0.063 16.618
r$> system.time({
bst <- xgboost(data = x, label = y, nthread = 4, nround = 100, verbose = F)
})
user system elapsed
16.791 0.046 16.877
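One way to read these `system.time()` numbers: when OpenMP works, `elapsed` drops roughly in proportion to `nthread` while `user` (CPU time summed over all threads) stays about the same. Without OpenMP, `elapsed` stays close to `user`. The small Python sketch below computes the wall-clock speedup and the CPU-time/elapsed ratio from the timings reported so far. The `Timing` type and helper names are hypothetical.

```python
from typing import NamedTuple


class Timing(NamedTuple):
    """One `system.time()` result, in seconds."""
    user: float
    system: float
    elapsed: float


def speedup(single: Timing, multi: Timing) -> float:
    """Wall-clock speedup of the multi-threaded run over the 1-thread run."""
    return single.elapsed / multi.elapsed


def cpu_ratio(t: Timing) -> float:
    """CPU time divided by wall-clock time; about 1.0 means one busy core."""
    return (t.user + t.system) / t.elapsed


# Timings copied from the benchmarks above (nthread = 1, then nthread = 4).
system_r = (Timing(19.075, 0.098, 16.838), Timing(17.640, 0.081, 4.486))
conda_forge = (Timing(17.161, 0.063, 16.618), Timing(16.791, 0.046, 16.877))

for name, (t1, t4) in [("system-wide R, source build", system_r),
                       ("conda-forge r-xgboost-cpu", conda_forge)]:
    print(f"{name}: speedup {speedup(t1, t4):.2f}x, "
          f"CPU/elapsed at nthread=4 {cpu_ratio(t4):.2f}")
```

For the source build this gives a speedup of about 3.75x. The conda-forge build stays at about 0.98x, with a CPU/elapsed ratio of about 1.00, which is what a single busy core looks like.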
Let's check another conda environment created from the same .yml file, but with xgboost installed using install.packages("xgboost"). The first interesting point is that during installation I noticed:
checking whether OpenMP will work in a package... no
*****************************************************************************************
OpenMP is unavailable on this Mac OSX system. Training speed may be suboptimal.
To use all CPU cores for training jobs, you should install OpenMP by running
brew install libomp
*****************************************************************************************
even though I had already installed it via
brew install libomp
Therefore, the results are not surprising:
r$> require(xgboost)
x <- matrix(rnorm(100 * 10000), 10000, 100)
y <- x %*% rnorm(100) + rnorm(1000)
system.time({
bst <- xgboost(data = x, label = y, nthread = 1, nround = 100, verbose = F)
})
Loading required package: xgboost
user system elapsed
17.789 0.058 17.284
r$> system.time({
bst <- xgboost(data = x, label = y, nthread = 4, nround = 100, verbose = F)
})
user system elapsed
17.165 0.044 17.251
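The `checking whether OpenMP will work in a package... no` line comes from a configure-time test that tries to build a small OpenMP program. The Python sketch below reproduces that idea. It is not xgboost's actual configure script, and the compiler name and flag sets are assumptions. With Apple clang and Homebrew's keg-only libomp, the flags typically look like `-Xpreprocessor -fopenmp -lomp` plus the libomp include and lib paths. GCC and upstream (non-Apple) clang accept plain `-fopenmp`.

```python
import os
import subprocess
import tempfile
from typing import Sequence

# Minimal OpenMP program: it only compiles, links and runs if OpenMP works.
OMP_TEST_SRC = r"""
#include <omp.h>
#include <stdio.h>
int main(void) {
    int n = 0;
    #pragma omp parallel
    {
        #pragma omp atomic
        n++;
    }
    printf("%d %d\n", n, omp_get_max_threads());
    return 0;
}
"""


def openmp_works(compiler: str, flags: Sequence[str]) -> bool:
    """Return True if `compiler` can build and run an OpenMP program with `flags`."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "omp_test.c")
        exe = os.path.join(tmp, "omp_test")
        with open(src, "w") as fh:
            fh.write(OMP_TEST_SRC)
        try:
            # Flags after the source so any -l options follow the object.
            build = subprocess.run([compiler, src, *flags, "-o", exe],
                                   capture_output=True, text=True)
        except FileNotFoundError:
            return False  # compiler not installed
        if build.returncode != 0:
            return False  # compile or link failed: no usable OpenMP
        run = subprocess.run([exe], capture_output=True, text=True)
        return run.returncode == 0


if __name__ == "__main__":
    # Hypothetical flag sets; /usr/local/opt/libomp is the Intel Homebrew prefix.
    candidates = {
        "plain -fopenmp": ["-fopenmp"],
        "Apple clang + Homebrew libomp": [
            "-Xpreprocessor", "-fopenmp",
            "-I/usr/local/opt/libomp/include",
            "-L/usr/local/opt/libomp/lib", "-lomp",
        ],
    }
    for label, flags in candidates.items():
        print(label, openmp_works("cc", flags))
```

If the plain `-fopenmp` check passes with the compiler configured for the conda environment's R while the configure message still says "no", the flags R passes during the package build are a likely place to look.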
It seems that installing from source is necessary to enable multithreading in a conda environment on macOS, but the procedure needs some revisions.
Got it. I'm out of ideas. OpenMP support on macOS has been a sticking point for a while, and even with libomp there are some use cases, such as yours, that fall through the cracks. Feel free to share your insights once you figure something out.
After several trials, I figured out a method to solve this problem, even though it is not that elegant. First, after installing libomp via
brew install libomp
OpenMP should be available on macOS, which is why building xgboost from source for the system-wide R successfully enables multithreading. Therefore, the problem must lie in the compilation process when installing xgboost within the conda environment. Inspired by this post, I checked the file path using the following command in the R console within the conda environment:
file.path(R.home("etc"), "Makeconf")
Opening the returned file in vim, I noticed that it is located inside the corresponding conda environment, and that the following variables are set:
SHLIB_OPENMP_CFLAGS = -fopenmp
SHLIB_OPENMP_CXXFLAGS = -fopenmp
SHLIB_OPENMP_FFLAGS = -fopenmp
However, the following were blank:
SHLIB_CFLAGS =
SHLIB_CXXFLAGS =
SHLIB_FFLAGS =
Unfortunately, when running install.packages("xgboost") in the R console within the conda environment, I could not find -fopenmp among the effective compilation flags. Therefore, I edited the above file to set the following and saved it:
SHLIB_CFLAGS = -fopenmp
SHLIB_CXXFLAGS = -fopenmp
SHLIB_FFLAGS = -fopenmp
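The Makeconf edit can also be scripted. Below is a minimal Python sketch. It assumes the path is the one printed by `file.path(R.home("etc"), "Makeconf")` and that the three `SHLIB_*FLAGS` lines look like the ones shown above. It only fills in variables that are blank and keeps a `.bak` copy. `patch_makeconf` and `patch_makeconf_text` are hypothetical helper names.

```python
import re
import shutil
from pathlib import Path

# Variables that were blank in the conda environment's Makeconf.
SHLIB_VARS = ("SHLIB_CFLAGS", "SHLIB_CXXFLAGS", "SHLIB_FFLAGS")


def patch_makeconf_text(text: str, flag: str = "-fopenmp") -> str:
    """Set each blank SHLIB_*FLAGS variable to `flag`; leave non-blank ones alone."""
    for var in SHLIB_VARS:
        # Match e.g. "SHLIB_CFLAGS =" followed only by spaces or tabs.
        pattern = re.compile(rf"^({var}[ \t]*=)[ \t]*$", re.MULTILINE)
        text = pattern.sub(rf"\1 {flag}", text)
    return text


def patch_makeconf(path: str, flag: str = "-fopenmp") -> bool:
    """Patch Makeconf in place (after writing a .bak copy); return True if changed."""
    p = Path(path)
    original = p.read_text()
    patched = patch_makeconf_text(original, flag)
    if patched == original:
        return False
    shutil.copy2(p, p.with_name(p.name + ".bak"))
    p.write_text(patched)
    return True
```

Because this Makeconf lives inside the conda environment, patching it only affects packages compiled from source in that environment. The system-wide R is untouched.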
Now, just run install.packages("xgboost") in the R console within the conda environment. As before, the message
checking whether OpenMP will work in a package... no
*****************************************************************************************
OpenMP is unavailable on this Mac OSX system. Training speed may be suboptimal.
To use all CPU cores for training jobs, you should install OpenMP by running
brew install libomp
*****************************************************************************************
still shows up. However, during compilation, -fopenmp now appears among the effective flags. After the installation, multithreading is available:
r$> require(xgboost)
x <- matrix(rnorm(100 * 10000), 10000, 100)
y <- x %*% rnorm(100) + rnorm(1000)
system.time({
bst <- xgboost(data = x, label = y, nthread = 1, nround = 100, verbose = F)
})
Loading required package: xgboost
user system elapsed
19.429 0.130 17.317
r$> system.time({
bst <- xgboost(data = x, label = y, nthread = 4, nround = 100, verbose = F)
})
user system elapsed
17.949 0.063 4.538
r$> system.time({
bst <- xgboost(data = x, label = y, nthread = 8, nround = 100, verbose = F)
})
user system elapsed
27.401 0.094 3.457
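Going from 4 to 8 threads gains less than going from 1 to 4 did, and `user` time rises to 27.4 s at nthread = 8. One rough way to quantify this is to solve Amdahl's law, S(n) = 1 / ((1 - p) + p/n), for the parallel fraction p implied by each measured speedup. This is a standard back-of-the-envelope model applied to the timings above, not something the benchmark itself measures. It ignores effects such as hyper-threading or memory bandwidth.

```python
def amdahl_parallel_fraction(speedup: float, n_threads: int) -> float:
    """Solve S = 1 / ((1 - p) + p / n) for the parallel fraction p."""
    if n_threads < 2:
        raise ValueError("need at least 2 threads to estimate p")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_threads)


# Elapsed seconds from the conda environment after the Makeconf change.
elapsed = {1: 17.317, 4: 4.538, 8: 3.457}

for n in (4, 8):
    s = elapsed[1] / elapsed[n]
    p = amdahl_parallel_fraction(s, n)
    print(f"nthread={n}: speedup {s:.2f}x, implied parallel fraction {p:.3f}")
```

The implied p is about 0.98 at 4 threads but about 0.91 at 8. This suggests the extra threads do not scale linearly, for example if the machine has fewer than 8 physical cores. That reading is a guess, not something these numbers establish.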
Even though this method is not that elegant, I am fine with this revision. Furthermore, keeping these settings does no harm, and they may also benefit other packages that are compiled from source and use multiple threads.
Thanks @hcho3 all the same, and I hope this post offers some hints for installing xgboost in a conda environment on macOS.
Another remark concerns installing the xgboost Python package within a conda environment. In short, with libomp installed using
brew install libomp
installing xgboost with conda install -c conda-forge xgboost makes multithreading available. In my experiment, I also used mkl to accelerate numpy, just as in the conda environment for R:
conda install -c conda-forge numpy libblas=3.9.0=9_mkl
and then installed xgboost:
conda install -c conda-forge xgboost
Try the following:
In [1]: import numpy as np
...: import xgboost as xgb
...: import timeit
...:
...: data = np.random.rand(10000, 100)
...: label = np.random.randint(2, size=10000)
...: dtrain = xgb.DMatrix(data, label=label)
...:
...: param_1 = {'objective': 'binary:logistic', 'nthread': 1, 'eval_metric': 'auc'}
...:
...: param_4 = {'objective': 'binary:logistic', 'nthread': 4, 'eval_metric': 'auc'}
...:
...: param_8 = {'objective': 'binary:logistic', 'nthread': 8, 'eval_metric': 'auc'}
...:
...: num_round = 100
In [2]: start = timeit.default_timer()
...:
...: xgb.train(param_1, dtrain, num_round)
...:
...: stop = timeit.default_timer()
...:
...: print('Time: ', stop - start)
Time: 16.160123399
In [3]: start = timeit.default_timer()
...:
...: xgb.train(param_4, dtrain, num_round)
...:
...: stop = timeit.default_timer()
...:
...: print('Time: ', stop - start)
Time: 4.242956155000002
In [4]: start = timeit.default_timer()
...:
...: xgb.train(param_8, dtrain, num_round)
...:
...: stop = timeit.default_timer()
...:
...: print('Time: ', stop - start)
Time: 3.200284463999999
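The repeated `start`/`stop` pattern above can be factored into a small helper that times each `nthread` setting in one loop. The sketch below uses `time.perf_counter`, which is the clock `timeit.default_timer` uses on Python 3. The helper names are hypothetical, and the dummy workload stands in for `xgb.train` so the sketch runs without xgboost installed.

```python
import time
from typing import Callable, Dict, Iterable


def time_call(fn: Callable[[], object]) -> float:
    """Return wall-clock seconds taken by one call to fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def time_by_threads(make_job: Callable[[int], Callable[[], object]],
                    thread_counts: Iterable[int]) -> Dict[int, float]:
    """Time make_job(n)() once for each thread count n."""
    return {n: time_call(make_job(n)) for n in thread_counts}


if __name__ == "__main__":
    # Stand-in workload; it ignores n, unlike a real xgb.train call would.
    def make_dummy_job(n: int) -> Callable[[], object]:
        return lambda: sum(i * i for i in range(200_000))

    for n, secs in time_by_threads(make_dummy_job, [1, 4, 8]).items():
        print(f"nthread={n}: {secs:.3f} s")
```

With xgboost, `make_job` could return something like `lambda: xgb.train({**param_1, 'nthread': n}, dtrain, num_round)`, reusing `param_1`, `dtrain` and `num_round` from the session above.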
Therefore, for the xgboost Python package within a conda environment on macOS, OpenMP is correctly enabled. For the xgboost R package on macOS, however, installation from source is necessary to enable OpenMP. For the system-wide R, just follow https://xgboost.readthedocs.io/en/latest/build.html#installing-the-development-version-linux-mac-osx; for R within a conda environment, my solution above may be an easy, if not elegant, fix. In both cases, libomp must be installed on macOS at the very beginning:
brew install libomp
Hello! XGBoost uses only ONE thread (core) on macOS if it is installed with the standard install.packages("xgboost"). The solution is to install the xgboost R package from source as described in https://xgboost.readthedocs.io/en/latest/build.html#installing-the-development-version-linux-mac-osx. Of course, I also installed libomp first, and then followed the above instructions to build the xgboost R package from source. It installed successfully with multithreading working, but ONLY for the system-wide R. I failed to install the xgboost R package from source inside a conda environment on macOS (Big Sur).
To run tests with different versions of R, I usually create conda environments that install R and related packages separately from the system-wide R. For example, I use the following R_4_mkl.yml to construct the conda environment:
In the terminal, I run the following to create the new conda environment and then activate it:
Next, I would like to install the xgboost R package inside this conda environment. I slightly revised FindLibR.cmake in xgboost/cmake/modules to allow the user to set the proper R executable path. Please check the revised file: FindLibR.cmake.zip
The key part is:
Then I followed the build steps, where ${executable R path} is the output of
which R
inside the conda environment. The procedure fails at the final step, make install:
Could you please help check this issue, or do you have any suggestions for installing the xgboost R package from source inside a conda environment on macOS?