ThunderGBM won the 2019 Best Paper Award from IEEE Transactions on Parallel and Distributed Systems, awarded by the IEEE Computer Society Publications Board (1 out of 987 submissions), for the paper: Zeyi Wen, Jiashuai Shi, Bingsheng He, Jian Chen, Kotagiri Ramamohanarao, and Qinbin Li, "Exploiting GPUs for Efficient Gradient Boosting Decision Tree Training", IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 12, 2019, pp. 2706-2717. For more details, see: Best Paper Award Winners from IEEE, News from NUS School of Computing.
The mission of ThunderGBM is to help users easily and efficiently apply GBDTs and Random Forests to solve problems. ThunderGBM exploits GPUs to achieve high efficiency. Key features of ThunderGBM are as follows.
Why accelerate GBDTs and Random Forests: A survey conducted by Kaggle in 2017 shows that 50%, 46%, and 24% of data mining and machine learning practitioners use Decision Trees, Random Forests, and GBMs, respectively.
GBDTs and Random Forests are often used to build state-of-the-art data science solutions. We've listed three winning solutions using GBDTs below. Please check out the XGBoost website for more winning solutions and use cases. Here are some example successes of GBDTs and Random Forests:
For Linux with CUDA 9.0
pip install thundergbm
For Windows (64bit)
Download the Python wheel file (for Python3 or above)
Install the Python wheel file
pip install thundergbm-0.3.4-py3-none-win_amd64.whl
Currently, only Python 3 is supported.
After you have installed thundergbm, you can import and use the classifier (similarly for regressor) by:
from thundergbm import TGBMClassifier
clf = TGBMClassifier()
clf.fit(x, y)
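The snippet above leaves `x` and `y` undefined. As a minimal sketch, the example below builds a small synthetic dataset and wraps training in a function. The helper names `make_toy_data`, `accuracy`, and `run_demo` are hypothetical and not part of ThunderGBM. It also assumes that `TGBMClassifier` exposes a scikit-learn-style `predict` method and accepts NumPy arrays; check the Python interface documentation before relying on either.

```python
import random


def make_toy_data(n_samples=200, seed=0):
    """Build a small two-class dataset: the label is 1 when x0 + x1 > 1."""
    rng = random.Random(seed)
    x = [[rng.random(), rng.random()] for _ in range(n_samples)]
    y = [1 if a + b > 1.0 else 0 for a, b in x]
    return x, y


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def run_demo():
    """Train and evaluate on the toy data; needs thundergbm and a CUDA GPU."""
    # Imported lazily so the helpers above work without thundergbm installed.
    import numpy as np
    from thundergbm import TGBMClassifier

    x, y = make_toy_data()
    x, y = np.array(x), np.array(y)

    clf = TGBMClassifier()  # default parameters, as in the snippet above
    clf.fit(x, y)
    # Assumption: predict() follows the scikit-learn convention and
    # returns one label per input row.
    y_pred = clf.predict(x)
    return accuracy(y.tolist(), [int(p) for p in y_pred])
```

On a machine with thundergbm installed and a supported GPU, calling `run_demo()` should return the training accuracy as a float between 0 and 1. The regressor can be exercised the same way by swapping in the regressor class and a numeric target.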
git clone https://github.com/zeyiwen/thundergbm.git
cd thundergbm
#under the directory of thundergbm
git submodule init cub && git submodule update
#under the directory of thundergbm
mkdir build && cd build && cmake .. && make -j
./bin/thundergbm-train ../dataset/machine.conf
./bin/thundergbm-predict ../dataset/machine.conf
You should see RMSE = 0.489562 after a successful run.
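The RMSE reported by the predict step is the root mean squared error between the predicted and the true target values. The sketch below shows the standard definition in plain Python. The function name `rmse` is hypothetical, and this is not ThunderGBM's own implementation, only an illustration of the metric.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the squared differences, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

A lower RMSE means the predictions are closer to the targets, and a value of 0 means they match exactly.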
macOS is not supported, as Apple has suspended support for some NVIDIA GPUs. We will consider supporting macOS based on feedback from our user community. Please stay tuned.
If you use ThunderGBM in your paper, please cite our work (TPDS and JMLR).
@ARTICLE{8727750,
author={Z. {Wen} and J. {Shi} and B. {He} and J. {Chen} and K. {Ramamohanarao} and Q. {Li}},
journal={IEEE Transactions on Parallel and Distributed Systems},
title={Exploiting GPUs for Efficient Gradient Boosting Decision Tree Training},
year={2019},
volume={30},
number={12},
pages={2706-2717},
}
@article{wenthundergbm19,
author = {Wen, Zeyi and Liu, Hanfeng and Shi, Jiashuai and Li, Qinbin and He, Bingsheng and Chen, Jian},
title = {{ThunderGBM}: Fast {GBDTs} and Random Forests on {GPUs}},
journal = {Journal of Machine Learning Research},
volume={21},
year = {2020}
}
Zeyi Wen, Jiashuai Shi, Bingsheng He, Jian Chen, Kotagiri Ramamohanarao, and Qinbin Li. Exploiting GPUs for Efficient Gradient Boosting Decision Tree Training. IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 30, no. 12, pages 2706-2717, 2019.
Zeyi Wen, Hanfeng Liu, Jiashuai Shi, Qinbin Li, Bingsheng He, and Jian Chen. ThunderGBM: Fast GBDTs and Random Forests on GPUs. Journal of Machine Learning Research (JMLR), MLOSS (Machine Learning Open Source Software) track, vol. 21, no. 108, pages 1-5, 2020.
Zeyi Wen, Bingsheng He, Kotagiri Ramamohanarao, Shengliang Lu, and Jiashuai Shi. Efficient Gradient Boosted Decision Tree Training on GPUs. The 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), pages 234-243, 2018.