
GBM-Benchmarks

GBM benchmark suite for the purpose of evaluating the speed of XGBoost GPU on multi-GPU systems with large datasets.

This benchmark is designed to be run on an AWS p3.16xlarge instance with 8 V100 GPUs. The Deep Learning Base AMI (Ubuntu) with at least 150 GB of storage is recommended.

Requirements

Run the benchmarks

python3 benchmarks.py

It can be useful to run the benchmarks with a small number of rows/rounds to quickly check that everything is working:

python3 benchmarks.py --rows 100 --num_rounds 10

Benchmark parameters:

usage: benchmarks.py [-h] [--rows ROWS] [--num_rounds NUM_ROUNDS]
                     [--datasets DATASETS] [--algs ALGS]

optional arguments:
  -h, --help            show this help message and exit
  --rows ROWS           Max rows to benchmark for each dataset. (default: None)
  --num_rounds NUM_ROUNDS
                        Boosting rounds. (default: 500)
  --datasets DATASETS   Datasets to run. (default: YearPredictionMSD,
                        Synthetic,Higgs,Cover Type,Bosch,Airline)
  --algs ALGS           Boosting algorithms to run. (default: xgb-cpu-hist,
                        xgb-gpu-hist,lightgbm-cpu,lightgbm-gpu,cat-cpu,cat-gpu)
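
For example, to compare only the GPU algorithms on a subset of the datasets (dataset and algorithm names taken from the defaults above), the flags can be combined:

python3 benchmarks.py --datasets Higgs,Airline --algs xgb-gpu-hist,lightgbm-gpu,cat-gpu --num_rounds 100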

Datasets

Datasets are loaded using ml_dataset_loader. Datasets are downloaded automatically on first use and cached for subsequent runs. Allow time for these downloads on the first run.
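
As a rough illustration of how a dataset ends up in memory, the sketch below assumes ml_dataset_loader exposes per-dataset loader functions such as get_higgs(num_rows=...) that return a feature matrix and label vector; the exact function name and signature are assumptions, not taken from this repository:

# Hypothetical ml_dataset_loader usage; get_higgs and its num_rows argument are assumptions.
from ml_dataset_loader.datasets import get_higgs

# The first call downloads the raw data; later calls reuse the cached copy.
X, y = get_higgs(num_rows=100000)
print(X.shape, y.shape)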

Example results

Run on 7 June 2018

"('YearPredictionMSD' 'Time(s)')" "('YearPredictionMSD' 'RMSE')" "('Synthetic' 'Time(s)')" "('Synthetic' 'RMSE')" "('Higgs' 'Time(s)')" "('Higgs' 'Accuracy')" "('Cover Type' 'Time(s)')" "('Cover Type' 'Accuracy')" "('Bosch' 'Time(s)')" "('Bosch' 'Accuracy')" "('Airline' 'Time(s)')" "('Airline' 'Accuracy')"
xgb-cpu-hist 397.27372694015503 8.879391001888838 565.2947809696198 13.610471042735508 470.09188079833984 0.7474345454545455 464.05221605300903 0.891982134712529 752.5890619754791 0.994454065469905 1948.264995098114 0.7494303418939346
xgb-gpu-hist 34.25581908226013 8.879935744972384 38.48715591430664 13.460576927868603 34.07960486412048 0.747475 103.3895480632782 0.8928685145822397 32.12634301185608 0.9944244984160507 144.8635070323944 0.749484266051801
lightgbm-cpu 38.12508988380432 8.877691075962955 421.0538258552551 13.585034611136265 306.9785330295563 0.7473804545454545 83.76876091957092 0.8928340920630277 250.0972819328308 0.9943907074973601 916.0412080287933 0.7504912703697312
lightgbm-gpu 80.04824590682983 8.88175154521266 609.4814240932465 13.585007307447382 529.5377051830292 0.7469995454545455 126.52870297431946 0.8930578384379061 487.14922618865967 0.9944076029567054 614.7447829246521 0.749949160947056
cat-cpu 38.49950695037842 8.994799241732066 436.58789801597595 9.389984249250787 397.02287697792053 0.7406940909090909 288.1107921600342 0.8518626885708631 242.90423798561096 0.9944160506863781 2949.0425968170166 0.7265709745333714
cat-gpu 9.802947044372559 9.036473602545339 35.474628925323486 9.399963630634538 30.145710945129395 0.7406177272727272 N/A N/A N/A N/A 303.35544514656067 0.7277047723183877
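
For context on what a configuration such as xgb-gpu-hist corresponds to, here is a minimal, self-contained sketch of training XGBoost with the GPU histogram method. The data, learning rate, and depth are illustrative assumptions, not the settings used by benchmarks.py:

import numpy as np
import xgboost as xgb

# Small synthetic binary classification problem standing in for a real dataset.
rng = np.random.RandomState(0)
X = rng.randn(1000, 20)
y = (X[:, 0] + rng.randn(1000) > 0).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'tree_method': 'gpu_hist',       # GPU histogram algorithm; 'hist' is the CPU counterpart
    'objective': 'binary:logistic',
    'max_depth': 8,                  # assumed value, not taken from benchmarks.py
    'eta': 0.1,                      # assumed value
}

# 500 boosting rounds matches the --num_rounds default.
bst = xgb.train(params, dtrain, num_boost_round=500)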

Scalability test

We test the scalability of multi-GPU XGBoost by training on the Airline dataset with 1 to 8 GPUs and timing the results (a sketch of this kind of timing loop follows the usage text below).

python3 scalability.py -h
usage: scalability.py [-h] [--rows ROWS] [--num_rounds NUM_ROUNDS]

optional arguments:
  -h, --help            show this help message and exit
  --rows ROWS           Max rows to benchmark for each dataset. (default: None)
  --num_rounds NUM_ROUNDS
                        Boosting rounds. (default: 500)
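
As a rough sketch of the kind of loop such a scalability test performs, assuming an XGBoost build of that era in which gpu_hist accepted an n_gpus parameter (the synthetic data and parameter values below are illustrative, not the contents of scalability.py):

import time
import numpy as np
import xgboost as xgb

# Small synthetic stand-in; the real test uses the Airline dataset.
rng = np.random.RandomState(0)
X = rng.randn(10000, 20)
y = (rng.rand(10000) > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

for n_gpus in range(1, 9):
    params = {
        'tree_method': 'gpu_hist',
        'n_gpus': n_gpus,            # multi-GPU selection in XGBoost builds of this period
        'objective': 'binary:logistic',
    }
    start = time.time()
    xgb.train(params, dtrain, num_boost_round=500)
    print('%d GPU(s): %.1f s' % (n_gpus, time.time() - start))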