Laurae2 / ml-perf


LightGBM in parallel: demo results (and with xgboost) #6

Open · Laurae2 opened 5 years ago

Laurae2 commented 5 years ago

I just ran this: https://github.com/Laurae2/ml-perf/issues/5#issuecomment-491969652

If you just want the numbers, skip past the conclusions below and go straight to the tables.

Conclusions for our scenario, CPU:

- Training many single-threaded models in parallel scales far better than multithreading a single model: 35 parallel processes reach ~26x throughput, while multithreading alone plateaus below 3x.
- Oversubscribing past the physical cores (70 parallel processes) lowers throughput compared to 35.

Conclusions for our scenario, GPU:

- One process per GPU leaves the GPU underused: going from 1 to 9 processes per GPU raises throughput from ~3.8x to ~17x on 4 GPUs.
- Beyond roughly 9 processes per GPU, the gains flatten out or regress.

General conclusion:

- For many small models, run many of them in parallel with 1 thread each rather than multithreading each model. Here, even the best 4-GPU configurations (0.400 s/model for LightGBM) do not beat the best CPU configurations (0.252 s/model).

For information, I use the following hardware: a dual-socket server with 18 physical cores + 18 hyperthreads per socket (72 logical cores total) and 4 GPUs.

Baselines: the ~1x rows in each table, i.e. 1 process with 1 model thread on CPU, and 1 process on 1 GPU for the GPU runs.

For reference, the columns mean:

- Parallel Threads = processes/threads used in parallel to run R (multiprocessing through sockets)
- Model Threads = threads used to run xgboost/LightGBM (multithreading)
- Parallel GPUs = number of GPUs used by the parallel R processes/threads
- GPU Threads = number of processes running on a single GPU
- Models = total number of models trained
- Seconds / Model = average time per model, in seconds
- Boost vs Baseline = performance gain of the given row versus a single CPU (or single GPU, for the GPU runs) process/thread
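
As a concrete illustration of the setup these columns describe, here is a minimal sketch (assumed, not the benchmark's actual script) of socket-based multiprocessing in R, each worker training single-threaded LightGBM models; the toy data, worker count, and model count are placeholders:

```r
# Minimal sketch: a socket cluster of R processes, each training
# single-threaded LightGBM models on placeholder data.
library(parallel)
library(lightgbm)

n_parallel <- 18    # "Parallel Threads": number of R worker processes
n_models   <- 500   # "Models": total models trained across all workers

cl <- makeCluster(n_parallel)          # multiprocessing through sockets
clusterEvalQ(cl, library(lightgbm))

timings <- parLapply(cl, seq_len(n_models), function(i) {
  # toy data stands in for the benchmark's fixed dataset
  x <- matrix(rnorm(1000 * 10), ncol = 10)
  y <- rbinom(1000, 1, 0.5)
  dtrain <- lgb.Dataset(x, label = y)
  t0 <- Sys.time()
  lgb.train(params = list(objective = "binary",
                          num_threads = 1),   # "Model Threads" = 1
            data = dtrain, nrounds = 100)
  as.numeric(Sys.time() - t0, units = "secs")
})
stopCluster(cl)
mean(unlist(timings))  # "Seconds / Model"
```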

LightGBM CPU:

| Run | Parallel Threads | Model Threads | Parallel GPUs | GPU Threads | Models | Seconds / Model | Boost vs Baseline |
|-----|------------------|---------------|---------------|-------------|--------|-----------------|-------------------|
| 20 | 1 | 1 | 0 | 0 | 100 | 6.539 | ~1x |
| 21 | 9 | 1 | 0 | 0 | 250 | 0.760 | 8.57x |
| 22 | 18 | 1 | 0 | 0 | 500 | 0.400 | 16.28x |
| 23 | 35 | 1 | 0 | 0 | 1000 | 0.252 | 25.85x |
| 24 | 70 | 1 | 0 | 0 | 2500 | 0.295 | 22.08x |
| 25 | 1 | 1 | 0 | 0 | 250 | 6.502 | ~1x |
| 26 | 1 | 9 | 0 | 0 | 250 | 2.315 | 2.81x |
| 27 | 1 | 18 | 0 | 0 | 250 | 2.269 | 2.87x |
| 28 | 1 | 35 | 0 | 0 | 250 | 2.485 | 2.62x |
| 29 | 1 | 70 | 0 | 0 | 250 | 3.051 | 2.13x |

LightGBM GPU:

| Run | Parallel Threads | Model Threads | Parallel GPUs | GPU Threads | Models | Seconds / Model | Boost vs Baseline |
|-----|------------------|---------------|---------------|-------------|--------|-----------------|-------------------|
| 1 | 1 | 1 | 1 | 1 | 50 | 6.769 | ~1x |
| 2 | 2 | 1 | 2 | 1 | 100 | 3.481 | 1.94x |
| 3 | 3 | 1 | 3 | 1 | 250 | 2.354 | 2.88x |
| 4 | 4 | 1 | 4 | 1 | 500 | 1.790 | 3.78x |
| 5 | 4 | 1 | 1 | 4 | 100 | 2.166 | 3.13x |
| 6 | 8 | 1 | 2 | 4 | 250 | 1.121 | 6.04x |
| 7 | 12 | 1 | 3 | 4 | 500 | 0.772 | 8.77x |
| 8 | 16 | 1 | 4 | 4 | 1000 | 0.586 | 11.55x |
| 9 | 9 | 1 | 1 | 9 | 250 | 1.298 | 5.21x |
| 10 | 18 | 1 | 2 | 9 | 500 | 0.709 | 9.55x |
| 11 | 27 | 1 | 3 | 9 | 1000 | 0.496 | 13.65x |
| 12 | 36 | 1 | 4 | 9 | 2500 | 0.400 | 16.92x |
| 13 | 18 | 1 | 1 | 18 | 500 | 1.200 | 5.64x |
| 14 | 36 | 1 | 2 | 18 | 1000 | 0.633 | 10.69x |
| 15 | 54 | 1 | 3 | 18 | 2500 | 0.464 | 14.59x |
| 16 | 72 | 1 | 4 | 18 | 5000 | 0.431 | 15.71x |
| 17 | 35 | 1 | 1 | 35 | 1000 | 1.194 | 5.67x |
| 18 | 35 | 1 | 2 | 35 | 2500 | 0.632 | 10.71x |
| 19 | 58 | 1 | 1 | 58 | 2500 | 1.185 | 5.71x |
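
In the GPU runs above, each worker process targets one device, and with GPU Threads > 1 several processes share the same GPU. Here is a hedged sketch of how a worker might select its device with LightGBM's R package; the `worker_id` variable and the round-robin mapping are illustrative assumptions, not the benchmark's actual code:

```r
# Hedged sketch for run 8 in the table (4 GPUs x 4 processes per GPU
# = 16 parallel workers).
n_gpus    <- 4
worker_id <- 7                      # assumed 0-based id of this worker
gpu_id    <- worker_id %% n_gpus    # round-robin: 4 workers per GPU

params <- list(
  objective     = "binary",
  device        = "gpu",            # LightGBM's GPU (OpenCL) build
  gpu_device_id = gpu_id,           # pin this process to one device
  num_threads   = 1                 # "Model Threads" = 1 throughout
)
```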

I also refreshed xgboost hist results (re-ran them).

xgboost CPU:

| Run | Parallel Threads | Model Threads | Parallel GPUs | GPU Threads | Models | Seconds / Model | Boost vs Baseline |
|-----|------------------|---------------|---------------|-------------|--------|-----------------|-------------------|
| 9 | 1 | 1 | 0 | 0 | 25 | 11.389 | ~1x |
| 10 | 9 | 1 | 0 | 0 | 50 | 1.456 | 7.82x |
| 11 | 18 | 1 | 0 | 0 | 100 | 0.782 | 14.56x |
| 12 | 35 | 1 | 0 | 0 | 250 | 0.489 | 23.28x |
| 13 | 70 | 1 | 0 | 0 | 500 | 0.428 | 26.60x |
| 14 | 1 | 1 | 0 | 0 | 50 | 11.383 | ~1x |
| 15 | 1 | 9 | 0 | 0 | 50 | 6.565 | 1.73x |
| 16 | 1 | 18 | 0 | 0 | 50 | 6.481 | 1.76x |
| 17 | 1 | 35 | 0 | 0 | 50 | 24.601 | 0.46x |
| 18 | 1 | 70 | 0 | 0 | 50 | 165.947 | 0.07x |

xgboost GPU:

| Run | Parallel Threads | Model Threads | Parallel GPUs | GPU Threads | Models | Seconds / Model | Boost vs Baseline |
|-----|------------------|---------------|---------------|-------------|--------|-----------------|-------------------|
| 1 | 1 | 1 | 1 | 1 | 25 | 20.441 | ~1x |
| 2 | 2 | 1 | 2 | 1 | 50 | 10.639 | 1.92x |
| 3 | 3 | 1 | 3 | 1 | 100 | 6.978 | 2.93x |
| 4 | 4 | 1 | 4 | 1 | 250 | 5.176 | 3.95x |
| 5 | 4 | 1 | 1 | 4 | 50 | 20.556 | 0.99x |
| 6 | 8 | 1 | 2 | 4 | 100 | 10.501 | 1.95x |
| 7 | 12 | 1 | 3 | 4 | 250 | 6.914 | 2.96x |
| 8 | 16 | 1 | 4 | 4 | 500 | 5.295 | 3.86x |
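
The xgboost GPU runs follow the same per-worker pattern. A hedged sketch of the configuration, using xgboost's `gpu_hist` tree method from the era of these benchmarks; the worker id and device mapping are illustrative assumptions:

```r
# Hedged sketch: xgboost GPU training pinned to one device per worker.
# worker_id and the mapping are assumptions; tree_method = "gpu_hist"
# and gpu_id are xgboost parameters of that era.
n_gpus    <- 4
worker_id <- 3
gpu_id    <- worker_id %% n_gpus

params <- list(
  objective   = "binary:logistic",
  tree_method = "gpu_hist",
  gpu_id      = gpu_id,
  nthread     = 1                   # "Model Threads" = 1 throughout
)
```
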
szilard commented 5 years ago

I will include this (simplified results for xgboost CPU) in my talks (with credit to @Laurae2):

| Parallel Threads | Model Threads | Models | Seconds / Model |
|------------------|---------------|--------|-----------------|
| 1 | 1 | 25 | 11.39 |
| 9 | 1 | 50 | 1.46 |
| 18 | 1 | 100 | 0.78 |
| 35 | 1 | 250 | 0.49 |
| 70 | 1 | 500 | 0.43 |
| 1 | 1 | 50 | 11.4 |
| 1 | 9 | 50 | 6.6 |
| 1 | 18 | 50 | 6.6 |
| 1 | 35 | 50 | 25 |
| 1 | 70 | 50 | 165 |

(for easy ref: 2 socket system with 18+18HT cores each socket, total 72 cores; 0.1m dataset; 500 trees, depth 6, learn rate 0.05)
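
For completeness, a hedged sketch of what the configuration in the parenthetical above translates to in xgboost's R package; the toy data is a placeholder for the 0.1m-row dataset:

```r
# Hedged sketch of the quoted configuration: 500 trees, depth 6,
# learn rate 0.05, hist method. Data dimensions are placeholders.
library(xgboost)

x <- matrix(rnorm(100000 * 10), ncol = 10)   # ~0.1m rows stand-in
y <- rbinom(100000, 1, 0.5)

model <- xgboost(
  data        = x,
  label       = y,
  nrounds     = 500,                # 500 trees
  max_depth   = 6,                  # depth 6
  eta         = 0.05,               # learn rate 0.05
  tree_method = "hist",
  nthread     = 1,                  # "Model Threads"; varied 1..70 above
  objective   = "binary:logistic",
  verbose     = 0
)
```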

szilard commented 4 years ago

I will include this (simplified results for xgboost CPU) in my talks (with credit to @Laurae2):

| Models Same Time | Threads per Model | Models | Seconds / Model |
|------------------|-------------------|--------|-----------------|
| 1 | 1 | 25 | 11.39 |
| 9 | 1 | 50 | 1.46 |
| 18 | 1 | 100 | 0.78 |
| 35 | 1 | 250 | 0.49 |
| 70 | 1 | 500 | 0.43 |
| 1 | 1 | 50 | 11.4 |
| 1 | 9 | 50 | 6.6 |
| 1 | 18 | 50 | 6.6 |
| 1 | 35 | 50 | 25 |
| 1 | 70 | 50 | 165 |

(for easy ref: 2 socket system with 18+18HT cores each socket, total 72 cores; 0.1m dataset; 500 trees, depth 6, learn rate 0.05)