Laurae2 opened this issue 5 years ago
I will include this (simplified results for xgboost CPU) in my talks (with credit to @Laurae2):
Parallel Threads | Model Threads | Models | Seconds / Model |
---|---|---|---|
1 | 1 | 25 | 11.39 |
9 | 1 | 50 | 1.46 |
18 | 1 | 100 | 0.78 |
35 | 1 | 250 | 0.49 |
70 | 1 | 500 | 0.43 |
1 | 1 | 50 | 11.4 |
1 | 9 | 50 | 6.6 |
1 | 18 | 50 | 6.6 |
1 | 35 | 50 | 25 |
1 | 70 | 50 | 165 |
(for easy reference: 2-socket system, 18 physical + 18 HT cores per socket, 72 logical cores total; 0.1M-row dataset; 500 trees, depth 6, learning rate 0.05)
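The "Boost vs Baseline" figures implied by the table can be reproduced with simple arithmetic. A quick sketch in Python, using the seconds-per-model numbers from the rows above (the `efficiency` metric is my addition, not from the original benchmark):

```python
# Seconds/model from the table: {parallel R processes: seconds per model},
# each process training with a single xgboost thread.
parallel = {1: 11.39, 9: 1.46, 18: 0.78, 35: 0.49, 70: 0.43}

baseline = parallel[1]
for n, sec in parallel.items():
    boost = baseline / sec   # "Boost vs Baseline": throughput gain vs 1 process
    efficiency = boost / n   # how much of the ideal n-fold speedup is realized
    print(f"{n:>2} processes: {boost:5.2f}x boost, {efficiency:4.0%} efficiency")
```

The efficiency column makes the scaling story visible: near-linear up to the physical core count, then diminishing returns once hyper-threads are involved.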
I just ran this: https://github.com/Laurae2/ml-perf/issues/5#issuecomment-491969652
If you only want the numbers, skip the conclusions below.
Conclusions for our scenario, CPU:
Conclusions for our scenario, GPU:
General conclusion:
For information, I use the following hardware:
Baselines:
For reference:
- Parallel threads = processes/threads used in parallel to run R (multiprocessing through sockets)
- Model threads = threads used to run xgboost (multithreading)
- Parallel GPUs = number of GPUs used in parallel processes/threads in R
- Parallel GPU threads = number of processes running on a single GPU
- Models = number of models to train in total
- Seconds / Model = average throughput for 1 model, in seconds
- Boost vs Baseline = your performance gain for the mentioned row vs using only 1 CPU (or 1 GPU, for GPU rows) process/thread for your model
LightGBM CPU:
LightGBM GPU:
I also refreshed xgboost hist results (re-ran them).
xgboost CPU:
xgboost GPU: