dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

xgboost is extremely slow on i9-14900KF processor #10289

Closed sermakarevich closed 2 weeks ago

sermakarevich commented 2 weeks ago

Hey. Are there any recipes to make xgboost run fast on an i9-14900KF?

I am running a very simple snippet from the xgboost site:

import time
start = time.time()
from xgboost import XGBClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data['data'], data['target'], test_size=.2)
bst = XGBClassifier(n_estimators=32, max_depth=20, learning_rate=1, objective='binary:logistic', device="cpu", verbosity=3)
bst.fit(X_train, y_train)
print(time.time() - start)

and getting a 12-second execution time on

ProArt Z790-CREATOR WIFI, BIOS 2102 03/15/2024
Intel® Core™ i9-14900KF
RAM: 64GB 5200MHz (4x 32 GB ) Kingston FURY Beast

vs. 0.04 seconds on an M1 iMac.

Here are the logs from the i9:

[20:09:20] ======== Monitor (0): HostSketchContainer ========
[20:09:20] AllReduce: 0.009052s, 1 calls @ 9052us
[20:09:20] MakeCuts: 0.018199s, 1 calls @ 18199us
[20:09:20] DEBUG: /workspace/src/gbm/gbtree.cc:130: Using tree method: 0
[20:09:32] ======== Monitor (0): Learner ========
[20:09:32] Configure: 0.000288s, 1 calls @ 288us
[20:09:32] EvalOneIter: 0.000166s, 32 calls @ 166us
[20:09:32] GetGradient: 0.275794s, 32 calls @ 275794us
[20:09:32] PredictRaw: 3.9e-05s, 32 calls @ 39us
[20:09:32] UpdateOneIter: 11.7027s, 32 calls @ 11702683us
[20:09:32] ======== Monitor (0): GBTree ========
[20:09:32] BoostNewTrees: 11.4264s, 32 calls @ 11426405us
[20:09:32] CommitModel: 2.2e-05s, 32 calls @ 22us
[20:09:32] ======== Monitor (0): HistUpdater ========
[20:09:32] BuildHistogram: 2.65033s, 97 calls @ 2650327us
[20:09:32] EvaluateSplits: 1.741s, 193 calls @ 1740997us
[20:09:32] InitData: 0.845362s, 96 calls @ 845362us
[20:09:32] InitRoot: 3.5087s, 96 calls @ 3508705us
[20:09:32] LeafPartition: 2e-05s, 96 calls @ 20us
[20:09:32] UpdatePosition: 1.77232s, 97 calls @ 1772317us
[20:09:32] UpdatePredictionCache: 0.893912s, 96 calls @ 893912us
[20:09:32] UpdateTree: 9.66024s, 96 calls @ 9660242us

11.881776571273804

and from the iMac:

[20:10:52] ======== Monitor (0): HostSketchContainer ========
[20:10:52] AllReduce: 4.5e-05s, 1 calls @ 45us
[20:10:52] MakeCuts: 0.000113s, 1 calls @ 113us
[20:10:52] DEBUG: /Users/runner/work/xgboost/xgboost/src/gbm/gbtree.cc:130: Using tree method: 0
[20:10:52] ======== Monitor (0): Learner ========
[20:10:52] Configure: 0.000277s, 1 calls @ 277us
[20:10:52] EvalOneIter: 0.00011s, 32 calls @ 110us
[20:10:52] GetGradient: 0.0006s, 32 calls @ 600us
[20:10:52] PredictRaw: 2.6e-05s, 32 calls @ 26us
[20:10:52] UpdateOneIter: 0.028095s, 32 calls @ 28095us
[20:10:52] ======== Monitor (0): GBTree ========
[20:10:52] BoostNewTrees: 0.027032s, 32 calls @ 27032us
[20:10:52] CommitModel: 3.7e-05s, 32 calls @ 37us
[20:10:52] ======== Monitor (0): HistUpdater ========
[20:10:52] BuildHistogram: 0.007122s, 122 calls @ 7122us
[20:10:52] EvaluateSplits: 0.004632s, 218 calls @ 4632us
[20:10:52] InitData: 0.001774s, 96 calls @ 1774us
[20:10:52] InitRoot: 0.005786s, 96 calls @ 5786us
[20:10:52] LeafPartition: 9e-06s, 96 calls @ 9us
[20:10:52] UpdatePosition: 0.005379s, 122 calls @ 5379us
[20:10:52] UpdatePredictionCache: 0.001679s, 96 calls @ 1679us
[20:10:52] UpdateTree: 0.023586s, 96 calls @ 23586us

0.04190373420715332

There is no such issue with RandomForestClassifier from scikit-learn. All 32 hardware threads of the i9-14900KF are at 100% utilisation for the full 12 seconds. The xgboost version is 2.0.3.