Xtra-Computing / thundergbm

ThunderGBM: Fast GBDTs and Random Forests on GPUs
Apache License 2.0

Do I need to set parameters to choose GPU or CPU? How do I set them if needed? #25

Open oChristineo opened 5 years ago

oChristineo commented 5 years ago

I trained the test_dataset.txt under thundergbm/dataset with ThunderGBM, but the execution time was 1-2 seconds slower than XGBoost. I don't know if it's because I didn't use the GPU? (Windows 10, NVIDIA GeForce GTX 1050, CUDA 10.1)

zeyiwen commented 5 years ago

ThunderGBM uses GPUs by default. The test_dataset.txt data set is quite small, so training should not be that slow. You may try to train the model with ThunderGBM using the command line.

It would be good if you could share your script for running ThunderGBM, so that we can help you identify potential problems in it.

oChristineo commented 5 years ago

Thank you very much for the quick reply.

Here is my code:

```python
from thundergbm import TGBMRegressor
from xgboost import XGBRegressor
from sklearn.datasets import load_svmlight_file
from sklearn.metrics import mean_squared_error, r2_score
from math import sqrt
import time

# Load the training data in LIBSVM format.
x, y = load_svmlight_file("E:/CUDA code/thundergbm/dataset/test_dataset.txt")
TGBMmodel = TGBMRegressor(tree_method='hist')
XGBmodel = XGBRegressor(tree_method='hist')

# Time ThunderGBM training.
start1 = time.time()
TGBMmodel.fit(x, y)
end1 = time.time()
TGBMduration = end1 - start1

# Time XGBoost training.
start2 = time.time()
XGBmodel.fit(x, y)
end2 = time.time()
XGBduration = end2 - start2

print('TGBM elapsed time: {:.4f}s'.format(TGBMduration))
print('XGB elapsed time: {:.4f}s'.format(XGBduration))

# Evaluate both models (here on the same data they were trained on).
x2, y2 = load_svmlight_file("E:/CUDA code/thundergbm/dataset/test_dataset.txt")
y_predict1 = TGBMmodel.predict(x2)
y_predict2 = XGBmodel.predict(x2)
rms1 = sqrt(mean_squared_error(y2, y_predict1))
print("TGBM RMS: %f" % rms1)
rms2 = sqrt(mean_squared_error(y2, y_predict2))
print("XGB RMS: %f" % rms2)

accuracy1 = r2_score(y2, y_predict1)
print("TGBM Accuracy: %.2f%%" % (accuracy1 * 100.0))
accuracy2 = r2_score(y2, y_predict2)
print("XGB Accuracy: %.2f%%" % (accuracy2 * 100.0))
```


I trained the model in Spyder and the result is:

```
TGBM elapsed time: 1.5382s
XGB elapsed time: 0.1114s
TGBM RMS: 0.489562
XGB RMS: 0.603319
TGBM Accuracy: 67.71%
XGB Accuracy: 50.95%
```

I also tried to train the model from the command line, and the result is:

```
TGBM elapsed time: 2.7293s
XGB elapsed time: 0.1688s
```

This is slower than when I trained the model in Spyder. I don't know why.
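One possible explanation (an assumption on my part, not confirmed in this thread): the first call into a GPU library pays a one-time cost for CUDA context creation, so a fresh command-line run can look slower than a warm Spyder session that has already initialized the GPU. A minimal sketch for checking this, using an untimed warm-up fit before the measured one:

```python
import time

from sklearn.datasets import load_svmlight_file
from thundergbm import TGBMRegressor

x, y = load_svmlight_file("E:/CUDA code/thundergbm/dataset/test_dataset.txt")

# Untimed warm-up fit: absorbs any one-time CUDA initialization overhead
# (assumption: this overhead is what inflates the cold-start timing).
TGBMRegressor().fit(x, y)

# Timed fit on an already-initialized GPU; time.perf_counter is better
# suited to benchmarking than time.time.
model = TGBMRegressor()
start = time.perf_counter()
model.fit(x, y)
print('TGBM warm elapsed time: {:.4f}s'.format(time.perf_counter() - start))
```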

Kurt-Liuhf commented 5 years ago

According to the XGBoost documentation, the default parameters of XGBRegressor, which you used, are as follows: `max_depth=3`, `learning_rate=1`, `n_estimators=100`, `objective="reg:linear"`, ...

The `tree_method` of XGBoost should be set to `gpu_hist` if you want it to run on the GPU.

In comparison, the default parameters of TGBMRegressor are as follows: `max_depth=6`, `learning_rate=1`, `n_estimators=40`, `objective="reg:linear"`, ...

So the experimental comparison in your script is unfair to ThunderGBM. It would be better to read the parameter documentation of both libraries first.
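To make this concrete, here is a minimal sketch (my own illustration, reusing `x` and `y` from the script above) that applies both points in this comment: match XGBoost to the ThunderGBM defaults quoted here and move it onto the GPU via `gpu_hist`:

```python
from xgboost import XGBRegressor

# Align XGBoost with ThunderGBM's quoted defaults (depth 6, 40 trees,
# learning rate 1) and use the GPU histogram algorithm for training.
xgb_gpu = XGBRegressor(
    tree_method='gpu_hist',
    max_depth=6,
    n_estimators=40,
    learning_rate=1,
)
xgb_gpu.fit(x, y)  # x, y as loaded in the earlier script
```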

oChristineo commented 5 years ago

Thank you very much for your reply and suggestions. ThunderGBM is indeed faster than XGBoost after modifying the parameters. But I failed when setting the parameters 'max_depth' and 'n_estimators' for ThunderGBM:

`TypeError: __init__() got an unexpected keyword argument 'max_depth'` (and likewise for `'n_estimators'`)

It works if I set the parameters 'depth' and 'n_trees' instead.
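In other words, ThunderGBM's scikit-learn wrapper spells these parameters differently from XGBoost. A minimal sketch based on the names that worked above:

```python
from thundergbm import TGBMRegressor

# ThunderGBM's scikit-learn wrapper uses 'depth' and 'n_trees' where
# XGBoost uses 'max_depth' and 'n_estimators'; passing the XGBoost
# names raises the TypeError shown above.
model = TGBMRegressor(depth=6, n_trees=40)
model.fit(x, y)  # x, y as loaded in the earlier script
```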