dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

GPU has lower accuracy #2300

Closed: gaw89 closed this issue 7 years ago

gaw89 commented 7 years ago

Environment info

Operating System: Windows 7 (build 7601, Service Pack 1), 64-bit; CUDA 8.0; cuDNN 5.1

Hardware: CPU: Xeon E5-1620 V4; GPU: Asus GTX 1080 Ti FE

Compiler: Downloaded pre-compiled version from Guido Tapia on 5/12/17

Package used (python/R/jvm/C++): Python

xgboost version used: 0.6

Python version and distribution: Anaconda Python 2.7

Steps to reproduce

This gist shows an example with dummy data:

https://gist.github.com/gaw89/7b0d68e5b79c9d26f00c8472edf99b28
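For reference, here is a minimal sketch of the kind of CPU-vs-GPU comparison the gist performs (illustrative data, names and hyperparameters, not the gist itself; 'grow_gpu' was the GPU updater in the 0.6-era GPU plugin):

```python
# Hedged sketch: train the same binary classifier twice, once with the default
# CPU updater and once with the GPU updater, and compare logloss/accuracy.
# Dataset shape, seed and hyperparameters are illustrative assumptions.
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100000, n_features=50, random_state=123)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=123)

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)
watchlist = [(dvalid, 'validation'), (dtrain, 'training')]

for updater in ('grow_colmaker,prune', 'grow_gpu'):   # CPU exact vs. GPU exact
    params = {
        'objective': 'binary:logistic',
        'eval_metric': 'logloss',
        'updater': updater,
        'seed': 123,
    }
    bst = xgb.train(params, dtrain, num_boost_round=100,
                    evals=watchlist, verbose_eval=10)
    preds = (bst.predict(dvalid) > 0.5).astype(int)
    print(updater, 'Raw Validation Accuracy:', accuracy_score(y_valid, preds))
```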

Here's the output:

GPU
----------------------------
[0]  validation-logloss:0.678408  training-logloss:0.677604
[10] validation-logloss:0.558674  training-logloss:0.55084
[20] validation-logloss:0.469776  training-logloss:0.456777
[30] validation-logloss:0.403235  training-logloss:0.385644
[40] validation-logloss:0.349319  training-logloss:0.328338
[50] validation-logloss:0.306017  training-logloss:0.282227
[60] validation-logloss:0.271768  training-logloss:0.245588
[70] validation-logloss:0.243736  training-logloss:0.215626
[80] validation-logloss:0.219791  training-logloss:0.19035
[90] validation-logloss:0.200407  training-logloss:0.169535
[99] validation-logloss:0.184985  training-logloss:0.153226
Raw Validation Accuracy: 0.96708

CPU
----------------------------
[0]  validation-logloss:0.678401  training-logloss:0.677562
[10] validation-logloss:0.558419  training-logloss:0.550815
[20] validation-logloss:0.470883  training-logloss:0.45759
[30] validation-logloss:0.40312  training-logloss:0.385283
[40] validation-logloss:0.349083  training-logloss:0.32783
[50] validation-logloss:0.306609  training-logloss:0.282439
[60] validation-logloss:0.271177  training-logloss:0.244778
[70] validation-logloss:0.242388  training-logloss:0.214145
[80] validation-logloss:0.219018  training-logloss:0.189054
[90] validation-logloss:0.199262  training-logloss:0.168125
[99] validation-logloss:0.184332  training-logloss:0.152177
Raw Validation Accuracy: 0.96748

As you can see, CPU provides slightly better accuracy. However, when I try this on my real dataset (which I cannot include here), the difference is vast. In fact, the GPU logloss is all over the place.

GPU
----------------------------
[0]  validation-logloss:20.5582  training-logloss:19.9119
[10] validation-logloss:8.30447  training-logloss:10.8549
[20] validation-logloss:6.82514  training-logloss:7.73837
[30] validation-logloss:19.178  training-logloss:15.8284
[40] validation-logloss:12.5911  training-logloss:10.5985
[50] validation-logloss:16.3701  training-logloss:12.7894
[60] validation-logloss:13.9307  training-logloss:10.9101
[70] validation-logloss:16.2863  training-logloss:12.2042
[80] validation-logloss:13.3689  training-logloss:10.2872
[90] validation-logloss:11.5154  training-logloss:9.0604
[99] validation-logloss:15.6321  training-logloss:12.9941
Raw Validation Accuracy: 0.462666467155

CPU
----------------------------
[0]  validation-logloss:0.683447  training-logloss:0.681888
[10] validation-logloss:0.602152  training-logloss:0.592586
[20] validation-logloss:0.544172  training-logloss:0.525163
[30] validation-logloss:0.499751  training-logloss:0.475508
[40] validation-logloss:0.466776  training-logloss:0.435446
[50] validation-logloss:0.441168  training-logloss:0.40346
[60] validation-logloss:0.420259  training-logloss:0.378412
[70] validation-logloss:0.404089  training-logloss:0.358238
[80] validation-logloss:0.391373  training-logloss:0.342194
[90] validation-logloss:0.381069  training-logloss:0.328313
[99] validation-logloss:0.374225  training-logloss:0.318104
Raw Validation Accuracy: 0.847972467455

There are also substantial differences in feature importances between the various CPU/GPU, int/float versions of the model.

What have you tried?

  1. Per this issue, I tried multiplying by 1e6 AND converting to int, and this has helped greatly (a minimal sketch of this conversion appears after this list). However, the accuracy is still a small (though non-trivial) amount lower on GPU than the regular float64 CPU version (though int GPU is better than int CPU):

GPU
----------------------------
[99] validation-logloss:0.375699  training-logloss:0.319473
Raw Validation Accuracy: 0.846663175221

CPU
----------------------------
[99] validation-logloss:0.375026  training-logloss:0.319558
Raw Validation Accuracy: 0.84587759988

  2. I have tried converting to several other datatypes and different methods of scaling as well, but nothing worked as well as the 1e6/int conversion.
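A minimal sketch of the 1e6/int conversion from item 1, assuming float64 feature matrices named as in the sketch above (hypothetical names, not the poster's actual code):

```python
# Hedged sketch of the workaround: scale the float features by 1e6 and cast to
# integers before building the DMatrix, so the values survive the GPU's
# single-precision summation with less rounding error.
import numpy as np
import xgboost as xgb

X_train_int = (X_train * 1e6).astype(np.int64)   # X_train / X_valid are assumed float64 arrays
X_valid_int = (X_valid * 1e6).astype(np.int64)

dtrain_int = xgb.DMatrix(X_train_int, label=y_train)
dvalid_int = xgb.DMatrix(X_valid_int, label=y_valid)
# ...then train exactly as before, pointing the watchlist at the int DMatrices.
```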

Is this type of thing to be expected given differences in how GPUs and CPUs perform numerical operations? Is there any way to get comparable accuracy on the GPU?

RAMitchell commented 7 years ago

This definitely looks like insufficient floating-point precision. We do summation in single precision on the GPU. I will look into a fix in the future.
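To make the precision point concrete, here is a small standalone illustration (not xgboost code) of how a float32 accumulator silently drops small contributions once the running sum is large:

```python
# Once the accumulator reaches ~1e8, a float32 ULP is 8, so adding 1.0 has no
# effect; a float64 accumulator keeps every contribution.
import numpy as np

acc32 = np.float32(1e8)
acc64 = np.float64(1e8)
for _ in range(1000):
    acc32 += np.float32(1.0)   # rounded away: 1.0 is below half a float32 ULP at 1e8
    acc64 += 1.0
print(acc32)   # still 1e+08 -- the 1000 additions were lost
print(acc64)   # 100001000.0
```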

gaw89 commented 7 years ago

Thanks for your response. I figured it had something to do with precision. The problem is worst with float64, slightly less bad with float32 and float16, and best, as I said, with int.

Does this mean there's nothing I can do on my end (I have zero experience in C and CUDA)?

Do you think a reasonable solution in the meantime would be to develop the model on GPU for speed then run final training on CPU (hoping that the optimal hyperparameters, etc. are the same on float/CPU as on int/GPU)?

RAMitchell commented 7 years ago

Are you using 'grow_gpu_hist' or 'grow_gpu'?

This kind of thing normally only happens with very extreme values. If that is the case, it can be good practice to do some kind of scaling or normalisation on your features regardless of which machine learning algorithm you use.

gaw89 commented 7 years ago

Currently using 'grow_gpu'. My understanding was that with tree-based methods, feature scaling doesn't really make a difference (I've tried it before and don't recall experiencing any difference in XGBoost), but I'll give that a shot again to see if it has any impact. I'll also try 'grow_gpu_hist' to see if that has any impact.

I'll let you know what I come up with.

RAMitchell commented 7 years ago

Sorry, you are correct, feature scaling will not make a difference. I meant to say label scaling. Try log scaling if the values are large.
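Presumably something like the following minimal sketch of log-scaling a regression target (an illustration of the suggestion only; it would not apply to the binary classification case discussed in this issue, and the names are hypothetical):

```python
# Hedged sketch: compress large regression labels with log1p before training,
# then back-transform the predictions with expm1.
import numpy as np
import xgboost as xgb

y_train_log = np.log1p(y_train)                        # y_train: large-valued regression target
dtrain = xgb.DMatrix(X_train, label=y_train_log)
bst = xgb.train({'objective': 'reg:linear'}, dtrain, num_boost_round=100)
preds = np.expm1(bst.predict(xgb.DMatrix(X_valid)))    # predictions on the original scale
```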

gaw89 commented 7 years ago

No problem. I am not sure I follow about label scaling. Do you mean in a regression problem, take the log of the regression output? I am just doing binary classification in this case.

gaw89 commented 7 years ago

So it turns out 'grow_gpu_hist' is not much different accuracy-wise, but I didn't realize how big the speed boost would be: 5 seconds to train 100 rounds vs. 33 seconds with 'grow_gpu'!

GROW_GPU:
[0]  validation-logloss:0.683218  training-logloss:0.68192
[10] validation-logloss:0.600431  training-logloss:0.591257
[20] validation-logloss:0.540824  training-logloss:0.524287
[30] validation-logloss:0.497868  training-logloss:0.474415
[40] validation-logloss:0.464888  training-logloss:0.434754
[50] validation-logloss:0.440185  training-logloss:0.404325
[60] validation-logloss:0.420208  training-logloss:0.379215
[70] validation-logloss:0.404739  training-logloss:0.359366
[80] validation-logloss:0.392556  training-logloss:0.343219
[90] validation-logloss:0.382536  training-logloss:0.329614
[99] validation-logloss:0.375699  training-logloss:0.319473
Raw Validation Accuracy: 0.84666317522070922

GROW_GPU_HIST:
[0]  validation-logloss:0.683255  training-logloss:0.68194
[10] validation-logloss:0.594339  training-logloss:0.587754
[20] validation-logloss:0.53516  training-logloss:0.521548
[30] validation-logloss:0.49303  training-logloss:0.472418
[40] validation-logloss:0.461247  training-logloss:0.433685
[50] validation-logloss:0.436651  training-logloss:0.403764
[60] validation-logloss:0.417501  training-logloss:0.379227
[70] validation-logloss:0.402805  training-logloss:0.359669
[80] validation-logloss:0.391159  training-logloss:0.343627
[90] validation-logloss:0.381589  training-logloss:0.330329
[99] validation-logloss:0.374952  training-logloss:0.320207
Raw Validation Accuracy: 0.84699985036660186

Also, it appears that 'grow_gpu_hist' does not support the 'seed' parameter properly. Is that a bug, or is it to be expected?

Run 1: Raw Validation Accuracy: 0.846102049977555
Run 2: Raw Validation Accuracy: 0.84666317522070922
Run 3: Raw Validation Accuracy: 0.84692503366751459

param['seed'] = 123 for each run.
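For reference, a minimal sketch of the repeated-run check described above, reusing the DMatrices from the earlier sketch (illustrative only):

```python
# Hedged sketch: train three times with a fixed seed and compare validation
# accuracy; identical numbers would indicate the updater is deterministic.
from sklearn.metrics import accuracy_score
import xgboost as xgb

params = {
    'objective': 'binary:logistic',
    'eval_metric': 'logloss',
    'updater': 'grow_gpu_hist',
    'seed': 123,
}
for run in (1, 2, 3):
    bst = xgb.train(params, dtrain, num_boost_round=100)
    preds = (bst.predict(dvalid) > 0.5).astype(int)
    print('Run %d: Raw Validation Accuracy: %s' % (run, accuracy_score(y_valid, preds)))
```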

gaw89 commented 7 years ago

@RAMitchell, interestingly, this problem seems to have magically resolved itself overnight for me. I came into work this morning and ran it again, and these are the results:

GPU/float64 - Raw Validation Accuracy: 0.858061861839
CPU/float64 - Raw Validation Accuracy: 0.858847290272
GPU/int - Raw Validation Accuracy: 0.857463440177

Nothing in my code has changed since I ran this yesterday and posted above. Not sure what made the difference, but it appears to be resolved.

iFe1er commented 7 years ago

The GPU accuracy is still slightly worse than the CPU accuracy, but given the roughly 5x speedup, that is acceptable for me.