dmlc / xgboost

Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
https://xgboost.readthedocs.io/en/stable/
Apache License 2.0

Windows xgboost GPU compilation crash #3033

Closed Laurae2 closed 6 years ago

Laurae2 commented 6 years ago

Compilation for my Windows machine fails under a specific scenario:

It affects only one desktop, which has an NVIDIA 1030. An exact copy of that desktop (with the NVIDIA 1030 swapped for an NVIDIA 1080 Ti) compiles correctly. Swapping the NVIDIA 1030 back in brings the issue back. Running xgboost with the NVIDIA 1080 Ti causes xgboost to crash.

One of my laptops with a 1050 Ti (not an identical drive, but an identical R / Visual Studio / CUDA / PATH setup) has the same issue as the desktop with the NVIDIA 1030. Plugging in the NVIDIA 1080 Ti as an eGPU (and disabling the NVIDIA 1050 Ti) makes compilation succeed.

Environment info

Operating System: Windows 8.1 Pro

Compiler: Visual Studio 2015 + CUDA 8

GPU: 1030, 1050 Ti, 1080 Ti

Package used (python/R/jvm/C++): R

xgboost version used: a187ed6 (#3014) and 84ab74f (#2935)

If you are using R package, please provide

  1. The R sessionInfo()
> sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.4.2   tools_3.4.2      xgbdl_0.0.0.9000 yaml_2.1.16 

  2. The command to install xgboost if you are not installing from source

Copy&pasted from: http://xgboost.readthedocs.io/en/latest/build.html#installing-r-package-with-gpu-support

cmake .. -G"Visual Studio 14 2015 Win64" -DUSE_CUDA=ON -DR_LIB=ON
cmake --build . --target install --config Release

With a187ed6, cicc.exe crashes here:

  updater_gpu_hist.cu
CUSTOMBUILD : nvcc error : 'cicc' died with status 0xC0000005 (ACCESS_VIOLATION) [C:\Users\Laurae\AppData\Local\Temp\RtmpcHaBAD\xgboost\build\gpuxgboost.vcxproj]
  CMake Error at gpuxgboost_generated_updater_gpu_hist.cu.obj.Release.cmake:282 (message):
    Error generating file
    C:/Users/Laurae/AppData/Local/Temp/RtmpcHaBAD/xgboost/build/CMakeFiles/gpuxgboost.dir/src/tree/Release/gpuxgboost_generated_updater_gpu_hist.cu.obj

With 84ab74f, it stops here:

"C:\Users\Laurae\AppData\Local\Temp\RtmpKkX2wK\xgboost\build\install.vcxproj" (default target) (1) ->
"C:\Users\Laurae\AppData\Local\Temp\RtmpKkX2wK\xgboost\build\ALL_BUILD.vcxproj" (default target) (3) ->
"C:\Users\Laurae\AppData\Local\Temp\RtmpKkX2wK\xgboost\build\gpuxgboost.vcxproj" (default target) (5) ->
(CustomBuild target) -> 
  C:/Users/Laurae/AppData/Local/Temp/RtmpKkX2wK/xgboost/src/objective/regression_obj_gpu.cu(46): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpKkX2wK\xgboost\build\gpuxgboost.vcxproj]
  C:/Users/Laurae/AppData/Local/Temp/RtmpKkX2wK/xgboost/src/objective/regression_obj_gpu.cu(79): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpKkX2wK\xgboost\build\gpuxgboost.vcxproj]
  C:/Users/Laurae/AppData/Local/Temp/RtmpKkX2wK/xgboost/src/objective/regression_obj_gpu.cu(178): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpKkX2wK\xgboost\build\gpuxgboost.vcxproj]
  C:/Users/Laurae/AppData/Local/Temp/RtmpKkX2wK/xgboost/src/objective/regression_obj_gpu.cu(178): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpKkX2wK\xgboost\build\gpuxgboost.vcxproj]
  C:/Users/Laurae/AppData/Local/Temp/RtmpKkX2wK/xgboost/src/objective/regression_obj_gpu.cu(178): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpKkX2wK\xgboost\build\gpuxgboost.vcxproj]
  C:/Users/Laurae/AppData/Local/Temp/RtmpKkX2wK/xgboost/src/objective/regression_obj_gpu.cu(178): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpKkX2wK\xgboost\build\gpuxgboost.vcxproj]

Are there more specific prerequisites for GPU compilation other than this?

RAMitchell commented 6 years ago

There is a known issue with CUDA 8.0.61. See #2762.

Rolling back to version 8.0.44 or upgrading to CUDA 9 may solve the problem.

Laurae2 commented 6 years ago

I can't install CUDA 9 for now due to compatibility issues with other R packages. I will try CUDA 9 later.

Using 8.0.44 I am getting this error:

CustomBuild:
  Building NVCC (Device) object CMakeFiles/gpuxgboost.dir/src/common/Release/gpuxgboost_generated_host_device_vector.cu.obj
  host_device_vector.cu
  host_device_vector.cu
  Building NVCC (Device) object CMakeFiles/gpuxgboost.dir/src/objective/Release/gpuxgboost_generated_regression_obj_gpu.cu.obj
  regression_obj_gpu.cu
c:\users\laurae\appdata\local\temp\rtmpc8yroj\xgboost\dmlc-core\include\dmlc\./optional.h(26): warning : "constexpr" is ignored here in Microsoft mode [C:\Users\Laurae\AppData\Local\Temp\RtmpC8YROj\xgboost\build\gpuxgboost.vcxproj]

C:/Users/Laurae/AppData/Local/Temp/RtmpC8YROj/xgboost/src/objective/regression_obj_gpu.cu(46): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpC8YROj\xgboost\build\gpuxgboost.vcxproj]

C:/Users/Laurae/AppData/Local/Temp/RtmpC8YROj/xgboost/src/objective/regression_obj_gpu.cu(79): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpC8YROj\xgboost\build\gpuxgboost.vcxproj]

C:/Users/Laurae/AppData/Local/Temp/RtmpC8YROj/xgboost/src/objective/regression_obj_gpu.cu(178): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpC8YROj\xgboost\build\gpuxgboost.vcxproj]
            detected during:
              instantiation of "void xgboost::obj::GPURegLossObj<Loss>::GetGradientDevice(float *, const xgboost::MetaInfo &, int, xgboost::bst_gpair *, size_t) [with Loss=xgboost::obj::LinearSquareLoss]" 
  (130): here
              instantiation of "void xgboost::obj::GPURegLossObj<Loss>::GetGradient(const std::vector<xgboost::bst_float, std::allocator<xgboost::bst_float>> &, const xgboost::MetaInfo &, int, std::vector<xgboost::bst_gpair, std::allocator<xgboost::bst_gpair>> *) [with Loss=xgboost::obj::LinearSquareLoss]" 
  (111): here
              implicit generation of "xgboost::obj::GPURegLossObj<Loss>::~GPURegLossObj() [with Loss=xgboost::obj::LinearSquareLoss]" 
  (111): here
              instantiation of class "xgboost::obj::GPURegLossObj<Loss> [with Loss=xgboost::obj::LinearSquareLoss]" 
  (111): here
              instantiation of "xgboost::obj::GPURegLossObj<Loss>::GPURegLossObj() [with Loss=xgboost::obj::LinearSquareLoss]" 
  (225): here

C:/Users/Laurae/AppData/Local/Temp/RtmpC8YROj/xgboost/src/objective/regression_obj_gpu.cu(178): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpC8YROj\xgboost\build\gpuxgboost.vcxproj]
            detected during:
              instantiation of "void xgboost::obj::GPURegLossObj<Loss>::GetGradientDevice(float *, const xgboost::MetaInfo &, int, xgboost::bst_gpair *, size_t) [with Loss=xgboost::obj::LogisticRegression]" 
  (130): here
              instantiation of "void xgboost::obj::GPURegLossObj<Loss>::GetGradient(const std::vector<xgboost::bst_float, std::allocator<xgboost::bst_float>> &, const xgboost::MetaInfo &, int, std::vector<xgboost::bst_gpair, std::allocator<xgboost::bst_gpair>> *) [with Loss=xgboost::obj::LogisticRegression]" 
  (111): here
              implicit generation of "xgboost::obj::GPURegLossObj<Loss>::~GPURegLossObj() [with Loss=xgboost::obj::LogisticRegression]" 
  (111): here
              instantiation of class "xgboost::obj::GPURegLossObj<Loss> [with Loss=xgboost::obj::LogisticRegression]" 
  (111): here
              instantiation of "xgboost::obj::GPURegLossObj<Loss>::GPURegLossObj() [with Loss=xgboost::obj::LogisticRegression]" 
  (229): here

C:/Users/Laurae/AppData/Local/Temp/RtmpC8YROj/xgboost/src/objective/regression_obj_gpu.cu(178): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpC8YROj\xgboost\build\gpuxgboost.vcxproj]
            detected during:
              instantiation of "void xgboost::obj::GPURegLossObj<Loss>::GetGradientDevice(float *, const xgboost::MetaInfo &, int, xgboost::bst_gpair *, size_t) [with Loss=xgboost::obj::LogisticClassification]" 
  (130): here
              instantiation of "void xgboost::obj::GPURegLossObj<Loss>::GetGradient(const std::vector<xgboost::bst_float, std::allocator<xgboost::bst_float>> &, const xgboost::MetaInfo &, int, std::vector<xgboost::bst_gpair, std::allocator<xgboost::bst_gpair>> *) [with Loss=xgboost::obj::LogisticClassification]" 
  (111): here
              implicit generation of "xgboost::obj::GPURegLossObj<Loss>::~GPURegLossObj() [with Loss=xgboost::obj::LogisticClassification]" 
  (111): here
              instantiation of class "xgboost::obj::GPURegLossObj<Loss> [with Loss=xgboost::obj::LogisticClassification]" 
  (111): here
              instantiation of "xgboost::obj::GPURegLossObj<Loss>::GPURegLossObj() [with Loss=xgboost::obj::LogisticClassification]" 
  (233): here

C:/Users/Laurae/AppData/Local/Temp/RtmpC8YROj/xgboost/src/objective/regression_obj_gpu.cu(178): error : identifier "uint" is undefined [C:\Users\Laurae\AppData\Local\Temp\RtmpC8YROj\xgboost\build\gpuxgboost.vcxproj]
            detected during:
              instantiation of "void xgboost::obj::GPURegLossObj<Loss>::GetGradientDevice(float *, const xgboost::MetaInfo &, int, xgboost::bst_gpair *, size_t) [with Loss=xgboost::obj::LogisticRaw]" 
  (130): here
              instantiation of "void xgboost::obj::GPURegLossObj<Loss>::GetGradient(const std::vector<xgboost::bst_float, std::allocator<xgboost::bst_float>> &, const xgboost::MetaInfo &, int, std::vector<xgboost::bst_gpair, std::allocator<xgboost::bst_gpair>> *) [with Loss=xgboost::obj::LogisticRaw]" 
  (111): here
              implicit generation of "xgboost::obj::GPURegLossObj<Loss>::~GPURegLossObj() [with Loss=xgboost::obj::LogisticRaw]" 
  (111): here
              instantiation of class "xgboost::obj::GPURegLossObj<Loss> [with Loss=xgboost::obj::LogisticRaw]" 
  (111): here
              instantiation of "xgboost::obj::GPURegLossObj<Loss>::GPURegLossObj() [with Loss=xgboost::obj::LogisticRaw]" 
  (238): here

  6 errors detected in the compilation of "C:/Users/Laurae/AppData/Local/Temp/tmpxft_000003a8_00000000-19_regression_obj_gpu.compute_61.cpp1.ii".
  regression_obj_gpu.cu
  CMake Error at gpuxgboost_generated_regression_obj_gpu.cu.obj.Release.cmake:282 (message):
    Error generating file
    C:/Users/Laurae/AppData/Local/Temp/RtmpC8YROj/xgboost/build/CMakeFiles/gpuxgboost.dir/src/objective/Release/gpuxgboost_generated_regression_obj_gpu.cu.obj

kristang commented 6 years ago

The issue still persists when using CUDA 9 (for me at least). Branching before #2935 lets me compile with CUDA 9.1 targeting Visual Studio 14 2015 Win64.

Laurae2 commented 6 years ago

@kristang Great find!

Using xgbdl::xgb.dl(compiler = "Visual Studio 14 2015 Win64", commit = "a187ed6", use_gpu = TRUE, use_avx = TRUE) it compiles successfully in less than 5 minutes! At least I found a reliable way to get GPU working with a recent xgboost version (CUDA 8 + VS 2015).

RAMitchell commented 6 years ago

I have pushed a fix in #3051 for the Visual Studio build failure caused by the use of 'uint'.
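
For anyone hitting the same `identifier "uint" is undefined` error elsewhere: `uint` is not a standard C++ type. On Linux it is usually available as a typedef from glibc's system headers, which is why such code can build there and still fail under Visual Studio. The sketch below is not the change made in #3051; it only illustrates the portable spelling, and `DivRoundUp` is a hypothetical helper that does not come from the xgboost sources.

```cpp
#include <cstddef>
#include <cstdint>

// Non-portable: `uint` is a glibc typedef, so this line fails under MSVC with
//   error : identifier "uint" is undefined
// uint block_count = 0;

// Portable alternatives: spell out `unsigned int`, or use a fixed-width type
// from <cstdint> such as std::uint32_t.

// Hypothetical helper (assumption, not xgboost code): the number of blocks of
// `block_size` items needed to cover `n` items, rounding up.
constexpr std::uint32_t DivRoundUp(std::size_t n, std::uint32_t block_size) {
  return static_cast<std::uint32_t>((n + block_size - 1) / block_size);
}

// Compile-time check that the helper behaves as described.
static_assert(DivRoundUp(10, 4) == 3, "10 items in blocks of 4 need 3 blocks");
```

Using `unsigned int` or `std::uint32_t` keeps the same source building with GCC, Clang, MSVC and nvcc without relying on platform-specific typedefs.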