apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Windows GPU Build Conducted by build_windows.py Failed #20206

Closed sjiagc closed 3 years ago

sjiagc commented 3 years ago

Description

The Windows GPU build driven by build_windows.py failed. The root cause is that _CONSTEXPR_IF is defined as empty when the C++ header <random> is compiled with __CUDACC__ defined.
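To illustrate why an empty _CONSTEXPR_IF matters, here is a simplified sketch. It is not the MSVC source: the function name to_float_truncated and its body are made up for illustration, and it assumes a C++17 compiler so that `if constexpr` is available. The else branch is only well-formed when the integer type has more digits than the floating-point type. With `if constexpr` that branch is discarded for unsigned int and double; with a plain `if` (which is what an empty macro leaves behind) it is still instantiated, the shift count becomes 32 - 53 = -21, and the compiler reports a negative shift count like the one in the error message below.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper, loosely modelled on the pattern described in this issue.
// Converts an unsigned integer to Flt, clearing low bits that Flt cannot hold.
template <class Flt, class Ty>
Flt to_float_truncated(Ty val) {
    static_assert(std::is_unsigned<Ty>::value && sizeof(Ty) >= sizeof(unsigned),
                  "sketch only handles unsigned types at least as wide as unsigned int");
    constexpr int ty_digits = std::numeric_limits<Ty>::digits;
    constexpr int flt_digits = std::numeric_limits<Flt>::digits;

    if constexpr (ty_digits <= flt_digits) {
        // Every Ty value fits exactly (e.g. 32-bit unsigned into double).
        return static_cast<Flt>(val);
    } else {
        // Only valid here: excess is positive in this branch.
        // Replacing "if constexpr" with a plain "if" would instantiate this
        // branch for unsigned int / double too, where excess would be -21.
        constexpr int excess = ty_digits - flt_digits;
        constexpr Ty mask = static_cast<Ty>(~Ty{0} << excess);
        return static_cast<Flt>(val & mask);
    }
}
```

For example, to_float_truncated<double, unsigned int>(4u) takes the first branch and never looks at the shift, which is the behaviour the constexpr keyword is needed for.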

Error Message

>nvcc.exe -forward-unknown-to-host-compiler -DDMLC_CORE_USE_CMAKE -DDMLC_LOG_STACK_TRACE_SIZE=0 -DDMLC_MODERN_THREAD_LOCAL=0 -DDMLC_STRICT_CXX11 -DDMLC_USE_CXX11 -DDMLC_USE_CXX11=1 -DDMLC_USE_CXX14 -DMSHADOW_FORCE_STREAM -DMSHADOW_INT64_TENSOR_SIZE=1 -DMSHADOW_IN_CXX11 -DMSHADOW_USE_CBLAS=1 -DMSHADOW_USE_CUDA=1 -DMSHADOW_USE_CUDNN -DMSHADOW_USE_F16C=0 -DMSHADOW_USE_MKL=0 -DMSHADOW_USE_SSE=0 -DMXNET_EXPORTS -DMXNET_USE_BLAS_OPEN=1 -DMXNET_USE_CUDA=1 -DMXNET_USE_LAPACK=1 -DMXNET_USE_LIBJPEG_TURBO=0 -DMXNET_USE_OPENCV=1 -DMXNET_USE_OPENMP=1 -DMXNET_USE_SIGNAL_HANDLER=1 -DNNVM_EXPORTS -DNOMINMAX -DUSE_CUDNN -DWIN32_LEAN_AND_MEAN -D_CRT_SECURE_NO_WARNINGS -D_SCL_SECURE_NO_WARNINGS -D__USE_XOPEN2K8 -Dmxnet_61_EXPORTS -I..\..\include -I..\..\src -I..\..\3rdparty\tvm\nnvm\include -I..\..\3rdparty\tvm\include -I..\..\3rdparty\dmlc-core\include -I..\..\3rdparty\dlpack\include -I..\..\3rdparty\mshadow -I..\..\3rdparty\miniz -I3rdparty\dmlc-core\include -isystem=D:\develop\3rd-party\OpenBLAS\0.3.13\include -isystem=D:\develop\3rd-party\opencv\4.5.2\include -isystem=C:\tshen\tools\programming\nv\cuda\v11.3\include -D_WINDOWS -Xcompiler="/W3 /GR /EHsc" --fatbin-options --compress-all -Xcompiler="-MD -O2 -Ob2" -DNDEBUG --gpu-architecture=compute_61 --gpu-code=sm_61,compute_61 "-Xcompiler=-MD -Gy /bigobj" -std=c++14 -MD -MT CMakeFiles\mxnet_61.dir\src\ndarray\ndarray_function.cu.obj -MF CMakeFiles\mxnet_61.dir\src\ndarray\ndarray_function.cu.obj.d -x cu -c ..\..\src\ndarray\ndarray_function.cu -o CMakeFiles\mxnet_61.dir\src\ndarray\ndarray_function.cu.obj -Xcompiler=-FdCMakeFiles\mxnet_61.dir\,-FS
ndarray_function.cu

C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\include\random(2044): error: shift count is negative
          detected during:
            instantiation of "_Flt std::_Float_upper_bound<_Flt,_Ty>(_Ty) [with _Flt=double, _Ty=std::make_unsigned_t<int>]"
(2337): here
            instantiation of "std::poisson_distribution<_Ty>::result_type std::poisson_distribution<_Ty>::_Eval(_Engine &, const std::poisson_distribution<_Ty>::param_type &) const [with _Ty=int, _Engine=std::mt19937]"
(2305): here
            instantiation of "std::poisson_distribution<_Ty>::result_type std::poisson_distribution<_Ty>::operator()(_Engine &) const [with _Ty=int, _Engine=std::mt19937]"
d:\develop\oss\mxnet\3rdparty\mshadow\mshadow\./random.h(196): here
            instantiation of "void mshadow::Random<mshadow::cpu, DType>::SamplePoisson(mshadow::Tensor<mshadow::cpu, dim, DType> *, PType) [with DType=float, dim=2, PType=float]"
d:\develop\oss\mxnet\src\ndarray\./ndarray_function-inl.h(306): here

1 error detected in the compilation of "../../src/ndarray/ndarray_function.cu".

To Reproduce

In the \path\to\mxnet\ci directory, run python build_windows.py -f WIN_GPU.

Steps to reproduce

  1. cd \path\to\mxnet\ci
  2. set OpenBLAS_HOME=\OpenBLAS\0.3.13
  3. set OpenCV_DIR=\opencv\4.5.2
  4. set CUDA_PATH=\cuda\v11.3
  5. python build_windows.py -f WIN_GPU

What have you tried to solve it?

  1. Created a small piece of sample code to narrow down the problem.

    // sample.cu
    // It has to be a .cu file and be compiled by nvcc
    #include <iostream>
    #include <random>

    int main() {
        std::random_device rd;
        std::mt19937 gen(rd());
        std::poisson_distribution<> d(4);
        std::cout << d(gen) << std::endl;
        return 0;
    }

    Actually, it is a kind of MSVC limitation.

  2. I will create a PR to propose a fix.
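Separately from whatever the PR changes, one general way to sidestep this class of error is to keep the std::poisson_distribution instantiation out of any .cu file, so that the MSVC <random> header is never parsed with __CUDACC__ defined. The sketch below is only an assumption about how that could look; the file names and the sample_poisson function are hypothetical and are not part of MXNet or mshadow.

```cpp
#include <cassert>
#include <random>

// sample_poisson.h (hypothetical): declaration only, safe to include from .cu
// files because it instantiates nothing from <random>.
int sample_poisson(double mean, unsigned seed);

// sample_poisson.cpp (hypothetical): compiled by the host compiler directly,
// not by nvcc, so __CUDACC__ is not defined when <random> is instantiated.
int sample_poisson(double mean, unsigned seed) {
    assert(mean > 0.0);  // std::poisson_distribution requires a positive mean
    std::mt19937 gen(seed);
    std::poisson_distribution<int> dist(mean);
    return dist(gen);
}
```

A .cu file would then call sample_poisson(4.0, seed) instead of constructing the distribution itself. This costs a function call per sample, so it suits a quick check more than a hot sampling loop.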

Environment

Environment Information

    ----------Python Info----------
    Version : 3.9.4
    Compiler : MSC v.1916 64 bit (AMD64)
    Build : ('default', 'Apr 9 2021 11:43:21')
    Arch : ('64bit', 'WindowsPE')
    ------------Pip Info-----------
    Version : 21.0.1
    Directory : D:\develop\py-envs\mxnet\lib\site-packages\pip
    ----------MXNet Info-----------
    An error occured trying to import mxnet.
    This is very likely due to missing missing or incompatible library files.
    Traceback (most recent call last):
      File "D:\develop\oss\diagnose.py", line 96, in check_mxnet
        print('Version :', mxnet.__version__)
    AttributeError: module 'mxnet' has no attribute '__version__'
    ----------System Info----------
    Platform : Windows-10-10.0.19041-SP0
    system : Windows
    node : sjiagc-laptop
    release : 10
    version : 10.0.19041
    ----------Hardware Info----------
    machine : AMD64
    processor : Intel64 Family 6 Model 158 Stepping 10, GenuineIntel
    Name
    Intel(R) Core(TM) i7-8750H CPU @ 2.20GHz
    ----------Network Test----------
    Setting timeout: 10
    Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.1067 sec, LOAD: 1.8341 sec.
    Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.3620 sec, LOAD: 0.4518 sec.
    Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1123)>, DNS finished in 0.6243276596069336 sec.
    Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.2793 sec, LOAD: 1.0372 sec.
    Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.1217 sec, LOAD: 0.9116 sec.
    Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.0009951591491699219 sec.
    ----------Environment----------

github-actions[bot] commented 3 years ago

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

sjiagc commented 3 years ago

Closing because PR apache#20207 was merged.