apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Compiling from Source with GPU Support Fails on Windows 10 #20256

Open lunar-walker opened 3 years ago

lunar-walker commented 3 years ago

Description

When compiling version 1.8 with GPU and MKL-DNN support on Windows 10 x64 (Visual Studio Community 2019), the build fails with many errors of the form "calling a __host__ function from a __global__ function is not allowed".

An initial investigation suggests that the CUDA compiler is confusing standard C/C++ library functions with their CUDA counterparts: the errors appear where floor and ceil are called inside CUDA code, and both functions exist in the standard C/C++ libraries as well as in the CUDA libraries.
I'm using CUDA 11.2 (the initial output of build_windows.py is pasted below for the environment details).

Error Message

../src/operator/contrib/adaptive_avg_pooling.cu(73): error: calling a __host__ function("floorf") from a __global__ function("mxnet::op::adaptiveaveragepool ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(73): error: identifier "__floorf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(74): error: calling a __host__ function("ceilf") from a __global__ function("mxnet::op::adaptiveaveragepool ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(74): error: identifier "__ceilf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(78): error: calling a __host__ function("floorf") from a __global__ function("mxnet::op::adaptiveaveragepool ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(78): error: identifier "__floorf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(79): error: calling a __host__ function("ceilf") from a __global__ function("mxnet::op::adaptiveaveragepool ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(79): error: identifier "__ceilf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(73): error: calling a __host__ function("floorf") from a __global__ function("mxnet::op::adaptiveaveragepool ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(73): error: identifier "__floorf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(74): error: calling a __host__ function("ceilf") from a __global__ function("mxnet::op::adaptiveaveragepool ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(74): error: identifier "__ceilf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(78): error: calling a __host__ function("floorf") from a __global__ function("mxnet::op::adaptiveaveragepool ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(78): error: identifier "__floorf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(79): error: calling a __host__ function("ceilf") from a __global__ function("mxnet::op::adaptiveaveragepool ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(79): error: identifier "__ceilf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(73): error: calling a __host__ function("floorf") from a __global__ function("mxnet::op::adaptiveaveragepool< ::mshadow::half::half_t> ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(73): error: identifier "__floorf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(74): error: calling a __host__ function("ceilf") from a __global__ function("mxnet::op::adaptiveaveragepool< ::mshadow::half::half_t> ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(74): error: identifier "__ceilf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(78): error: calling a __host__ function("floorf") from a __global__ function("mxnet::op::adaptiveaveragepool< ::mshadow::half::half_t> ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(78): error: identifier "__floorf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(79): error: calling a __host__ function("ceilf") from a __global__ function("mxnet::op::adaptiveaveragepool< ::mshadow::half::half_t> ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(79): error: identifier "__ceilf" is undefined in device code

../src/operator/contrib/adaptive_avg_pooling.cu(130): error: calling a __host__ function("floorf") from a __global__ function("mxnet::op::atomicadaptiveaveragegradinput ") is not allowed

../src/operator/contrib/adaptive_avg_pooling.cu(130): error: identifier "__floorf" is undefined in device code

...... ...... ......
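
Because the log above repeats the same two diagnostics for every kernel instantiation, it can help to collapse it before digging further. The following is a minimal Python sketch, not part of MXNet, that counts the host-function-call errors per source line and callee. The function name `summarize_host_calls` is hypothetical, and the regular expression only assumes the message format pasted above (with or without the underscores around "host").

```python
import re
from collections import Counter

# Matches nvcc diagnostics of the form pasted in this issue. The underscores
# around "host" are optional because page rendering may have stripped them.
HOST_CALL = re.compile(
    r'^(?P<file>.+?)\((?P<line>\d+)\): error: '
    r'calling a (?:__)?host(?:__)? function\("(?P<callee>\w+)"\)'
)


def summarize_host_calls(log_text):
    """Count host-function-call errors per (file, line, callee)."""
    counts = Counter()
    for raw in log_text.splitlines():
        match = HOST_CALL.match(raw.strip())
        if match:
            key = (match["file"], int(match["line"]), match["callee"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = (
        '../src/operator/contrib/adaptive_avg_pooling.cu(73): error: '
        'calling a __host__ function("floorf") from a __global__ '
        'function("mxnet::op::adaptiveaveragepool ") is not allowed\n'
        '../src/operator/contrib/adaptive_avg_pooling.cu(74): error: '
        'calling a __host__ function("ceilf") from a __global__ '
        'function("mxnet::op::adaptiveaveragepool ") is not allowed\n'
    )
    for (path, line, callee), n in sorted(summarize_host_calls(sample).items()):
        print(f"{path}:{line} {callee} x{n}")
```

Run against the full build output, a summary like this makes it easier to confirm that only `floorf` and `ceilf` are involved and which source lines call them.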

What have you tried to solve it?

  1. Tried compiling with the CMake-generated Visual Studio solution
  2. Tried compiling with the ci/build_windows.py script

Environment

Environment Information
----------Python Info----------
Version : 3.9.5
Compiler : MSC v.1928 64 bit (AMD64)
Build : ('tags/v3.9.5:0a7dcbd', 'May 3 2021 17:27:52')
Arch : ('64bit', 'WindowsPE')
------------Pip Info-----------
Version : 21.1.1
Directory : C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\site-packages\pip
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
Platform : Windows-10-10.0.19042-SP0
system : Windows
node : DESKTOP-6LE30FJ
release : 10
version : 10.0.19042
----------Hardware Info----------
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
Name Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz

Initial output of ci/build_windows.py

**********************************************************************
** Visual Studio 2019 Developer Command Prompt v16.9.4
** Copyright (c) 2021 Microsoft Corporation
**********************************************************************
[vcvarsall.bat] Environment initialized for: 'x64'
-- The C compiler identification is MSVC 19.28.29914.0
-- The CXX compiler identification is MSVC 19.28.29914.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.28.29910/bin/Hostx64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.28.29910/bin/Hostx64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CMAKE_CROSSCOMPILING FALSE
-- CMAKE_HOST_SYSTEM_PROCESSOR AMD64
-- CMAKE_SYSTEM_PROCESSOR AMD64
-- CMAKE_SYSTEM_NAME Windows
-- CMake version '3.20.1' using generator 'Ninja'
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/bin/nvcc.exe
-- The CUDA compiler identification is NVIDIA 11.2.67
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/bin/nvcc.exe - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- The ASM_MASM compiler identification is MSVC
-- Found assembler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Community/VC/Tools/MSVC/14.28.29910/bin/Hostx64/x64/ml64.exe
CMake Deprecation Warning at 3rdparty/mkldnn/CMakeLists.txt:17 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of CMake.
Update the VERSION argument value or use a ... suffix to tell CMake that the project does not need compatibility with older versions.
-- Intel MKL-DNN compat: set DNNL_BUILD_EXAMPLES to MKLDNN_BUILD_EXAMPLES with value `OFF`
-- Intel MKL-DNN compat: set DNNL_BUILD_TESTS to MKLDNN_BUILD_TESTS with value `OFF`
-- Intel MKL-DNN compat: set DNNL_ENABLE_JIT_PROFILING to MKLDNN_ENABLE_JIT_PROFILING with value `OFF`
-- Intel MKL-DNN compat: set DNNL_LIBRARY_TYPE to MKLDNN_LIBRARY_TYPE with value `STATIC`
-- Intel MKL-DNN compat: set DNNL_ARCH_OPT_FLAGS to MKLDNN_ARCH_OPT_FLAGS with value ``
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
-- Found OpenMP_C: -openmp (found version "2.0")
-- Found OpenMP_CXX: -openmp (found version "2.0")
-- Found OpenMP: TRUE (found version "2.0")
-- GPU support is disabled
-- Found Git: C:/Program Files/Git/cmd/git.exe (found version "2.30.0.windows.2")
fatal: not a git repository (or any of the parent directories): .git
-- Primitive cache is enabled
-- Found MKL: D:/oneAPI/mkl/latest/include
-- Found MKL (include: D:/oneAPI/mkl/latest/include, lib: D:/oneAPI/mkl/latest/lib/intel64/mkl_rt.lib
-- Found OpenCV: D:/code/opencvlib/build_opencv (found version "4.5.2") found components: core highgui imgproc imgcodecs
-- OpenCV 4.5.2 found (D:/code/opencvlib/build_opencv)
-- OpenCV_LIBS=opencv_core;opencv_highgui;opencv_imgproc;opencv_imgcodecs
USE_LAPACK is ON
CMake Warning at 3rdparty/googletest/googletest/CMakeLists.txt:47 (project):
VERSION keyword not followed by a value or was followed by a value that expanded to nothing.
CMake Deprecation Warning at 3rdparty/googletest/googletest/CMakeLists.txt:49 (cmake_minimum_required):
Compatibility with CMake < 2.8.12 will be removed from a future version of CMake.
Update the VERSION argument value or use a ... suffix to tell CMake that the project does not need compatibility with older versions.
-- Found PythonInterp: D:/miniconda3/python.exe (found version "3.7.9")
-- Found GTest: gtest
-- Found CUDNN: C:/cudnn/cuda/lib/x64/cudnn.lib
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - not found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - not found
-- Looking for nanosleep
-- Looking for nanosleep - not found
-- Looking for backtrace
-- Looking for backtrace - not found
-- Could NOT find Backtrace (missing: Backtrace_LIBRARY Backtrace_INCLUDE_DIR)
-- D:/code/apache-mxnet-src-1.8.0-incubating/3rdparty/dmlc-core/cmake/build_config.h.in -> include/dmlc/build_config.h
-- CUDA: Using the following NVCC architecture flags -gencode;arch=compute_75,code=sm_75
-- Found CUDAToolkit: C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v11.2/include (found version "11.2.67")
-- Performing Test SUPPORT_MSSE3
-- Performing Test SUPPORT_MSSE3 - Failed
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Failed
-- Determining F16C support
F16C instruction set is not yet supported for MSVC
-- CUDA: Adding NVCC options: --fatbin-options --compress-all
-- Google Test not found
-- Configuring done
CMake Warning (dev) in CMakeLists.txt:
Policy CMP0104 is not set: CMAKE_CUDA_ARCHITECTURES now detected for NVCC, empty CUDA_ARCHITECTURES not allowed. Run "cmake --help-policy CMP0104" for policy details. Use the cmake_policy command to set the policy and suppress this warning.
CUDA_ARCHITECTURES is empty for target "mxnet_75".
This warning is for project developers. Use -Wno-dev to suppress it.
CMake Warning (dev) in CMakeLists.txt:
Policy CMP0104 is not set: CMAKE_CUDA_ARCHITECTURES now detected for NVCC, empty CUDA_ARCHITECTURES not allowed. Run "cmake --help-policy CMP0104" for policy details. Use the cmake_policy command to set the policy and suppress this warning.
CUDA_ARCHITECTURES is empty for target "customop_gpu_lib".
This warning is for project developers. Use -Wno-dev to suppress it.
-- Generating done
-- Build files have been written to: D:/code/apache-mxnet-src-1.8.0-incubating/build

github-actions[bot] commented 3 years ago

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

leezu commented 3 years ago

Which version of mxnet are you compiling?

lunar-walker commented 3 years ago

Version 1.8.0

leezu commented 3 years ago

You can try applying https://github.com/apache/incubator-mxnet/commit/3c1b3249507d680dffa740d510491331df96f8e0 to your version, or try compiling the latest v1.x / master branch.

lunar-walker commented 3 years ago

I have made some changes to the code, following these two links:

https://trac.ffmpeg.org/ticket/9150
https://devtalk.blender.org/t/cuda-compile-error-windows-10/17886/4

I am now building with Visual Studio 2017 (currently in progress; I will let you know the results). If that fails, I'll try compiling the latest v1.x afterwards.

lunar-walker commented 3 years ago

I was able to compile with VS 2017 without any code modification and got the DLLs. Building the Python bindings then failed with an access violation error: OSError: exception: access violation writing 0x0000000000000000

Placing all dependencies in the folder containing libmxnet.dll resolved the issue, and I was able to install the Python egg. However, when I import mxnet, there is a substantial delay and None is displayed:

>>> import mxnet as mx
None
>>>

I was nevertheless able to run both the CPU and GPU examples from the validation page (https://mxnet.apache.org/versions/1.8.0/get_started/validate_mxnet). Is this normal? I'll try running inference on some models and will let you know the results.
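
Since the access violation went away once the dependency DLLs sat next to libmxnet.dll, a quick check of that folder before importing can save a debugging round. Below is a minimal Python sketch; the helper name `missing_dlls`, the folder path and the DLL names in the example are all hypothetical placeholders, and the real names depend on the CUDA, cuDNN, MKL and OpenCV versions used for the build.

```python
from pathlib import Path


def missing_dlls(lib_dir, required):
    """Return the names in `required` that are not present in `lib_dir`.

    The comparison is case-insensitive, matching Windows file name rules.
    """
    present = {p.name.lower() for p in Path(lib_dir).iterdir() if p.is_file()}
    return [name for name in required if name.lower() not in present]


if __name__ == "__main__":
    # Hypothetical folder and DLL names; adjust them to your own build.
    release_dir = Path("build") / "Release"
    wanted = ["libmxnet.dll", "cudnn64_8.dll", "mkl_rt.dll"]
    if release_dir.is_dir():
        print("missing:", missing_dlls(release_dir, wanted))
    else:
        print(f"{release_dir} does not exist")
```

On Python 3.8 and later, Windows no longer consults PATH when resolving the dependencies of DLLs loaded through ctypes, so directories registered with `os.add_dll_directory` may be an alternative to copying files. That route was not tried in this thread.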

lunar-walker commented 3 years ago

The build seems to work fine with the InsightFace pretrained models, and performance is comparable to running on Ubuntu. Is there a test suite available to thoroughly check a GPU build? And should I close this issue now?

For everyone else's information, here is a recap. I managed to get a Windows 10 x64 GPU build of version 1.8 using VS 2017, CUDA 11.2 (architecture 7.5), cuDNN, MKL, LAPACK and OpenCV with the following steps:

  1. Install all required frameworks and libraries and VS 2017. Ensure the CUDA and MSVC integration is in place (hint: install CUDA after VS).
  2. Run the following command from the x64 Native Tools Command Prompt for VS 2017 (adjust the paths to your configuration): 'cmake -G "Visual Studio 15 2017 Win64" -T cuda=11.2,host=x64 -DCMAKE_BUILD_TYPE=Release -DUSE_CUDA=1 -DUSE_CUDNN=1 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_LAPACK=1 -DUSE_BLAS=mkl -DUSE_DIST_KVSTORE=0 -DCUDNN_INCLUDE=C:\cudnn\cuda\include -DCUDNN_LIBRARY=C:\cudnn\cuda\lib\x64\cudnn.lib -DMKL_ROOT="D:\oneAPI\mkl\latest" "d:\code\apache-mxnet-src-1.8.0-incubating"'
  3. Build mxnet.sln from the VS 2017 GUI. After a successful build, you'll find the libraries in the Build\Release folder.
  4. Copy all dependency DLLs required by your build parameters (i.e. CUDA, cuDNN, MKL, OpenCV, LAPACK etc.) into the Release folder.
  5. Download https://sourceforge.net/projects/openblas/files/v0.2.12/mingw64_dll.zip/download and extract it into the Release folder.
  6. Install the Python bindings with Python\setup.py, invoking the script from inside the Python environment where you plan to install them.
  7. Hope and pray that it works.
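
The long command in step 2 is easier to adapt when the options are kept as data. The following Python sketch assembles the same argument list from a dictionary; the helper name `cmake_args` is hypothetical, the option names and values are copied from the recap, and the paths are the ones from this particular machine, so they will need changing.

```python
def cmake_args(source_dir, options,
               generator="Visual Studio 15 2017 Win64",
               toolset="cuda=11.2,host=x64"):
    """Build the cmake argument list used in step 2 of the recap."""
    args = ["cmake", "-G", generator, "-T", toolset]
    # Each option becomes one -DNAME=VALUE argument, in insertion order.
    args += [f"-D{name}={value}" for name, value in options.items()]
    args.append(source_dir)
    return args


if __name__ == "__main__":
    # Values taken from the recap; paths are specific to the poster's setup.
    options = {
        "CMAKE_BUILD_TYPE": "Release",
        "USE_CUDA": 1,
        "USE_CUDNN": 1,
        "USE_OPENCV": 1,
        "USE_OPENMP": 1,
        "USE_LAPACK": 1,
        "USE_BLAS": "mkl",
        "USE_DIST_KVSTORE": 0,
        "CUDNN_INCLUDE": r"C:\cudnn\cuda\include",
        "CUDNN_LIBRARY": r"C:\cudnn\cuda\lib\x64\cudnn.lib",
        "MKL_ROOT": r"D:\oneAPI\mkl\latest",
    }
    source = r"d:\code\apache-mxnet-src-1.8.0-incubating"
    for arg in cmake_args(source, options):
        print(arg)
```

Passing the list to `subprocess.run(args, check=True)` from the VS 2017 command prompt avoids shell quoting problems with paths that contain spaces.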

Best Regards,