apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Cmake with NCCL flag does not work. #17239

Closed apeforest closed 4 years ago

apeforest commented 4 years ago

Description

If I build MXNet with NCCL using CMake, it fails with "Could not find NCCL libraries" even though NCCL is installed at /usr/local/cuda/include.

Reproduce

cmake -GNinja -DUSE_CUDA=ON -DCMAKE_CUDA_COMPILER_LAUNCHER=ccache -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache -DCMAKE_BUILD_TYPE=Release -DUSE_CUDNN=ON -DUSE_NCCL=ON ..

CMake Warning at CMakeLists.txt:299 (message):
  Could not find NCCL libraries

Environment

We recommend using our script for collecting the diagnostic information. Run the following command and paste the output below:

curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python

----------Python Info----------
Version      : 3.6.6
Compiler     : GCC 7.2.0
Build        : ('default', 'Jun 28 2018 17:14:51')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.3.1
Directory    : /home/ubuntu/anaconda3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
No MXNet installed.
----------System Info----------
Platform     : Linux-4.4.0-1096-aws-x86_64-with-debian-stretch-sid
system       : Linux
node         : ip-172-31-20-50
release      : 4.4.0-1096-aws
version      : #107-Ubuntu SMP Thu Oct 3 01:51:58 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping:              1
CPU MHz:               2699.984
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4600.11
Hypervisor vendor:     Xen
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              46080K
NUMA node0 CPU(s):     0-15,32-47
NUMA node1 CPU(s):     16-31,48-63
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq monitor est ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt ida
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0060 sec, LOAD: 0.5026 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0011 sec, LOAD: 0.5116 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.1051 sec, LOAD: 0.3917 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0108 sec, LOAD: 0.2085 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.1761 sec, LOAD: 0.1178 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1306 sec, LOAD: 0.1471 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0123 sec, LOAD: 0.4014 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0120 sec, LOAD: 0.0739 sec.
leezu commented 4 years ago

We may refactor https://github.com/apache/incubator-mxnet/blob/master/cmake/Modules/FindNCCL.cmake to improve autodetection. In the meantime, see the variables used for searching; if you set one of them to your NCCL base directory, it should find NCCL successfully.

mjsML commented 4 years ago

I experienced this too ... try using -DUSE_NCCL=1 -DUSE_NCCL_PATH=/usr/local/cuda/include (or, as @leezu suggests, your own NCCL path)
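A full configure invocation with that workaround might look like this (a sketch based on the command in the issue description; adjust the path to your own NCCL install):

```shell
# Same configure command as in the issue description, with the NCCL
# location passed explicitly (path shown is the one from this issue)
cmake -GNinja \
  -DUSE_CUDA=ON -DUSE_CUDNN=ON \
  -DUSE_NCCL=1 -DUSE_NCCL_PATH=/usr/local/cuda/include \
  -DCMAKE_BUILD_TYPE=Release \
  ..
```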

apeforest commented 4 years ago

@mjsML Thanks, using that flag worked for me. @guanxinq or @ChaiBapchya interested in fixing FindNCCL.cmake as suggested? :)

ChaiBapchya commented 4 years ago

I took a look at this auto-detection issue.

To solve this particular case, I added a check for symlink (if UNIX) - https://github.com/ChaiBapchya/incubator-mxnet/blob/nccl_autodetect/cmake/Modules/FindNCCL.cmake

If this is enough, I can submit a PR.

However, I'm not sure it is complete, because I took a look at https://github.com/apache/incubator-mxnet/blob/master/cmake/Modules/FindCUDAToolkit.cmake, which has a fairly involved way of finding the CUDA Toolkit:

  1. Language- or user-provided path
  2. If no CUDA root is given via CMake variable or environment variable:
    • check the symlink
    • check the platform default

Is this what's needed? @leezu @apeforest In that case it makes sense to factor out this check, since it will be used in two places (FindCUDAToolkit and FindNCCL).
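The symlink check could look roughly like this (a sketch only; `NCCL_SEARCH_HINTS` is a hypothetical variable name, not necessarily what the linked branch uses):

```cmake
# Resolve /usr/local/cuda if it is a symlink (on many Linux installs it
# points at a versioned directory such as /usr/local/cuda-10.1), then add
# the resolved path to the hints used by find_path/find_library below.
if(UNIX AND IS_SYMLINK "/usr/local/cuda")
  get_filename_component(_cuda_real "/usr/local/cuda" REALPATH)
  list(APPEND NCCL_SEARCH_HINTS "${_cuda_real}/include" "${_cuda_real}/lib64")
endif()
```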

leezu commented 4 years ago

@apeforest could you provide some background on whether NCCL is installed at /usr/local/cuda/include by default?

@ChaiBapchya your change seems to rely on CUDA_TOOLKIT_ROOT_DIR, but this variable is not among the variables exported by FindCUDAToolkit. In fact, you can see it's explicitly unset:

https://github.com/apache/incubator-mxnet/blob/master/cmake/Modules/FindCUDAToolkit.cmake#L708

Instead, let's use the result variables

https://github.com/apache/incubator-mxnet/blob/28e053edb4f2079743458bf087557bcac7e58c62/cmake/Modules/FindCUDAToolkit.cmake#L427-L464

Specifically CUDAToolkit_INCLUDE_DIRS and CUDAToolkit_LIBRARY_DIR? Or would the nccl library not be at the CUDAToolkit_LIBRARY_DIR?
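Using those result variables as hints might look like this (a sketch; `CUDAToolkit_INCLUDE_DIRS` and `CUDAToolkit_LIBRARY_DIR` are the documented result variables of the FindCUDAToolkit module):

```cmake
# Use FindCUDAToolkit's result variables as additional search hints
# when locating the NCCL header and library
find_package(CUDAToolkit REQUIRED)
find_path(NCCL_INCLUDE_DIR nccl.h HINTS ${CUDAToolkit_INCLUDE_DIRS})
find_library(NCCL_LIBRARY nccl HINTS ${CUDAToolkit_LIBRARY_DIR})
```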

Besides using the CUDAToolkit variables as additional defaults to find nccl, the NCCL_ROOT variable needs to be examined as per https://cmake.org/cmake/help/latest/policy/CMP0074.html (which is done correctly currently I think)
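For reference, honoring NCCL_ROOT under CMP0074 amounts to something like the following (a sketch; requires CMake >= 3.12):

```cmake
# With policy CMP0074 set to NEW, find_package(NCCL) automatically treats
# the NCCL_ROOT CMake variable and environment variable as search prefixes
# for the find_path/find_library calls inside the find module.
cmake_policy(SET CMP0074 NEW)
find_package(NCCL)
```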

apeforest commented 4 years ago

In DLAMI, NCCL is installed by default in the CUDA directory: /usr/local/cuda/include/nccl.h

However, if the user installed NCCL manually (sudo apt install libnccl2 libnccl-dev), they can use sudo dpkg-query -L libnccl-dev to find where it is. https://askubuntu.com/questions/1134732/where-is-nccl-h

I would suggest @ChaiBapchya first search /usr/local/cuda/include/ and, if nothing is found, try sudo dpkg-query -L libnccl-dev instead. Would that work?

apeforest commented 4 years ago

Thanks @ChaiBapchya for volunteering to work on this!

leezu commented 4 years ago

If not found, try sudo dpkg-query -L libnccl-dev instead.

That would only work on Debian-based platforms, and only for one particular way of installing NCCL on those systems. I think it's safe to require users to set NCCL_ROOT if they manually installed NCCL to a different path.

To improve the user experience, we could fall back to building NCCL ourselves if NCCL is required but not found. PyTorch does that, for example.

ChaiBapchya commented 4 years ago

Yes. I also looked at the CMake NCCL autodetection files used in various other open-source frameworks:

  1. Xgboost - https://github.com/dmlc/xgboost/blob/master/cmake/modules/FindNccl.cmake
  2. Flashlight - https://github.com/facebookresearch/flashlight/blob/master/cmake/FindNCCL.cmake
  3. Pytorch - https://github.com/pytorch/pytorch/blob/master/cmake/Modules/FindNCCL.cmake
  4. Caffe - https://github.com/BVLC/caffe/blob/master/cmake/Modules/FindNCCL.cmake
  5. Thunder - https://github.com/thuem/THUNDER/blob/master/cmake/FindNCCL.cmake

They all take a similar approach: look in a default path, an environment variable (NCCL_ROOT), or /usr/local/cuda.

I agree with @leezu. I haven't seen dpkg-query or equivalent find commands used in CMake; those are command-line searches. CMake provides find_path and find_library, which do a similar job.
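The usual pattern from the Find modules listed above looks roughly like this (a sketch; the exact hint variables in MXNet's FindNCCL.cmake may differ):

```cmake
# Conventional CMake discovery for NCCL: honor NCCL_ROOT (CMake variable
# or environment variable), then fall back to /usr/local/cuda.
find_path(NCCL_INCLUDE_DIR nccl.h
  HINTS ${NCCL_ROOT} $ENV{NCCL_ROOT} /usr/local/cuda
  PATH_SUFFIXES include)
find_library(NCCL_LIBRARY nccl
  HINTS ${NCCL_ROOT} $ENV{NCCL_ROOT} /usr/local/cuda
  PATH_SUFFIXES lib lib64)
include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(NCCL DEFAULT_MSG
  NCCL_INCLUDE_DIR NCCL_LIBRARY)
```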

Thanks @apeforest @leezu for chiming in!

leezu commented 4 years ago

@ChaiBapchya BTW, unfortunately a lot of CMake usage out in the wild does not meet the modern CMake bar and is leftover from CMake's early days. While it doesn't cover all of MXNet's use cases, we can refer to https://cliutils.gitlab.io/modern-cmake/ for best practices.