Mellanox / nv_peer_memory


Why it doesn't show connection via NET/IB/0/GDRDMA #56

Closed vilmara closed 4 years ago

vilmara commented 4 years ago

Environment:

  1. Framework: TensorFlow
  2. Framework version: TF 1.4
  3. Horovod version: 0.18.2 via Horovod in docker
  4. MPI version: 4.0.0
  5. CUDA version: 10.0
  6. NCCL version: 2.4.7-1
  7. Python version: 2.7
  8. OS and version: Ubuntu 18.04
  9. GCC version: 4.8
  10. Mellanox OFED 4.7.1
  11. GPUDirect RDMA - nvidia-peer-memory_1.0-8

Your question: I am running the TF benchmarks in multi-node mode with the latest version of Horovod via Docker, but I am not seeing the connection reported via NET/IB/0/GDRDMA; see the trace log below.

Tracelog:

master_node:20:289 [0] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.1<0>
master_node:20:289 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
master_node:20:289 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:20:289 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
NCCL version 2.4.7+cuda10.0
master_node:22:295 [2] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.1<0>
master_node:22:295 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
master_node:21:290 [1] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.1<0>
master_node:23:288 [3] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.1<0>
master_node:21:290 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
master_node:23:288 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
master_node:22:295 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:21:290 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:23:288 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:44:311 [3] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.2<0>
secondary_node:41:312 [0] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.2<0>
secondary_node:42:310 [1] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.2<0>
secondary_node:43:309 [2] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.2<0>
secondary_node:42:310 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
secondary_node:43:309 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
secondary_node:44:311 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
secondary_node:41:312 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
secondary_node:43:309 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:44:311 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:42:310 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:41:312 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:22:295 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
master_node:23:288 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
master_node:21:290 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
secondary_node:43:309 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
secondary_node:44:311 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
secondary_node:41:312 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
secondary_node:42:310 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
master_node:20:289 [0] NCCL INFO Setting affinity for GPU 0 to 5555,55555555,55555555
master_node:23:288 [3] NCCL INFO Setting affinity for GPU 3 to aaaa,aaaaaaaa,aaaaaaaa
master_node:21:290 [1] NCCL INFO Setting affinity for GPU 1 to 5555,55555555,55555555
master_node:22:295 [2] NCCL INFO Setting affinity for GPU 2 to aaaa,aaaaaaaa,aaaaaaaa
secondary_node:44:311 [3] NCCL INFO Setting affinity for GPU 3 to aaaa,aaaaaaaa,aaaaaaaa
secondary_node:43:309 [2] NCCL INFO Setting affinity for GPU 2 to aaaa,aaaaaaaa,aaaaaaaa
secondary_node:41:312 [0] NCCL INFO Setting affinity for GPU 0 to 5555,55555555,55555555
secondary_node:42:310 [1] NCCL INFO Setting affinity for GPU 1 to 5555,55555555,55555555
secondary_node:41:312 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance : SYS
secondary_node:44:311 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance : NODE
secondary_node:42:310 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance : SYS
secondary_node:43:309 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance : NODE
master_node:22:295 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance : NODE
master_node:23:288 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance : NODE
master_node:21:290 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance : SYS
master_node:20:289 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance : SYS
master_node:20:289 [0] NCCL INFO Channel 00 : 0 1 3 6 4 5 7 2
master_node:20:289 [0] NCCL INFO Channel 01 : 0 1 3 6 4 5 7 2
master_node:22:295 [2] NCCL INFO Ring 00 : 7 -> 2 [receive] via NET/IB/0
master_node:22:295 [2] NCCL INFO Ring 00 : 2[2] -> 0[0] via P2P/IPC
secondary_node:43:309 [2] NCCL INFO Ring 00 : 3 -> 6 [receive] via NET/IB/0
master_node:21:290 [1] NCCL INFO Ring 00 : 1[1] -> 3[3] via P2P/IPC
master_node:20:289 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/IPC
master_node:23:288 [3] NCCL INFO Ring 00 : 3 -> 6 [send] via NET/IB/0
master_node:23:288 [3] NCCL INFO Ring 00 : 3[3] -> 1[1] via P2P/IPC
secondary_node:43:309 [2] NCCL INFO Ring 00 : 6[2] -> 4[0] via P2P/IPC
master_node:21:290 [1] NCCL INFO Ring 00 : 1[1] -> 0[0] via P2P/IPC
master_node:20:289 [0] NCCL INFO Ring 00 : 0[0] -> 2[2] via P2P/IPC
master_node:21:290 [1] NCCL INFO Ring 01 : 1[1] -> 3[3] via P2P/IPC
master_node:23:288 [3] NCCL INFO Ring 01 : 3 -> 6 [send] via NET/IB/0
secondary_node:42:310 [1] NCCL INFO Ring 00 : 5[1] -> 7[3] via P2P/IPC
secondary_node:41:312 [0] NCCL INFO Ring 00 : 4[0] -> 5[1] via P2P/IPC
secondary_node:44:311 [3] NCCL INFO Ring 00 : 7 -> 2 [send] via NET/IB/0
master_node:22:295 [2] NCCL INFO Ring 00 : 6 -> 2 [receive] via NET/IB/0
master_node:20:289 [0] NCCL INFO Ring 01 : 0[0] -> 1[1] via P2P/IPC
master_node:21:290 [1] NCCL INFO Ring 01 : 1[1] -> 0[0] via P2P/IPC
secondary_node:44:311 [3] NCCL INFO Ring 00 : 7[3] -> 5[1] via P2P/IPC
secondary_node:43:309 [2] NCCL INFO Ring 00 : 6 -> 2 [send] via NET/IB/0
secondary_node:42:310 [1] NCCL INFO Ring 00 : 5[1] -> 4[0] via P2P/IPC
secondary_node:41:312 [0] NCCL INFO Ring 00 : 4[0] -> 6[2] via P2P/IPC
secondary_node:43:309 [2] NCCL INFO Ring 00 : 2 -> 6 [receive] via NET/IB/0
master_node:22:295 [2] NCCL INFO Ring 00 : 2 -> 6 [send] via NET/IB/0
master_node:22:295 [2] NCCL INFO Ring 01 : 7 -> 2 [receive] via NET/IB/0
master_node:22:295 [2] NCCL INFO Ring 01 : 2[2] -> 0[0] via P2P/IPC
secondary_node:43:309 [2] NCCL INFO Ring 01 : 3 -> 6 [receive] via NET/IB/0
master_node:23:288 [3] NCCL INFO Ring 01 : 3[3] -> 1[1] via P2P/IPC
master_node:21:290 [1] NCCL INFO Trees [0] 0->1->3/-1/-1 [1] 0->1->3/-1/-1
secondary_node:44:311 [3] NCCL INFO Ring 01 : 7 -> 2 [send] via NET/IB/0
master_node:23:288 [3] NCCL INFO Trees [0] 1->3->-1/-1/-1 [1] 1->3->-1/-1/-1
master_node:20:289 [0] NCCL INFO Ring 01 : 0[0] -> 2[2] via P2P/IPC
secondary_node:43:309 [2] NCCL INFO Ring 01 : 6[2] -> 4[0] via P2P/IPC
master_node:21:290 [1] NCCL INFO comm 0x7f4d6839f060 rank 1 nranks 8 cudaDev 1 nvmlDev 1 - Init COMPLETE
master_node:23:288 [3] NCCL INFO comm 0x7f48503a3650 rank 3 nranks 8 cudaDev 3 nvmlDev 3 - Init COMPLETE
master_node:20:289 [0] NCCL INFO Trees [0] 2->0->1/-1/-1 [1] 2->0->1/-1/-1
master_node:20:289 [0] NCCL INFO Using 256 threads, Min Comp Cap 7, Trees enabled for all sizes
secondary_node:42:310 [1] NCCL INFO Ring 01 : 5[1] -> 7[3] via P2P/IPC
secondary_node:41:312 [0] NCCL INFO Ring 01 : 4[0] -> 5[1] via P2P/IPC
master_node:20:289 [0] NCCL INFO comm 0x7f5450362840 rank 0 nranks 8 cudaDev 0 nvmlDev 0 - Init COMPLETE
master_node:22:295 [2] NCCL INFO Ring 01 : 2 -> 6 [send] via NET/IB/0
secondary_node:44:311 [3] NCCL INFO Ring 01 : 7[3] -> 5[1] via P2P/IPC
secondary_node:43:309 [2] NCCL INFO Ring 01 : 2 -> 6 [receive] via NET/IB/0
secondary_node:44:311 [3] NCCL INFO Trees [0] 5->7->-1/-1/-1 [1] 5->7->-1/-1/-1
master_node:22:295 [2] NCCL INFO Ring 01 : 6 -> 2 [receive] via NET/IB/0
secondary_node:42:310 [1] NCCL INFO Ring 01 : 5[1] -> 4[0] via P2P/IPC
secondary_node:41:312 [0] NCCL INFO Ring 01 : 4[0] -> 6[2] via P2P/IPC
secondary_node:44:311 [3] NCCL INFO comm 0x7ff2c43f7c00 rank 7 nranks 8 cudaDev 3 nvmlDev 3 - Init COMPLETE
secondary_node:42:310 [1] NCCL INFO Trees [0] 4->5->7/-1/-1 [1] 4->5->7/-1/-1
secondary_node:41:312 [0] NCCL INFO Trees [0] 6->4->5/-1/-1 [1] 6->4->5/-1/-1
secondary_node:41:312 [0] NCCL INFO comm 0x7fd8dc3c6740 rank 4 nranks 8 cudaDev 0 nvmlDev 0 - Init COMPLETE
secondary_node:43:309 [2] NCCL INFO Ring 01 : 6 -> 2 [send] via NET/IB/0
secondary_node:43:309 [2] NCCL INFO Trees [0] 2->6->4/-1/-1 [1] -1->6->4/2/-1
secondary_node:42:310 [1] NCCL INFO comm 0x7fa7cc422c90 rank 5 nranks 8 cudaDev 1 nvmlDev 1 - Init COMPLETE
secondary_node:43:309 [2] NCCL INFO comm 0x7fce9c438c90 rank 6 nranks 8 cudaDev 2 nvmlDev 2 - Init COMPLETE
master_node:22:295 [2] NCCL INFO Trees [0] -1->2->0/6/-1 [1] 6->2->0/-1/-1
master_node:22:295 [2] NCCL INFO comm 0x7fd8f038f460 rank 2 nranks 8 cudaDev 2 nvmlDev 2 - Init COMPLETE
master_node:20:289 [0] NCCL INFO Launch mode Parallel

vilmara commented 4 years ago

Hi @haggaie, could you please take a look at this issue? Thanks!

haggaie commented 4 years ago

Hi, I'm not familiar with this environment. I'll try to see if I can find someone who is.

vtlrazin commented 4 years ago

Hello Vilmara,

In order to enable RDMA in Docker you need to do the following (a short command-line sketch follows the list):

  1. Deploy https://github.com/Mellanox/k8s-rdma-sriov-dev-plugin in your setup in shared HCA mode
  2. Enable IB virtualization on the InfiniBand switch with the following commands:
    
    Mellanox MLNX-OS Switch Management

    switch login: admin
    Password:

    swx-mld-s01 [standalone: master] > enable
    swx-mld-s01 [standalone: master] # configure terminal
    swx-mld-s01 [standalone: master] (config) # ib smnode swx-mld-s01 enable
    swx-mld-s01 [standalone: master] (config) # ib smnode swx-mld-s01 sm-priority 0
    swx-mld-s01 [standalone: master] (config) # ib sm virt enable
    swx-mld-s01 [standalone: master] (config) # write memory
    swx-mld-s01 [standalone: master] (config) # reload


  3. Add the RDMA resource to your pod configuration, for example:
      containers:
      - image: image-name
        name: tensorflow-benchmarks
        securityContext:
          capabilities:
            add: [ "IPC_LOCK" ]               
        resources:
          limits:
            nvidia.com/gpu: 4
            rdma/hca: 1
        env:
        - name: NCCL_IB_DISABLE
          value: "0"
        - name: NCCL_NET_GDR_LEVEL
          value: "1"
        - name: NCCL_DEBUG_SUBSYS
          value: "NET"      


Please let me know if you have additional questions.

Best regards,
Vitaliy

vtlrazin commented 4 years ago

In addition, if you want to see the advantages of GPUDirect, you must use servers where the GPU and the NIC share the same PCIe root complex!

vilmara commented 4 years ago

@vtlrazin, thanks for your prompt reply; currently I am not using Kubernetes for deployment. In the past, I was able to configure the multi-node system over InfiniBand with GPUDirect RDMA installed on the local hosts only, and MLNX_OFED installed on the local hosts and in Docker; the connection was shown as NET/IB/0/GDRDMA. Here are the flags used with mpirun:

-x NCCL_IB_DISABLE=0 -x NCCL_IB_CUDA_SUPPORT=1 -mca btl_tcp_if_include ib0 -x NCCL_SOCKET_IFNAME=ib0 -x NCCL_DEBUG=INFO --bind-to none --map-by slot --mca plm_rsh_args
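
For illustration only, these flags would typically be part of a full mpirun launch along the following lines; the process counts, host slots, and benchmark script invocation are assumptions, not taken from the original command:

    mpirun -np 8 -H master_node:4,secondary_node:4 \
        -x NCCL_IB_DISABLE=0 -x NCCL_IB_CUDA_SUPPORT=1 \
        -x NCCL_SOCKET_IFNAME=ib0 -x NCCL_DEBUG=INFO \
        -mca btl_tcp_if_include ib0 \
        --bind-to none --map-by slot \
        python tf_cnn_benchmarks.py --model resnet50 --variable_update horovod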

Any suggestions?

vilmara commented 4 years ago

@vtlrazin I ran the test again adding the flag -x NCCL_DEBUG_SUBSYS=NET and it reported "GPU Direct RDMA Disabled for GPU X[X] / HCA 0". Any suggestions on how to get GPU Direct RDMA enabled (not for a Kubernetes env)? See the tracelog below:

master_node:11715:11984 [0] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.1<0>
master_node:11715:11984 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
master_node:11715:11984 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:11715:11984 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
NCCL version 2.4.7+cuda10.0
master_node:11717:11986 [2] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.1<0>
master_node:11717:11986 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
master_node:11718:11985 [3] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.1<0>
master_node:11718:11985 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
master_node:11716:11983 [1] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.1<0>
master_node:11716:11983 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
master_node:11717:11986 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:11716:11983 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:11718:11985 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:11791:12057 [2] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.2<0>
secondary_node:11791:12057 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
secondary_node:11792:12059 [3] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.2<0>
secondary_node:11792:12059 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
secondary_node:11790:12058 [1] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.2<0>
secondary_node:11790:12058 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
secondary_node:11789:12066 [0] NCCL INFO NET/Socket : Using [0]ib0:192.168.11.2<0>
secondary_node:11789:12066 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so).
secondary_node:11791:12057 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:11792:12059 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:11790:12058 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:11789:12066 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:11717:11986 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
master_node:11716:11983 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
master_node:11718:11985 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
secondary_node:11790:12058 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
secondary_node:11791:12057 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
secondary_node:11792:12059 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
secondary_node:11789:12066 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
master_node:11715:11984 [0] NCCL INFO Setting affinity for GPU 0 to 5555,55555555,55555555
master_node:11717:11986 [2] NCCL INFO Setting affinity for GPU 2 to aaaa,aaaaaaaa,aaaaaaaa
master_node:11716:11983 [1] NCCL INFO Setting affinity for GPU 1 to 5555,55555555,55555555
master_node:11718:11985 [3] NCCL INFO Setting affinity for GPU 3 to aaaa,aaaaaaaa,aaaaaaaa
secondary_node:11792:12059 [3] NCCL INFO Setting affinity for GPU 3 to aaaa,aaaaaaaa,aaaaaaaa
secondary_node:11791:12057 [2] NCCL INFO Setting affinity for GPU 2 to aaaa,aaaaaaaa,aaaaaaaa
secondary_node:11789:12066 [0] NCCL INFO Setting affinity for GPU 0 to 5555,55555555,55555555
secondary_node:11790:12058 [1] NCCL INFO Setting affinity for GPU 1 to 5555,55555555,55555555
secondary_node:11790:12058 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  SYS
secondary_node:11792:12059 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  NODE
secondary_node:11791:12057 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  NODE
secondary_node:11789:12066 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  SYS
master_node:11717:11986 [2] NCCL INFO CUDA Dev 2[2], IB NIC distance :  NODE
master_node:11716:11983 [1] NCCL INFO CUDA Dev 1[1], IB NIC distance :  SYS
master_node:11715:11984 [0] NCCL INFO CUDA Dev 0[0], IB NIC distance :  SYS
master_node:11718:11985 [3] NCCL INFO CUDA Dev 3[3], IB NIC distance :  NODE
master_node:11715:11984 [0] NCCL INFO Channel 00 :    0   1   3   6   4   5   7   2
master_node:11715:11984 [0] NCCL INFO Channel 01 :    0   1   3   6   4   5   7   2
master_node:11717:11986 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
secondary_node:11791:12057 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
master_node:11717:11986 [2] NCCL INFO Ring 00 : 7 -> 2 [receive] via NET/IB/0
secondary_node:11791:12057 [2] NCCL INFO Ring 00 : 3 -> 6 [receive] via NET/IB/0
master_node:11717:11986 [2] NCCL INFO Ring 00 : 2[2] -> 0[0] via P2P/IPC
secondary_node:11791:12057 [2] NCCL INFO Ring 00 : 6[2] -> 4[0] via P2P/IPC
master_node:11715:11984 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/IPC
secondary_node:11790:12058 [1] NCCL INFO Ring 00 : 5[1] -> 7[3] via P2P/IPC
secondary_node:11789:12066 [0] NCCL INFO Ring 00 : 4[0] -> 5[1] via P2P/IPC
master_node:11716:11983 [1] NCCL INFO Ring 00 : 1[1] -> 3[3] via P2P/IPC
master_node:11718:11985 [3] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 3[3] / HCA 0 (distance 3 >= 2)
secondary_node:11792:12059 [3] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 3[3] / HCA 0 (distance 3 >= 2)
master_node:11718:11985 [3] NCCL INFO Ring 00 : 3 -> 6 [send] via NET/IB/0
secondary_node:11792:12059 [3] NCCL INFO Ring 00 : 7 -> 2 [send] via NET/IB/0
master_node:11718:11985 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 303 mtu 5 LID 1
secondary_node:11792:12059 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 303 mtu 5 LID 4
master_node:11718:11985 [3] NCCL INFO Ring 00 : 3[3] -> 1[1] via P2P/IPC
secondary_node:11792:12059 [3] NCCL INFO Ring 00 : 7[3] -> 5[1] via P2P/IPC
master_node:11717:11986 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
master_node:11717:11986 [2] NCCL INFO Ring 00 : 6 -> 2 [receive] via NET/IB/0
master_node:11715:11984 [0] NCCL INFO Ring 00 : 0[0] -> 2[2] via P2P/IPC
master_node:11716:11983 [1] NCCL INFO Ring 00 : 1[1] -> 0[0] via P2P/IPC
secondary_node:11790:12058 [1] NCCL INFO Ring 00 : 5[1] -> 4[0] via P2P/IPC
secondary_node:11789:12066 [0] NCCL INFO Ring 00 : 4[0] -> 6[2] via P2P/IPC
secondary_node:11791:12057 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
secondary_node:11791:12057 [2] NCCL INFO Ring 00 : 6 -> 2 [send] via NET/IB/0
secondary_node:11791:12057 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 305 mtu 5 LID 4
secondary_node:11791:12057 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
master_node:11718:11985 [3] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 3[3] / HCA 0 (distance 3 >= 2)
secondary_node:11791:12057 [2] NCCL INFO Ring 00 : 2 -> 6 [receive] via NET/IB/0
master_node:11718:11985 [3] NCCL INFO Ring 01 : 3 -> 6 [send] via NET/IB/0
secondary_node:11792:12059 [3] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 3[3] / HCA 0 (distance 3 >= 2)
master_node:11715:11984 [0] NCCL INFO Ring 01 : 0[0] -> 1[1] via P2P/IPC
master_node:11716:11983 [1] NCCL INFO Ring 01 : 1[1] -> 3[3] via P2P/IPC
secondary_node:11792:12059 [3] NCCL INFO Ring 01 : 7 -> 2 [send] via NET/IB/0
master_node:11717:11986 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
secondary_node:11790:12058 [1] NCCL INFO Ring 01 : 5[1] -> 7[3] via P2P/IPC
secondary_node:11789:12066 [0] NCCL INFO Ring 01 : 4[0] -> 5[1] via P2P/IPC
master_node:11717:11986 [2] NCCL INFO Ring 00 : 2 -> 6 [send] via NET/IB/0
master_node:11717:11986 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 306 mtu 5 LID 1
master_node:11717:11986 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
secondary_node:11791:12057 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
master_node:11717:11986 [2] NCCL INFO Ring 01 : 7 -> 2 [receive] via NET/IB/0
master_node:11716:11983 [1] NCCL INFO Ring 01 : 1[1] -> 0[0] via P2P/IPC
secondary_node:11792:12059 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 307 mtu 5 LID 4
secondary_node:11791:12057 [2] NCCL INFO Ring 01 : 3 -> 6 [receive] via NET/IB/0
master_node:11718:11985 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 307 mtu 5 LID 1
master_node:11717:11986 [2] NCCL INFO Ring 01 : 2[2] -> 0[0] via P2P/IPC
secondary_node:11790:12058 [1] NCCL INFO Ring 01 : 5[1] -> 4[0] via P2P/IPC
secondary_node:11792:12059 [3] NCCL INFO Ring 01 : 7[3] -> 5[1] via P2P/IPC
secondary_node:11791:12057 [2] NCCL INFO Ring 01 : 6[2] -> 4[0] via P2P/IPC
secondary_node:11792:12059 [3] NCCL INFO Trees [0] 5->7->-1/-1/-1 [1] 5->7->-1/-1/-1
master_node:11718:11985 [3] NCCL INFO Ring 01 : 3[3] -> 1[1] via P2P/IPC
master_node:11718:11985 [3] NCCL INFO Trees [0] 1->3->-1/-1/-1 [1] 1->3->-1/-1/-1
secondary_node:11792:12059 [3] NCCL INFO comm 0x7f0bfc41f380 rank 7 nranks 8 cudaDev 3 nvmlDev 3 - Init COMPLETE
master_node:11716:11983 [1] NCCL INFO Trees [0] 0->1->3/-1/-1 [1] 0->1->3/-1/-1
secondary_node:11790:12058 [1] NCCL INFO Trees [0] 4->5->7/-1/-1 [1] 4->5->7/-1/-1
master_node:11718:11985 [3] NCCL INFO comm 0x7fdf243f75f0 rank 3 nranks 8 cudaDev 3 nvmlDev 3 - Init COMPLETE
secondary_node:11791:12057 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
secondary_node:11790:12058 [1] NCCL INFO comm 0x7fc910551550 rank 5 nranks 8 cudaDev 1 nvmlDev 1 - Init COMPLETE
secondary_node:11789:12066 [0] NCCL INFO Ring 01 : 4[0] -> 6[2] via P2P/IPC
secondary_node:11791:12057 [2] NCCL INFO Ring 01 : 2 -> 6 [receive] via NET/IB/0
secondary_node:11789:12066 [0] NCCL INFO Trees [0] 6->4->5/-1/-1 [1] 6->4->5/-1/-1
master_node:11716:11983 [1] NCCL INFO comm 0x7f1a2c397660 rank 1 nranks 8 cudaDev 1 nvmlDev 1 - Init COMPLETE
master_node:11715:11984 [0] NCCL INFO Ring 01 : 0[0] -> 2[2] via P2P/IPC
secondary_node:11789:12066 [0] NCCL INFO comm 0x7fbb4041aff0 rank 4 nranks 8 cudaDev 0 nvmlDev 0 - Init COMPLETE
master_node:11717:11986 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
master_node:11715:11984 [0] NCCL INFO Trees [0] 2->0->1/-1/-1 [1] 2->0->1/-1/-1
master_node:11715:11984 [0] NCCL INFO Using 256 threads, Min Comp Cap 7, Trees enabled for all sizes
master_node:11717:11986 [2] NCCL INFO Ring 01 : 2 -> 6 [send] via NET/IB/0
master_node:11715:11984 [0] NCCL INFO comm 0x7f5148397de0 rank 0 nranks 8 cudaDev 0 nvmlDev 0 - Init COMPLETE
master_node:11717:11986 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 309 mtu 5 LID 1
master_node:11717:11986 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
master_node:11717:11986 [2] NCCL INFO Ring 01 : 6 -> 2 [receive] via NET/IB/0
secondary_node:11791:12057 [2] NCCL INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 2)
secondary_node:11791:12057 [2] NCCL INFO Ring 01 : 6 -> 2 [send] via NET/IB/0
secondary_node:11791:12057 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 310 mtu 5 LID 4
secondary_node:11791:12057 [2] NCCL INFO Trees [0] 2->6->4/-1/-1 [1] -1->6->4/2/-1
master_node:11717:11986 [2] NCCL INFO Trees [0] -1->2->0/6/-1 [1] 6->2->0/-1/-1
secondary_node:11791:12057 [2] NCCL INFO comm 0x7f8fa4551ef0 rank 6 nranks 8 cudaDev 2 nvmlDev 2 - Init COMPLETE
master_node:11717:11986 [2] NCCL INFO comm 0x7fad203b8c70 rank 2 nranks 8 cudaDev 2 nvmlDev 2 - Init COMPLETE

vtlrazin commented 4 years ago

Please use NCCL_NET_GDR_LEVEL=3. You are using hardware that is not certified for GPUDirect. What kind of servers are you using?

vtlrazin commented 4 years ago

For more information about NCCL and GPUDirect, please see https://docs.nvidia.com/deeplearning/sdk/nccl-developer-guide/docs/env.html.

The relevant part of that page is reproduced below.

NCCL_NET_GDR_LEVEL (formerly NCCL_IB_GDR_LEVEL)
(since 2.3.4. In 2.4.0, NCCL_IB_GDR_LEVEL is renamed NCCL_NET_GDR_LEVEL)

The NCCL_NET_GDR_LEVEL variable allows the user to finely control when to use GPU Direct RDMA between a NIC and a GPU. The level defines the maximum distance between the NIC and the GPU.

Values accepted
0 : Never use GPU Direct RDMA. (always disabled)

1 : Use GPU Direct RDMA when GPU and NIC are on the same PCI switch.

2 : Use GPU Direct RDMA when GPU and NIC are connected through PCI switches (potentially multiple hops).

3 : Use GPU Direct RDMA when GPU and NIC are on the same PCI root complex, potentially going through the CPU.

4 : (Since 2.4.7) Use GPU Direct RDMA even across PCI root complexes, as long as GPU and NIC are within the same NUMA node. (Before 2.4.7) Use GPU Direct RDMA even across PCI root complexes, regardless of whether GPU and NIC are within the same NUMA node (always enabled).

5 : Use GPU Direct RDMA even across the SMP interconnect between NUMA nodes (e.g., QPI/UPI). (always enabled)

The default value is 2.
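
As a minimal illustration (assuming an Open MPI / Horovod launch like the one used in this thread), the variable can be exported in each rank's environment or forwarded through mpirun; host names and the benchmark script are placeholders:

    # Set it for the local shell / container:
    export NCCL_NET_GDR_LEVEL=3

    # Or forward it to all ranks with mpirun:
    mpirun -np 8 -H master_node:4,secondary_node:4 \
        -x NCCL_NET_GDR_LEVEL=3 -x NCCL_DEBUG=INFO \
        python tf_cnn_benchmarks.py
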
vilmara commented 4 years ago

Please use NCCL_NET_GDR_LEVEL=3. You are using not certified HW for GPU direct. Which kind of servers you are using?

Hi @vtlrazin, I am still getting the same message with NCCL_NET_GDR_LEVEL=3 "INFO NET/IB : GPU Direct RDMA Disabled for GPU 2[2] / HCA 0 (distance 3 >= 3)"

What does "certified HW for GPU Direct" mean? I am using C4140 servers.

vtlrazin commented 4 years ago

On your C4140 server, the GPU and the Mellanox NIC do not share the same upstream PCI Express root complex. Please run: lspci -tvvv. On a server with a shared upstream PCI Express root complex, the output looks like this:

 |           \-1f.2  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Power Control Unit
 \-[0000:00]-+-00.0  Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D DMI2
             +-01.0-[01]----00.0  Intel Corporation PCIe Data Center SSD
             +-02.0-[02-08]----00.0-[03-08]--+-04.0-[04]----00.0  NVIDIA Corporation GP100GL [Tesla P100 PCIe 16GB]
             |                               +-08.0-[05]--+-00.0  Mellanox Technologies MT28908 Family [ConnectX-6]
             |                               |            \-00.1  Mellanox Technologies MT28908 Family [ConnectX-6]
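
A complementary check (not mentioned in the thread, but standard on hosts with the NVIDIA driver installed) is nvidia-smi's topology matrix, which reports the same GPU-to-NIC relationships NCCL evaluates (PIX/PXB for a shared PCIe switch, PHB for a shared root complex, NODE/SYS for paths across the CPU or NUMA interconnect):

    # Print the GPU/NIC topology matrix on each host.
    nvidia-smi topo -m
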
vilmara commented 4 years ago

Here is what my server looks like with $ lspci -tvvv:

-+-[0000:d7]-+-00.0-[d8]----00.0  Mellanox Technologies MT27800 Family [ConnectX-5]
 |         
 |
 +-[0000:ae]-+-00.0-[af]----00.0  NVIDIA Corporation GV100 [Tesla V100 SXM2]

In the past I was able to enable GPUDirect RDMA with these servers; now I need to figure out how to make it work again with the updated SW, i.e. the right flags to enable GPUDirect RDMA.

vilmara commented 4 years ago

Hi @vtlrazin, I got GPUDirect RDMA enabled after rebuilding my system from scratch with the environment below; however, the scaling efficiency across the nodes is just ~75%. Any suggestions on how to improve it?

Flags: -x NCCL_NET_GDR_LEVEL=3 -x NCCL_DEBUG_SUBSYS=NET -x NCCL_IB_DISABLE=0 -mca btl_tcp_if_include ib0 -x NCCL_SOCKET_IFNAME=ib0 -x NCCL_DEBUG=INFO --bind-to none --map-by slot --mca plm_rsh_args "-p 12345"

Environment:

  1. Framework: TensorFlow
  2. Framework version: TF 1.4
  3. Horovod version: 0.18.2 via Horovod in docker
  4. MPI version: 4.0.0
  5. CUDA version: 10.0
  6. NCCL version: 2.5.6
  7. CUDNN: 7.6.5
  8. Python version: 2.7
  9. OS and version: Ubuntu 18.04
  10. GCC version: 4.8
  11. Mellanox OFED 4.7-3.2.9.0
  12. GPUDirect RDMA - nvidia-peer-memory_1.0-8
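
As an aside, with NCCL_DEBUG=INFO and NCCL_DEBUG_SUBSYS=NET set, GDRDMA usage can be confirmed by filtering the captured job output; the log file name below is only an example:

    grep -E "GPU Direct RDMA (Enabled|Disabled)|GDRDMA" horovod_run.log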

Tracelog:

master_node:24570:24907 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
master_node:24570:24907 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:24570:24907 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
NCCL version 2.5.6+cuda10.0
master_node:24571:24898 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
master_node:24572:24901 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
master_node:24573:24902 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
master_node:24571:24898 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:24572:24901 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:24573:24902 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:77291:77813 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
secondary_node:77292:77812 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
secondary_node:77293:77811 [3] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
secondary_node:77290:77810 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
secondary_node:77293:77811 [3] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:77291:77813 [1] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:77292:77812 [2] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
secondary_node:77290:77810 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 0.
master_node:24572:24901 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
master_node:24571:24898 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
master_node:24573:24902 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.1<0>
secondary_node:77292:77812 [2] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
secondary_node:77293:77811 [3] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
secondary_node:77291:77813 [1] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
secondary_node:77290:77810 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB ; OOB ib0:192.168.11.2<0>
master_node:24570:24907 [0] NCCL INFO NET/IB: Dev 0 Port 1 qpn 425 mtu 5 LID 1
master_node:24571:24898 [1] NCCL INFO NET/IB: Dev 0 Port 1 qpn 427 mtu 5 LID 1
master_node:24572:24901 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 426 mtu 5 LID 1
master_node:24573:24902 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 428 mtu 5 LID 1
secondary_node:77291:77813 [1] NCCL INFO NET/IB: Dev 0 Port 1 qpn 356 mtu 5 LID 4
secondary_node:77292:77812 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 355 mtu 5 LID 4
secondary_node:77293:77811 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 354 mtu 5 LID 4
secondary_node:77290:77810 [0] NCCL INFO NET/IB: Dev 0 Port 1 qpn 353 mtu 5 LID 4
master_node:24572:24901 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 3.
master_node:24573:24902 [3] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 3.
master_node:24572:24901 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 437 mtu 5 LID 1
master_node:24573:24902 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 438 mtu 5 LID 1
secondary_node:77292:77812 [2] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 3.
secondary_node:77293:77811 [3] NCCL INFO NCCL_NET_GDR_LEVEL set by environment to 3.
secondary_node:77292:77812 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 365 mtu 5 LID 4
secondary_node:77293:77811 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 366 mtu 5 LID 4
master_node:24572:24901 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 86000 / HCA 0 (distance 2 < 3), read 0
master_node:24573:24902 [3] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU af000 / HCA 0 (distance 2 < 3), read 1
master_node:24573:24902 [3] NCCL INFO Ring 00 : 3[af000] -> 6[86000] [send] via NET/IB/0/GDRDMA
master_node:24572:24901 [2] NCCL INFO Ring 00 : 7[af000] -> 2[86000] [receive] via NET/IB/0/GDRDMA
secondary_node:77292:77812 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 86000 / HCA 0 (distance 2 < 3), read 0
secondary_node:77293:77811 [3] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU af000 / HCA 0 (distance 2 < 3), read 1
secondary_node:77292:77812 [2] NCCL INFO Ring 00 : 3[af000] -> 6[86000] [receive] via NET/IB/0/GDRDMA
secondary_node:77293:77811 [3] NCCL INFO Ring 00 : 7[af000] -> 2[86000] [send] via NET/IB/0/GDRDMA
master_node:24573:24902 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 443 mtu 5 LID 1
secondary_node:77293:77811 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 371 mtu 5 LID 4
master_node:24573:24902 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 444 mtu 5 LID 1
master_node:24573:24902 [3] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU af000 / HCA 0 (distance 2 < 3), read 0
secondary_node:77292:77812 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 374 mtu 5 LID 4
master_node:24573:24902 [3] NCCL INFO Ring 00 : 6[86000] -> 3[af000] [receive] via NET/IB/0/GDRDMA
secondary_node:77292:77812 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 86000 / HCA 0 (distance 2 < 3), read 1
secondary_node:77292:77812 [2] NCCL INFO Ring 00 : 6[86000] -> 3[af000] [send] via NET/IB/0/GDRDMA
secondary_node:77292:77812 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 377 mtu 5 LID 4
secondary_node:77292:77812 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 378 mtu 5 LID 4
secondary_node:77293:77811 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 379 mtu 5 LID 4
master_node:24572:24901 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 451 mtu 5 LID 1
master_node:24573:24902 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 452 mtu 5 LID 1
master_node:24572:24901 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 86000 / HCA 0 (distance 2 < 3), read 0
master_node:24573:24902 [3] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU af000 / HCA 0 (distance 2 < 3), read 1
secondary_node:77292:77812 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 86000 / HCA 0 (distance 2 < 3), read 0
secondary_node:77293:77811 [3] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU af000 / HCA 0 (distance 2 < 3), read 1
master_node:24572:24901 [2] NCCL INFO Ring 01 : 7[af000] -> 2[86000] [receive] via NET/IB/0/GDRDMA
master_node:24573:24902 [3] NCCL INFO Ring 01 : 3[af000] -> 6[86000] [send] via NET/IB/0/GDRDMA
secondary_node:77292:77812 [2] NCCL INFO Ring 01 : 3[af000] -> 6[86000] [receive] via NET/IB/0/GDRDMA
secondary_node:77293:77811 [3] NCCL INFO Ring 01 : 7[af000] -> 2[86000] [send] via NET/IB/0/GDRDMA
master_node:24573:24902 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 457 mtu 5 LID 1
secondary_node:77293:77811 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 384 mtu 5 LID 4
master_node:24572:24901 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 460 mtu 5 LID 1
secondary_node:77293:77811 [3] NCCL INFO NET/IB: Dev 0 Port 1 qpn 387 mtu 5 LID 4
master_node:24572:24901 [2] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU 86000 / HCA 0 (distance 2 < 3), read 1
secondary_node:77293:77811 [3] NCCL INFO NET/IB : GPU Direct RDMA Enabled for GPU af000 / HCA 0 (distance 2 < 3), read 0
master_node:24572:24901 [2] NCCL INFO Ring 01 : 2[86000] -> 7[af000] [send] via NET/IB/0/GDRDMA
secondary_node:77293:77811 [3] NCCL INFO Ring 01 : 2[86000] -> 7[af000] [receive] via NET/IB/0/GDRDMA
master_node:24572:24901 [2] NCCL INFO NET/IB: Dev 0 Port 1 qpn 463 mtu 5 LID 1

vtlrazin commented 4 years ago

I am not the right person to answer your question. Please open a "Support Request".

In my opinion:

You need the server with `distance = 1`.
C4140 servers series "K" and "M" do not have a “distance = 1” between the Mellanox network adapter and the NVIDIA GPU.

B.R. Vitaliy

vilmara commented 4 years ago

Hi @vtlrazin, thanks a lot for your support. I realized the performance issues were related to the flag --xla=True

ajtarraga commented 1 year ago

Hi @vilmara, I am trying to reproduce this issue as well as the one you opened about tuning parameters for Horovod-TensorFlow benchmarks (#288). However, I am not able to see a difference between using GPUDirect RDMA and not using it; it behaves as if there were no interconnection-network penalty at all.

Could you please post the command you use in order to train the model via GPUDirect RDMA? I would like to try with your final configuration in order to compare both options.

Thank you in advance!