gim4moon opened this issue 6 days ago
We need more info. What are the GPUs? What is the interconnect? The output of `nvidia-smi` and `nvidia-smi topo -m` from one of the nodes would be nice, as would a dump of the topology detected by NCCL. Can you include the NCCL debug output (from just one of the ranks, please! 😃), especially since you collect it already? It might be worth adding `TUNING` to the list of subsystems to debug...
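
For reference, a minimal sketch of how those diagnostics could be collected on one node (the test binary path is copied from the command later in this thread; the debug-file path is only an illustration):

```
# Hardware and PCIe/NVLink topology from one node
nvidia-smi
nvidia-smi topo -m

# NCCL debug output with TUNING added to the subsystem list.
# NCCL_DEBUG_FILE splits the log per process (%h = hostname, %p = pid),
# which makes it easy to attach the output of a single rank.
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH,NET,TUNING
export NCCL_DEBUG_FILE=/tmp/nccl-debug.%h.%p   # illustrative path
/nccl/nccl-tests/build/all_reduce_perf -b 512 -e 8G -f 2 -g 8
```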
The node is a Dell XE9680.
Each node has 8x H100 GPUs.
For InfiniBand, each node has 4x ConnectX-7 VPI cards (mlx5_0:1, mlx5_1:1, mlx5_2:1, mlx5_3:1), plus 2x 200G Ethernet cards in a bonding configuration.
In the topology, GPU-to-GPU links are NV18 (NVLink) and GPU-to-NIC connections are PIX.
I'm sorry, I can't provide the original `nvidia-smi` and topo output!
I appreciate any help you can give.
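
Since the original topo output isn't available, the NIC layout can be re-checked on any node with standard tools; a minimal sketch (device names taken from the list above):

```
# Show each Mellanox HCA's state and whether the port runs IB or Ethernet
ibstat | grep -E "CA '|State:|Link layer"

# Map an IB device to its PCIe address, for comparison with `nvidia-smi topo -m`
ls -l /sys/class/infiniband/mlx5_0/device
```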
Hello,

We are currently supporting nccl-tests for a client company, and we run it with the script below:
```
mpirun -np 300 -N 1 -x NCCL_DEBUG=INFO --hostfile /nccl/hostfile \
    -mca plm_rsh_no_tree_spawn 1 -mca plm_rsh_num_concurrent 512 \
    --bind-to none -mca btl tcp,self -mca coll_hcoll_enable 0 \
    -x NCCL_SOCKET_IFNAME=bond0 \
    -x NCCL_IB_AR_THRESHOLD=0 -x NCCL_IB_PCI_RELAXED_ORDERING=1 \
    -x NCCL_IB_SPLIT_DATA_ON_QPS=0 -x NCCL_IB_QPS_PER_CONNECTION=2 \
    -x CUDA_DEVICE_ORDER=PCI_BUS_ID \
    -x PATH -x LD_LIBRARY_PATH=$LD_LIBRARY_PATH \
    -x NCCL_NET_GDR_READ=1 -x NCCL_IGNORE_CPU_AFFINITY=1 \
    -x NCCL_DEBUG_SUBSYS=INIT,ENV,GRAPH,NET \
    /nccl/nccl-tests/build/all_reduce_perf -b 512 -e 8G -f 2 -g 8
```
The max busbw is only 14 GB/s.
Is there something wrong with the command? Please help.
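
As a first check against the hardware described above (four ConnectX-7 HCAs plus bonded Ethernet), the debug output already being collected can confirm which NICs NCCL actually selected; a minimal sketch, assuming one rank's log was saved to a file:

```
# NCCL's INFO output includes a line like "NET/IB : Using [0]mlx5_0:1/IB ..."
# listing the HCAs it selected; the log filename below is only an assumption.
grep "NET/IB" /tmp/nccl-debug.log | head -n 4
```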