NVIDIA / nccl

Optimized primitives for collective multi-GPU communication

How to use p2p over PCIe between different containers on the same node? #382

Open weberxie opened 4 years ago

weberxie commented 4 years ago

Environment:
NCCL Version: 2.4.8 and 2.5.7
CUDA Version: 10.0
OS Version: CentOS 7

Problem: We are running 4 containers on the same node, with 1 GPU per container. The docker run command is: docker run -e NVIDIA_VISIBLE_DEVICES='' ,

From the NCCL logs, we observed that these GPUs are connected via sockets instead of P2P.

Question: How can we connect these GPUs with P2P? Is there a stable NCCL version that solves this problem? Thanks.
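For anyone who wants to check the same thing in their own setup: one way to see which transport each connection uses is to run with NCCL_DEBUG=INFO and look at the "via ..." suffix on the connection lines. Below is a minimal Python sketch of that check. It assumes connection lines contain a "via <transport>" token (for example "via P2P/IPC" or "via NET/Socket/0"); the exact log layout can differ between NCCL versions, and the helper names classify_transport and summarize are hypothetical, not part of NCCL.

```python
import re
from collections import Counter

# Assumption: with NCCL_DEBUG=INFO, each connection line contains
# "via <transport>", e.g. "via P2P/IPC" or "via NET/Socket/0".
VIA_RE = re.compile(r"\bvia\s+(\S+)")


def classify_transport(line):
    """Return 'P2P', 'NET', 'SHM', 'OTHER', or None if the line has no 'via'."""
    match = VIA_RE.search(line)
    if match is None:
        return None
    token = match.group(1)
    if token.startswith("P2P"):
        return "P2P"
    if token.startswith("NET"):
        return "NET"
    if token.startswith("SHM") or "shared memory" in line:
        return "SHM"
    return "OTHER"


def summarize(lines):
    """Count how many connection lines use each transport."""
    counts = Counter()
    for line in lines:
        transport = classify_transport(line)
        if transport is not None:
            counts[transport] += 1
    return dict(counts)


if __name__ == "__main__":
    # Illustrative lines only, not copied from a real run.
    sample = [
        "host:1:1 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via P2P/IPC",
        "host:1:1 [0] NCCL INFO Ring 00 : 1 -> 0 [receive] via NET/Socket/0",
    ]
    print(summarize(sample))
```

If every connection between GPUs on the same node is counted under NET, that matches the socket behaviour described above.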

weberxie commented 4 years ago

I have looked through issues https://github.com/NVIDIA/nccl/issues/324 and https://github.com/NVIDIA/nccl/issues/326, but didn't find an answer.