Open weberxie opened 4 years ago
Environment: NCCL Version: 2.4.8 and 2.5.7; CUDA Version: 10.0; OS Version: CentOS 7
Problem: We are running 4 containers on the same node, with 1 GPU per container. The docker run command is:
docker run -e NVIDIA_VISIBLE_DEVICES=''
From the NCCL logs, we observed that these GPUs are connected via sockets instead of P2P.
Question: How can we connect these GPUs with P2P? Which stable NCCL version solves this problem? Thanks.
I have looked through issues https://github.com/NVIDIA/nccl/issues/324 and https://github.com/NVIDIA/nccl/issues/326 but didn't find an answer.