-
Hi
NCCL version: v2.21.5
When I set NCCL_P2P_USE_CUDA_MEMCPY=1 and train a ResNet model using PyTorch with two GPUs on the same NUMA node, NCCL hangs and PyTorch crashes with a timeout.
PyTorch error:
`[rank1]:[E Proces…
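A minimal way to confirm the environment variable is the trigger is to launch the same job with and without it. This is a hypothetical repro sketch — `train_resnet.py` stands in for the actual training script, which is not given in the report:

```shell
# Baseline (default P2P path): expected to complete normally.
torchrun --nproc_per_node=2 train_resnet.py

# With the copy-engine path enabled: reported to hang until the
# ProcessGroupNCCL watchdog timeout fires. NCCL_DEBUG=INFO is added
# so the last NCCL activity before the hang is visible in the logs.
NCCL_P2P_USE_CUDA_MEMCPY=1 \
NCCL_DEBUG=INFO \
torchrun --nproc_per_node=2 train_resnet.py
```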
-
### Description
See https://github.com/ray-project/ray/pull/47845
RFC doc: https://docs.google.com/document/d/1zu9SllrEAjPHqs-eeITtrSSbv0rBxtkyCJeweZJl100/edit?usp=sharing
### Use case
_No resp…
-
In PR #84 we are adding support for NCCL TL. If UCC was built with NCCL support TL NCCL might be selected by CLs for CUDA collectives i.e. when both source and destination buffers are of memory type C…
-
Scheduling two containers on the same node results in significantly lower nccl-tests performance than scheduling the two containers on different nodes.
An experiment to schedule two containers to diffe…
-
### Description
See https://github.com/ray-project/ray/pull/47141#discussion_r1747392605
### Use case
_No response_
-
I compiled nccl-tests with the command:
```shell
make MPI=1 MPI_HOME=${NVHPC_ROOT}/comm_libs/12.4/hpcx/hpcx-2.19/ompi NCCL_HOME=${NVHPC_ROOT}/comm_libs/nccl CUDA_HOME=${NVHPC_ROOT}/cuda
```
And run th…
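For context, a build produced by a `make MPI=1 …` invocation like the one above is typically run through `mpirun`; a sketch of a common invocation (rank count and size sweep are assumptions, not taken from the report):

```shell
# Two MPI ranks, one GPU per rank, message sizes swept from 8 B to 128 MB,
# doubling each step.
mpirun -np 2 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 1
```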
-
When running both training and testing of the model on a single GPU (./tools/fp16/dist_train.sh ./projects/configs/bevformer_fp16/bevformer_tiny_fp16.py 1), I get this error:
```
RuntimeError: N…
-
My test environment consists of two nodes, each equipped with eight GPUs; the GPU model is A800-SXM4-80GB, as shown in the following figure:
node1
![Image](https://github.com/user-atta…
-
Hello,
I have been going through the logging functionality in NCCL and wanted to know if there is a way to determine the global ranks of the devices that are involved in a collective operation. Curr…
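NCCL's logging is controlled through environment variables, and the communicator-initialization messages include each process's rank within the communicator. A sketch of turning logging up enough to correlate that information with collective activity (`./my_app` is a placeholder for the actual launch command):

```shell
# INIT lines report each rank as communicators are created; COLL lines
# trace collective operations. %h and %p in NCCL_DEBUG_FILE expand to
# hostname and PID, giving one log file per process.
NCCL_DEBUG=INFO \
NCCL_DEBUG_SUBSYS=INIT,COLL \
NCCL_DEBUG_FILE=/tmp/nccl.%h.%p.log \
./my_app
```

Note that the ranks NCCL reports are communicator-local; mapping them to the framework's global ranks is up to the caller when sub-communicators are in use.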