-
https://github.com/merthidayetoglu/CommBench/blob/master/commbench.h
Lines 40 to 46: what is the point of commenting out `#define CAP_NCCL` and the others?
I have to manually uncomment them…
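One way to avoid editing the header by hand is to define the macro on the compile line instead — a sketch only, assuming a conventional preprocessor-flag build; whether CommBench's Makefile picks up `CPPFLAGS` this way is an assumption:

```shell
# Hedged sketch: define the capability macro via preprocessor flags
# rather than uncommenting it inside commbench.h. The variable name
# CPPFLAGS is an assumption about the build system.
CPPFLAGS="-DCAP_NCCL"
echo "would compile with: $CPPFLAGS"
```

With most Make-based builds, passing `-DCAP_NCCL` has the same effect as uncommenting the `#define`, without modifying a tracked file.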
-
Hello,
I'm new to NCCL. I use NCCL in Kubernetes with the rdma-device-plugin.
I add multiple HCAs to the Pod, e.g. `NCCL_IB_HCA=mlx5_1:1,mlx5_2:1`, and we also want to use `NCCL_IB_GID_INDEX=3` for mlx5…
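For reference, the two settings mentioned above can be combined in the container's environment like this — a sketch using the values quoted in this report; whether GID index 3 is correct depends on your RoCE fabric configuration:

```shell
# Hedged sketch: point NCCL at two specific HCA ports and pin the
# RoCE GID index, using the values from the report above.
export NCCL_IB_HCA="mlx5_1:1,mlx5_2:1"
export NCCL_IB_GID_INDEX=3
echo "NCCL_IB_HCA=$NCCL_IB_HCA NCCL_IB_GID_INDEX=$NCCL_IB_GID_INDEX"
```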
-
First of all - thanks for all the great work!
My setup is on an H100 node. I am trying to use GPUs 4, 5, 6, and 7, but I get the following error. I am able to run successfully with all eight GPUs (0 through 7). However…
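One common way to restrict a job to that GPU subset is `CUDA_VISIBLE_DEVICES` — a sketch only; it assumes the error stems from device selection rather than from NCCL topology itself. Note that inside the process the four GPUs are then renumbered 0 through 3:

```shell
# Hedged sketch: expose only GPUs 4-7 to the process; CUDA and NCCL
# will then enumerate them as devices 0-3.
export CUDA_VISIBLE_DEVICES=4,5,6,7
echo "CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES"
```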
-
Thank you for your attention to this problem.
My workstation spec is:
RTX A4000 ×2
WSL2 Ubuntu 22.04
cuDNN 8.9
(base) heartlab@DESKTOP-GGBQPHK:~/nccl-tests$ nvidia-smi
Fri Jun 28 05:15:17 2024
+------…
-
I am experiencing occasional NCCL operation failures caused by the following IB completion error. What is the root cause of this error? What steps should I take to reduce (or eliminate) the frequ…
-
### Your current environment
Using:
* vllm 0.4.1
* nccl 2.18.1
* pytorch 2.2.1
### 🐛 Describe the bug
During inference I sometimes get this error:
```bash
(RayWorkerWrapper pid=2376582…
-
### Issue
I am trying to run the EFA EKS NCCL tests, but the public ECR image (public.ecr.aws/w6p6i9i7/aws-efa-nccl-rdma) referenced [here](https://github.com/aws-samples/aws-efa-eks/blob/main/exampl…
-
WARNING:root:Your Paddle Fluid has some problem with multiple GPU. This may be caused by:
1. There is only 1 or 0 GPU visible on your Device;
2. No.1 or No.2 GPU or both of them are occupied now…
-
I am a heavy user of NCCL, but recently I became aware of a new toolkit named NVSHMEM, which allows different GPU devices to communicate directly with each other using one-sided, RDMA-like verbs. I am wonde…
-
Hi,
Firstly, we appreciate you publishing this open-source tool, and thanks for the great support!! Currently, we encountered performance issues while running the NCCL Test in a KVM environment on dual nodes.…