-
Error:
```bash
bazel build //xla/tsl/cuda:nccl
ERROR: /home/ubuntu/workspace/xla/xla/tsl/cuda/BUILD.bazel:336:11: no such target '@local_config_nccl//:nccl_headers': target 'nccl_headers' not dec…
```
-
### Description
See https://github.com/ray-project/ray/pull/47141#discussion_r1747392605
### Use case
_No response_
-
Hi, I have a use case in which I would like to use the NCCL ops plugin from TRT-LLM in my project. I see that there is a code snippet in `tensorrt_llm/plugin/plugin.py` which loads the `"libnvinfer_plugi…
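Loading a plugin shared library from Python is usually done with `ctypes`; the hedged sketch below shows that general pattern (the `RTLD_GLOBAL` flag and the demo library `libm` are illustrative assumptions, not details taken from TRT-LLM's actual loader):

```python
import ctypes
import ctypes.util

def load_plugin_library(name):
    """Load a shared library by short name, exporting its symbols globally
    (RTLD_GLOBAL) so a host runtime can see what the library registers.

    `name` is a bare library name like "m"; plugin loaders apply the same
    ctypes.CDLL pattern to their own .so files."""
    path = ctypes.util.find_library(name)
    if path is None:
        raise OSError(f"library {name!r} not found")
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)

# Demo with the C math library, which is present on Linux systems.
libm = load_plugin_library("m")
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
print(libm.sqrt(9.0))  # 3.0
```

Once the library is loaded with global symbol visibility, any plugin creators it registers become visible to the process that dlopened it.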
-
We are seeing an issue with NCCL allreduce performance that we would appreciate NVIDIA's help on.
We have three nodes split across two racks: two nodes on one rack and one node on the other.
Two-…
-
I'd like to run NCCL tests on two nodes with four H100 GPUs each. I compiled nccl-tests with MPI support using the commands below:
```
CUDA_HOME=/usr/local/cuda-12.6
NCCL_HOME=/opt/nvidia/nvidia_hpc_benchmarks_m…
```
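The truncated commands above set make variables for the nccl-tests build. As a hedged sketch (every path below is an assumption for a typical Ubuntu + Open MPI install, not the reporter's actual values), an MPI-enabled build usually looks like:

```shell
# Assumed locations; adjust to your installation.
CUDA_HOME=/usr/local/cuda-12.6
NCCL_HOME=/usr                               # prefix containing include/nccl.h and lib/
MPI_HOME=/usr/lib/x86_64-linux-gnu/openmpi   # Open MPI prefix

# nccl-tests reads these as make variables; MPI=1 switches on the MPI build.
# Echoed rather than executed here, so the full recipe is visible as one line:
echo make MPI=1 MPI_HOME="$MPI_HOME" CUDA_HOME="$CUDA_HOME" NCCL_HOME="$NCCL_HOME" -j
```

The resulting binaries (e.g. `all_reduce_perf`) land in `build/` and are typically launched with `mpirun`, one rank per GPU.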
-
Hi,
I was wondering whether it makes sense to set NCCL_ALGO=Tree while running the all-to-all test?
Thanks,
-
### Describe your problem
Hi,
I have just bought a new computer with 4 GPUs, and the VRAM is large enough to run some very large LLMs locally, like Mistral Large. I'm running a backend server with LM St…
-
https://buildkite.com/xgboost/xgboost-ci-multi-gpu/builds/5027#018f9e76-44ae-4979-bd6d-c9aa5e0a617d
We ran into something like this before we moved to process-based multi-GPU training. The issue …
-
While commit 72b99a42291fcd6c5dcde694fcb3c5d72bc0c9c7 allows libmscclpp to compile using ROCm 6.0, there are still linker errors in libmscclpp_nccl:
```
ld.lld: error: duplicate symbol: __float2bf…
```
-
### Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
vllm 0.5.4
### 🐛 Describe the bug
Currently running inference on 8 * A800 GPUs; vl…