-
Error:
```bash
bazel build //xla/tsl/cuda:nccl
ERROR: /home/ubuntu/workspace/xla/xla/tsl/cuda/BUILD.bazel:336:11: no such target '@local_config_nccl//:nccl_headers': target 'nccl_headers' not dec…
```
-
### Description
See https://github.com/ray-project/ray/pull/47141#discussion_r1747392605
### Use case
_No response_
-
Hi, I have a use case in which I would like to use the NCCL ops plugin from TRT-LLM in my project. I see that there is a code snippet in `tensorrt_llm/plugin/plugin.py` which loads the `"libnvinfer_plugi…
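Loading a plugin shared library from Python is usually done with `ctypes`; the hedged sketch below shows that general pattern (the `RTLD_GLOBAL` flag and the demo library `libm` are illustrative assumptions, not details taken from TRT-LLM's actual loader):

```python
import ctypes
import ctypes.util

def load_plugin_library(name):
    """Load a shared library by short name, exporting its symbols globally
    (RTLD_GLOBAL) so a host runtime can see what the library registers.

    `name` is a bare library name like "m"; plugin loaders apply the same
    ctypes.CDLL pattern to their own .so files."""
    path = ctypes.util.find_library(name)
    if path is None:
        raise OSError(f"library {name!r} not found")
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)

# Demo with the C math library, which is present on Linux systems.
libm = load_plugin_library("m")
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]
print(libm.sqrt(9.0))  # 3.0
```

Once the library is loaded with global symbol visibility, any plugin creators it registers become visible to the process that dlopened it.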
-
We are seeing an issue with NCCL allreduce performance that we would appreciate NVIDIA's help on.
We have three nodes split across two racks: two nodes on one rack and one node on the other.
Two-…
-
I'd like to run NCCL tests on two nodes with four H100 GPUs each. I compiled nccl-tests with MPI support using the commands below:
```
CUDA_HOME=/usr/local/cuda-12.6
NCCL_HOME=/opt/nvidia/nvidia_hpc_benchmarks_m…
```
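The truncated commands above set make variables for the nccl-tests build. As a hedged sketch (every path below is an assumption for a typical Ubuntu + Open MPI install, not the reporter's actual values), an MPI-enabled build usually looks like:

```shell
# Assumed locations; adjust to your installation.
CUDA_HOME=/usr/local/cuda-12.6
NCCL_HOME=/usr                               # prefix containing include/nccl.h and lib/
MPI_HOME=/usr/lib/x86_64-linux-gnu/openmpi   # Open MPI prefix

# nccl-tests reads these as make variables; MPI=1 switches on the MPI build.
# Echoed rather than executed here, so the full recipe is visible as one line:
echo make MPI=1 MPI_HOME="$MPI_HOME" CUDA_HOME="$CUDA_HOME" NCCL_HOME="$NCCL_HOME" -j
```

The resulting binaries (e.g. `all_reduce_perf`) land in `build/` and are typically launched with `mpirun`, one rank per GPU.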
-
Hi,
I was wondering whether it makes sense to set NCCL_ALGO=Tree while running the all-to-all test?
Thanks,
-
### Describe your problem
Hi,
I have just bought a new computer with 4 GPUs, and the VRAM is large enough to run some very large LLMs locally, like Mistral Large. I'm running a backend server with LM St…
-
https://buildkite.com/xgboost/xgboost-ci-multi-gpu/builds/5027#018f9e76-44ae-4979-bd6d-c9aa5e0a617d
We ran into something like this before we moved to process-based multi-GPU training. The issue …
-
While commit 72b99a42291fcd6c5dcde694fcb3c5d72bc0c9c7 allows libmscclpp to compile using ROCm 6.0, there are still linker errors in libmscclpp_nccl:
```
ld.lld: error: duplicate symbol: __float2bf…
```
-
### Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
vllm 0.5.4
### 🐛 Describe the bug
Currently running inference on 8 * A800 GPUs; vl…