-
XGBoost for Python depends on `nvidia-nccl-cu12`, which is built for CUDA 12. I have a PyTorch 2.4.0 installation for CUDA 11.8, but when I use distributed mode, PyTorch picks up the one installed by XGB…
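One way to confirm which NCCL build is in play is to compare the NCCL version PyTorch reports against the `nvidia-nccl-cu12` wheel pulled in by XGBoost. A minimal sketch, assuming both packages live in the same environment (the exact return shape of `torch.cuda.nccl.version()` varies between releases):

```python
# Sketch: compare PyTorch's NCCL version with the nvidia-nccl-cu12 wheel
# installed as an XGBoost dependency. Output formats may differ by release.
from importlib import metadata

import torch

# NCCL version PyTorch was built against (a tuple on recent releases, e.g. (2, 20, 5))
print("PyTorch NCCL:", torch.cuda.nccl.version())

try:
    # Version of the standalone CUDA 12 NCCL wheel that XGBoost depends on
    print("nvidia-nccl-cu12:", metadata.version("nvidia-nccl-cu12"))
except metadata.PackageNotFoundError:
    print("nvidia-nccl-cu12 is not installed")
```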
-
### 🐛 Describe the bug
```python
import torch
import torch.distributed

# Default process group uses the MPI backend; creating the NCCL sub-group is what fails.
torch.distributed.init_process_group(backend="mpi")
nccl_group = torch.distributed.new_group(backend="nccl")
```
```
[rank0]: Traceback (most r…
```
-
Hi,
I tried to run your code by following the README instructions.
When I tried the main experiment [ColBERT Retrieval] and ran the script:
python src/main.py configs/nq_tables/colbert.jso…
-
### 🐛 Describe the bug
When running
`python test/distributed/test_c10d_nccl.py -k test_nan_assert_float16` on an H100x2 platform,
the current nightly (and likely the v2.5.0 RC) is producing the follo…
-
I run: `python train.py -c configs\ljs_base.json -m ljs_base`
output:
```
DEBUG:numba.core.byteflow:bytecode dump:
> 0 NOP(arg=None, lineno=1051)
        2 LOAD_FAST(arg=0, l…
```
-
We need to add NCCL support as a backend/implementation of the Communicator abstraction, which will provide all the functionality required for synchronous distributed SameDiff training.
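Purely as an illustration of the kind of abstraction meant here (not SameDiff's actual API), a hypothetical `Communicator` interface with an NCCL-backed implementation could look roughly like this, sketched in Python/PyTorch terms:

```python
# Hypothetical sketch only: Communicator and NcclCommunicator below are
# illustrative names, not SameDiff's real API. The point is the shape of a
# backend abstraction for synchronous training (all-reduce of gradients).
from abc import ABC, abstractmethod

import torch
import torch.distributed as dist


class Communicator(ABC):
    @abstractmethod
    def all_reduce(self, tensor: torch.Tensor) -> torch.Tensor:
        """Sum the tensor across all workers and return the result on each."""


class NcclCommunicator(Communicator):
    def __init__(self) -> None:
        # Assumes the process group was initialized with the NCCL backend,
        # e.g. dist.init_process_group(backend="nccl").
        assert dist.is_initialized()

    def all_reduce(self, tensor: torch.Tensor) -> torch.Tensor:
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
        return tensor
```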
-
Is this a bug?
Training using NCCL with 2 GPUs (a 1080 and a 1060) and a LevelDB data layer?
When using a single GPU this does not happen.
It naively appears to me that the LevelDB is trying to be open…
-
{
"platform":"",
"hub-mirror": [
    "ghcr.io/coreweave/nccl-tests:12.2.2-cudnn8-devel-ubuntu22.04-nccl2.19.3-1-85f9143"
]
}
-
### Description
When the same GPU tensor is sent to multiple readers, we should use ncclBroadcast under the hood to reduce transfer time.
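Conceptually, instead of copying the tensor point-to-point to each reader, a single collective broadcast serves all of them at once. A rough sketch of the idea, using `torch.distributed.broadcast` as a stand-in for the proposed ncclBroadcast path (process-group setup, rank assignment, and the `share_with_readers` helper are assumptions for illustration):

```python
# Rough sketch of the idea: one broadcast from the owner rank instead of a
# separate copy per reader. torch.distributed.broadcast stands in for the
# proposed ncclBroadcast-based path; the process group setup is assumed.
import torch
import torch.distributed as dist


def share_with_readers(tensor: torch.Tensor, owner_rank: int = 0) -> torch.Tensor:
    # On the owner rank, `tensor` holds the data; on reader ranks it is a
    # buffer of the same shape/dtype that gets filled in place.
    dist.broadcast(tensor, src=owner_rank)
    return tensor
```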
### Use case
_No response_