-
## [Tasks](https://github.com/cupy/cupy/wiki/Actions-Needed-for-Dependency-Update)
- [x] Read Release Notes ([cuTENSOR](https://docs.nvidia.com/cuda/cutensor/index.html#changelog) / [cuSPARSELt](ht…
-
### What happened + What you expected to happen
```python
@pytest.mark.parametrize("ray_start_cluster_head_with_env_vars", [
{
"include_dashboard": True,
"env_va…
-
**Some software versions:**
- nccl-tests: 2.13.9
- openmpi: 4.1.5
- rdma ofed: 23.10-1.1.9.0
- nvidia-driver: 535.104.12-1
- cuda: 11.4.4-1
- nccl: 2.21.5-1
**Command**
mpirun --allow-run-as-root -…
-
### Misc discussion on performance
I've been running some simple tests on a multi-node parallel pipeline with NCCL. I doubled the bandwidth between the nodes but saw no increase in t/s or throughput.…
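One common explanation for this is that inter-node communication was not the bottleneck to begin with: if compute and communication overlap, step time is roughly `max(compute, comm)`, so a faster link only helps when the comm term dominates. A back-of-envelope sketch (all numbers below are hypothetical, not taken from this setup):

```python
# Toy model of one pipeline-stage step, assuming full compute/comm overlap.
# Halving communication time only shortens the step when comm is the bottleneck.

def step_time(compute_s: float, comm_bytes: float, bandwidth_bps: float) -> float:
    """Step time under full overlap: the slower of compute and communication."""
    comm_s = comm_bytes / bandwidth_bps
    return max(compute_s, comm_s)

compute_s = 0.020    # 20 ms of compute per microbatch (assumed)
comm_bytes = 50e6    # 50 MB of activations per stage boundary (assumed)

t_slow = step_time(compute_s, comm_bytes, 25e9)  # 25 GB/s link
t_fast = step_time(compute_s, comm_bytes, 50e9)  # link bandwidth doubled

# Comm is 2 ms vs 20 ms of compute, so doubling bandwidth changes nothing:
print(f"25 GB/s: {t_slow*1e3:.1f} ms, 50 GB/s: {t_fast*1e3:.1f} ms")
```

Under these assumed numbers both configurations take 20 ms per step, which matches the symptom described: profiling the compute side (or checking whether comm/compute actually overlap) is usually the next step.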
-
Currently, we build two wheel variants: `xgboost-cpu` (which excludes GPU code) and `xgboost` (where the GPU code targets CUDA 12.4). In #10729, `xgboost` is found to conflict with another package us…
-
## 🚀 Feature
Make streams used for NCCL operations configurable
## Motivation
I've noticed that PyTorch's distributed module has introduced P2P send and receive functionality via NCCL (which is…
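The API shape being requested here is "let the caller supply the execution resource instead of a library-internal one". As a CPU-only analogy (no GPU or NCCL assumed; the names `async_send` and the `ThreadPoolExecutor`-as-stream are purely illustrative, not PyTorch's API), the pattern looks like:

```python
# Analogy for a configurable stream: an async operation that runs on a
# library-internal worker by default, but accepts a caller-supplied one,
# giving the caller control over ordering and isolation of the work.
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional

_default_executor = ThreadPoolExecutor(max_workers=1)  # library-internal "stream"

def async_send(payload: bytes,
               executor: Optional[ThreadPoolExecutor] = None) -> Future:
    """Enqueue work on the caller's executor if given, else the internal one."""
    ex = executor if executor is not None else _default_executor
    return ex.submit(len, payload)  # len() stands in for the actual transfer

# Default path: the internal worker.
fut = async_send(b"hello")
print(fut.result())  # 5

# Caller-controlled path: a user-supplied "stream", e.g. to keep P2P traffic
# off the worker that runs collectives.
with ThreadPoolExecutor(max_workers=1) as my_stream:
    print(async_send(b"world!!", executor=my_stream).result())  # 7
```

The same shape applied to NCCL would mean accepting an optional CUDA stream argument (or a per-process-group stream setting) rather than always using the communicator's internal stream.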
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTor…
-
**Describe the bug**
While fine-tuning the MiniCPM-v-2.6 model with the `swift sft` command, training suddenly failed partway through with the error:
Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on…
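This message typically comes from the NCCL watchdog: collectives are issued asynchronously, and a separate wait with a deadline decides that an operation has hung (e.g. because one rank crashed or fell behind) and aborts the process group. A CPU-only stdlib analogy of that mechanism (the 0.1 s deadline and the stalled worker are purely illustrative):

```python
# Analogy for NCCL's watchdog: work is submitted asynchronously, and a timed
# wait on the result decides whether the operation "hung".
import concurrent.futures
import threading

def stalled_collective(release: threading.Event) -> str:
    release.wait()   # stands in for a rank that never reaches the collective
    return "done"

release = threading.Event()
with concurrent.futures.ThreadPoolExecutor() as pool:
    fut = pool.submit(stalled_collective, release)
    try:
        fut.result(timeout=0.1)       # the "watchdog" deadline
        outcome = "completed"
    except concurrent.futures.TimeoutError:
        outcome = "timed out"         # what the NCCL watchdog then reports
    finally:
        release.set()                 # unblock the worker so shutdown succeeds

print(outcome)
```

In real training the usual follow-ups are checking the logs of all ranks for the one that actually failed first, and raising the timeout (e.g. via the `timeout` argument to `init_process_group`) only if the hang is a legitimately long step rather than a dead rank.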
-
The script to reproduce the bug.
```python
import os
import time
import pickle
import torch
import threading
import torch.distributed as dist
import torch.distributed.distributed_c10d as c10…
-
## Background information
Hi team, thanks for your work on OpenMPI.
I am trying to use NCCL concurrently with the CUDA-aware OpenMPI. NCCL [makes a careful note in its documentation](https://docs.…
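The hazard NCCL's documentation warns about is a classic circular wait: if different ranks interleave blocking NCCL and blocking MPI calls in different orders, each can sit in one call waiting for peers that are stuck in the other. The standard remedy is a single agreed-upon global ordering on every rank. A CPU-only stdlib analogy (two locks standing in for the two in-flight blocking calls; names are illustrative):

```python
# Analogy for avoiding the NCCL/MPI deadlock: every "rank" acquires the two
# blocking resources in the same global order, so no circular wait can form.
import threading

nccl_slot = threading.Lock()   # stands in for a blocking NCCL collective
mpi_slot = threading.Lock()    # stands in for a blocking MPI collective
results = []

def rank(name: str) -> None:
    # Consistent order on every rank: NCCL first, then MPI.
    with nccl_slot:
        with mpi_slot:
            results.append(name)

threads = [threading.Thread(target=rank, args=(f"rank{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # both ranks complete; no deadlock
```

If one thread instead took `mpi_slot` first while the other took `nccl_slot` first, both could block forever, which is the interleaving the NCCL docs tell you to rule out when mixing it with CUDA-aware MPI.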