-
## [Tasks](https://github.com/cupy/cupy/wiki/Actions-Needed-for-Dependency-Update)
- [x] Read Release Notes ([cuTENSOR](https://docs.nvidia.com/cuda/cutensor/index.html#changelog) / [cuSPARSELt](ht…
-
### What happened + What you expected to happen
```python
@pytest.mark.parametrize("ray_start_cluster_head_with_env_vars", [
{
"include_dashboard": True,
"env_va…
-
**Some software versions:**
- nccl-tests: 2.13.9
- openmpi: 4.1.5
- rdma ofed: 23.10-1.1.9.0
- nvidia-driver: 535.104.12-1
- cuda: 11.4.4-1
- nccl: 2.21.5-1
**Command**
mpirun --allow-run-as-root -…
-
### Misc discussion on performance
I've been running some simple tests on a multi-node parallel pipeline with NCCL. I doubled the bandwidth between the nodes but saw no increase in t/s or throughput.…
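One common explanation for this is that inter-node communication was not the bottleneck to begin with: if compute and communication overlap, step time is roughly `max(compute, comm)`, so a faster link only helps when the comm term dominates. A back-of-envelope sketch (all numbers below are hypothetical, not taken from this setup):

```python
# Toy model of one pipeline-stage step, assuming full compute/comm overlap.
# Halving communication time only shortens the step when comm is the bottleneck.

def step_time(compute_s: float, comm_bytes: float, bandwidth_bps: float) -> float:
    """Step time under full overlap: the slower of compute and communication."""
    comm_s = comm_bytes / bandwidth_bps
    return max(compute_s, comm_s)

compute_s = 0.020    # 20 ms of compute per microbatch (assumed)
comm_bytes = 50e6    # 50 MB of activations per stage boundary (assumed)

t_slow = step_time(compute_s, comm_bytes, 25e9)  # 25 GB/s link
t_fast = step_time(compute_s, comm_bytes, 50e9)  # link bandwidth doubled

# Comm is 2 ms vs 20 ms of compute, so doubling bandwidth changes nothing:
print(f"25 GB/s: {t_slow*1e3:.1f} ms, 50 GB/s: {t_fast*1e3:.1f} ms")
```

Under these assumed numbers both configurations take 20 ms per step, which matches the symptom described: profiling the compute side (or checking whether comm/compute actually overlap) is usually the next step.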
-
Currently, we build two wheel variants: `xgboost-cpu` (which excludes GPU code) and `xgboost` (where the GPU code targets CUDA 12.4). In #10729, `xgboost` is found to conflict with another package us…
-
## 🚀 Feature
Make streams used for NCCL operations configurable
## Motivation
I've noticed that PyTorch's distributed module has introduced P2P send and receive functionality via NCCL (which is…
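The API shape being requested here is "let the caller supply the execution resource instead of a library-internal one". As a CPU-only analogy (no GPU or NCCL assumed; the names `async_send` and the `ThreadPoolExecutor`-as-stream are purely illustrative, not PyTorch's API), the pattern looks like:

```python
# Analogy for a configurable stream: an async operation that runs on a
# library-internal worker by default, but accepts a caller-supplied one,
# giving the caller control over ordering and isolation of the work.
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional

_default_executor = ThreadPoolExecutor(max_workers=1)  # library-internal "stream"

def async_send(payload: bytes,
               executor: Optional[ThreadPoolExecutor] = None) -> Future:
    """Enqueue work on the caller's executor if given, else the internal one."""
    ex = executor if executor is not None else _default_executor
    return ex.submit(len, payload)  # len() stands in for the actual transfer

# Default path: the internal worker.
fut = async_send(b"hello")
print(fut.result())  # 5

# Caller-controlled path: a user-supplied "stream", e.g. to keep P2P traffic
# off the worker that runs collectives.
with ThreadPoolExecutor(max_workers=1) as my_stream:
    print(async_send(b"world!!", executor=my_stream).result())  # 7
```

The same shape applied to NCCL would mean accepting an optional CUDA stream argument (or a per-process-group stream setting) rather than always using the communicator's internal stream.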
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTor…
-
**Describe the bug**
While fine-tuning the MiniCPM-v-2.6 model with the `swift sft` command, training suddenly failed partway through with the error:
Some NCCL operations have failed or timed out. Due to the asynchronous nature of CUDA kernels, subsequent GPU operations might run on…
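This message typically comes from the NCCL watchdog: collectives are issued asynchronously, and a separate wait with a deadline decides that an operation has hung (e.g. because one rank crashed or fell behind) and aborts the process group. A CPU-only stdlib analogy of that mechanism (the 0.1 s deadline and the stalled worker are purely illustrative):

```python
# Analogy for NCCL's watchdog: work is submitted asynchronously, and a timed
# wait on the result decides whether the operation "hung".
import concurrent.futures
import threading

def stalled_collective(release: threading.Event) -> str:
    release.wait()   # stands in for a rank that never reaches the collective
    return "done"

release = threading.Event()
with concurrent.futures.ThreadPoolExecutor() as pool:
    fut = pool.submit(stalled_collective, release)
    try:
        fut.result(timeout=0.1)       # the "watchdog" deadline
        outcome = "completed"
    except concurrent.futures.TimeoutError:
        outcome = "timed out"         # what the NCCL watchdog then reports
    finally:
        release.set()                 # unblock the worker so shutdown succeeds

print(outcome)
```

In real training the usual follow-ups are checking the logs of all ranks for the one that actually failed first, and raising the timeout (e.g. via the `timeout` argument to `init_process_group`) only if the hang is a legitimately long step rather than a dead rank.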
-
The script to reproduce the bug.
```python
import os
import time
import pickle
import torch
import threading
import torch.distributed as dist
import torch.distributed.distributed_c10d as c10…
-
## Background information
Hi team, thanks for your work on OpenMPI.
I am trying to use NCCL concurrently with the CUDA-aware OpenMPI. NCCL [makes a careful note in its documentation](https://docs.…
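The hazard NCCL's documentation warns about is a classic circular wait: if different ranks interleave blocking NCCL and blocking MPI calls in different orders, each can sit in one call waiting for peers that are stuck in the other. The standard remedy is a single agreed-upon global ordering on every rank. A CPU-only stdlib analogy (two locks standing in for the two in-flight blocking calls; names are illustrative):

```python
# Analogy for avoiding the NCCL/MPI deadlock: every "rank" acquires the two
# blocking resources in the same global order, so no circular wait can form.
import threading

nccl_slot = threading.Lock()   # stands in for a blocking NCCL collective
mpi_slot = threading.Lock()    # stands in for a blocking MPI collective
results = []

def rank(name: str) -> None:
    # Consistent order on every rank: NCCL first, then MPI.
    with nccl_slot:
        with mpi_slot:
            results.append(name)

threads = [threading.Thread(target=rank, args=(f"rank{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # both ranks complete; no deadlock
```

If one thread instead took `mpi_slot` first while the other took `nccl_slot` first, both could block forever, which is the interleaving the NCCL docs tell you to rule out when mixing it with CUDA-aware MPI.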