-
The code change in #28433 written by @adamjstewart and committed by @alalazo makes it so one must specify CUDA arch(es):
https://github.com/spack/spack/blob/bd9f8ba0947d26fa5775b7be1979746034f4ca8…
-
Currently, when I encounter a timeout error with NCCL, locating the hanging node is quite time-consuming. Does NCCL have a feature to achieve this? If not, could you provide ideas for implementing it …
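One application-level idea, sketched here under stated assumptions and independent of whatever NCCL itself offers: each rank bumps a counter just before it enters a collective and publishes that counter somewhere shared (a file, a key-value store, a log line). When a timeout fires, the ranks whose counter is behind the maximum are the ones that never reached the collective the others are waiting in. The function name `find_lagging_ranks` and the `last_seq_by_rank` mapping are hypothetical and not part of NCCL or any framework.

```python
from typing import Dict, List


def find_lagging_ranks(last_seq_by_rank: Dict[int, int]) -> List[int]:
    """Return the ranks whose last-entered collective is behind the furthest rank.

    last_seq_by_rank maps rank -> sequence number of the last collective that
    rank reported entering (hypothetical bookkeeping done by the application).
    """
    if not last_seq_by_rank:
        return []
    furthest = max(last_seq_by_rank.values())
    # Ranks stuck *before* the collective everyone else is blocked in
    # will have a strictly smaller sequence number.
    return sorted(rank for rank, seq in last_seq_by_rank.items() if seq < furthest)


if __name__ == "__main__":
    # Ranks 0, 1 and 3 entered collective 118; rank 2 only reached 117.
    progress = {0: 118, 1: 118, 2: 117, 3: 118}
    print(find_lagging_ranks(progress))
```

If every rank reports the same sequence number, the list is empty, and the hang is more likely inside the collective itself (for example a network fault) than caused by a rank that never arrived.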
-
log
```
:522:622 [2] transport/net_ib.cc:1295 NCCL WARN NET/IB : Got completion from peer 10.1.15.233 with error 4, opcode 32601, len 32600, vendor err 81 (Send) localGid ::ffff:10.1.77.5 remoteGid :…
-
#### Environment Versions
1. OS Type: linux
1. Python version: 3.10.14
1. pip version: 24.2
1. pip-tools version: 7.4.1
#### Steps to replicate
1. `pip install xgboost`
2. `pip list |…
-
NCCL does not support complex numbers directly and does not plan to ([see issue](https://github.com/NVIDIA/nccl/issues/539)). Are we willing to add a wrapper to NCCL.jl to make using complex numbers …
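As a rough illustration of what such a wrapper could do, the sketch below views each complex value as a (real, imaginary) pair of floats, runs a sum reduction over the real-valued buffers, and reassembles the result. It is written in plain Python, not Julia, and it does not call NCCL or NCCL.jl; `simulated_allreduce_sum` is a hypothetical stand-in for a real all-reduce. The trick relies on the reduction being linear, so it applies to sums but not to products or min/max.

```python
from typing import List, Sequence


def complex_to_reals(values: Sequence[complex]) -> List[float]:
    """Flatten complex numbers into interleaved [re0, im0, re1, im1, ...]."""
    flat: List[float] = []
    for v in values:
        flat.extend((v.real, v.imag))
    return flat


def reals_to_complex(flat: Sequence[float]) -> List[complex]:
    """Inverse of complex_to_reals: rebuild complex numbers from pairs."""
    return [complex(flat[i], flat[i + 1]) for i in range(0, len(flat), 2)]


def simulated_allreduce_sum(buffers: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical stand-in for an all-reduce: element-wise sum over ranks."""
    return [sum(column) for column in zip(*buffers)]


if __name__ == "__main__":
    rank0 = [1 + 2j, 3 - 1j]
    rank1 = [0.5 + 0.5j, -3 + 4j]
    reduced = simulated_allreduce_sum(
        [complex_to_reals(rank0), complex_to_reals(rank1)]
    )
    print(reals_to_complex(reduced))
```

Because complex addition acts independently on the real and imaginary parts, summing the interleaved float buffers gives the same result as summing the complex values directly.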
-
### What happened + What you expected to happen
I built an ADAG with an NCCL channel and executed it once.
After execution, I called another actor method, which returns a CPU tensor. However, it is au…
-
**I built TensorRT-LLM 0.9.0 from source, based on nvcr.io/nvidia/tritonserver:24.02-py3, and ran scripts or commands from https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi.*…
-
### 🐛 Describe the bug
```
TORCH_NCCL_AVOID_RECORD_STREAMS=1 PYTORCH_NO_CUDA_MEMORY_CACHING=1 torchrun --standalone --nnodes=1 --nproc_per_node=2 --no-python compute-sanitizer python -c 'import torc…
-
### Your current environment
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.4.0
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM …
-
## [Tasks](https://github.com/cupy/cupy/wiki/Actions-Needed-for-Dependency-Update)
- [x] Read Release Notes ([cuTENSOR](https://docs.nvidia.com/cuda/cutensor/index.html#changelog) / [cuSPARSELt](ht…