nccl Search Results - Githubissues

1000+ results for nccl

spack/spack #39079

NCCL forces the selection of a CUDA arch

The code change in #28433 written by @adamjstewart and committed by @alalazo makes it so one must specify CUDA arch(es): https://github.com/spack/spack/blob/bd9f8ba0947d26fa5775b7be1979746034f4ca8…

dmagdavector updated 4 months ago
3 comments

NVIDIA/nccl #1324

How to locate the hanging node?

Currently, when I encounter a timeout error with NCCL, locating the hanging node is quite time-consuming. Does NCCL have a feature to achieve this? If not, could you provide ideas for implementing it …

Puzzzle7 updated 3 months ago
1 comment

NVIDIA/nccl #1434

it supports fast failure when RDMA write fails,

log ``` :522:622 [2] transport/net_ib.cc:1295 NCCL WARN NET/IB : Got completion from peer 10.1.15.233 with error 4, opcode 32601, len 32600, vendor err 81 (Send) localGid ::ffff:10.1.77.5 remoteGid :…

alpha-baby updated 2 weeks ago
1 comment

jazzband/pip-tools #2127

pipx installed pip-sync doesn't work with conda active env

#### Environment Versions 1. OS Type: linux 1. Python version: 3.10.14 1. pip version: 24.2 1. pip-tools version: 7.4.1 #### Steps to replicate 1. `pip install xgboost` 2. `pip list |…

kyuwoo-choi updated 23 hours ago
3 comments

JuliaGPU/NCCL.jl #56

Complex number wrapper

NCCL does not support complex numbers directly and does not plan to ([see issue](https://github.com/NVIDIA/nccl/issues/539)). Are we willing to add a wrapper to NCCL.jl to make using complex numbers …

nikopj updated 2 months ago
6 comments

ray-project/ray #46440

[DAG] cpu tensor returned by DAG actor method gets automatic…

### What happened + What you expected to happen I built an ADAG with an NCCL channel and executed it once. After execution, I called another actor method, which returns a CPU tensor. However, it is au…

woshiyyya updated 4 days ago
3 comments

NVIDIA/TensorRT-LLM #1498

Encountered an error: peer access is not supported between t…

I built TensorRT-LLM 0.9.0 from source code based on nvcr.io/nvidia/tritonserver:24.02-py3, and ran scripts or commands from https://github.com/NVIDIA/TensorRT-LLM/blob/main/docker/Dockerfile.multi.*…

liu21yd updated 4 months ago
4 comments

pytorch/pytorch #132640

CUDA Invalid Memory Access caused by torch.distributed.barri…

### 🐛 Describe the bug ``` TORCH_NCCL_AVOID_RECORD_STREAMS=1 PYTORCH_NO_CUDA_MEMORY_CACHING=1 torchrun --standalone --nnodes=1 --nproc_per_node=2 --no-python compute-sanitizer python -c 'import torc…

lw updated 1 month ago
6 comments

vllm-project/vllm #8087

[Bug]: when tensor-parallel-size>1, Stuck

### Your current environment The output of `python collect_env.py` Collecting environment information... PyTorch version: 2.4.0 Is debug build: False CUDA used to build PyTorch: 12.4 ROCM …

wiluen updated 2 weeks ago
9 comments

cupy/cupy #6521

Support NCCL 2.12

## [Tasks](https://github.com/cupy/cupy/wiki/Actions-Needed-for-Dependency-Update) - [x] Read Release Notes ([cuTENSOR](https://docs.nvidia.com/cuda/cutensor/index.html#changelog) / [cuSPARSELt](ht…

takagi updated 2 years ago
1 comment
