nccl Search Results - Githubissues

1000+ results for nccl

Best match

pytorch/pytorch #36125

build Pytorch-1.5.0-rc1 from source fail

I am trying to build PyTorch 1.5.0-rc1 from source and I am seeing this error. Linking libnccl.so.2.4.8 > /sources/pytorch/build/nccl/lib/libnccl.so.2.4.8 Generating nccl.pc.…

522730312 updated 4 years ago
3
axolotl-ai-cloud/axolotl #1495

qwen moe3 fine tune error

### Please check that this issue hasn't been reported before. - [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) didn't find any similar reports…

manishiitg updated 5 months ago
6
NVIDIA/nccl #1156

Pack bootstrapAllGather's [2]

https://github.com/NVIDIA/nccl/blob/b6d7438d3145a619f924dbbca6c96db21fab716e/src/init.cc#L617 is called in a for loop: https://github.com/NVIDIA/nccl/blob/b6d7438d3145a619f924dbbca6c96db21fab716e/s…

kwen2501 updated 8 months ago
7
meta-llama/codellama #40

Unable to run it under Windows 10

I followed the instructions, and I was unable to run it under Windows 10 due to `nccl`

eranif updated 1 year ago
5
h2oai/h2o-3 #11874

XGBoost: "NCCL failure :cuda malloc failed" memory allocatio…

h2o-3 crashes with the following stacktrace when XGBoost is run on BNPParibas as munged by autodl 0.9.1. This is with h2o-3 built from [~accountid:557058:389d9607-5bd8-4611-8c6a-755fe9295223]'s branc…

exalate-issue-sync[bot] updated 1 year ago
1
NVIDIA/nccl #786

Why "Enable LL128 by default only on Volta/Ampere/Hopper+NVL…

Hi NCCL team: why "Enable LL128 by default only on Volta/Ampere/Hopper+NVLink"? What is the root reason? Thanks. https://github.com/NVIDIA/nccl/blob/f3d51667838f7542df8ea32ea4e144d812b3ed7c/src/graph/t…

mtxuhao updated 1 year ago
4
alpa-projects/alpa #950

cupy package mismatches with CUDA version in the docs

**Please describe the bug** Hi, according to the [alpa installation doc](https://alpa.ai/install.html), we need to `pip3 install cupy-cuda11x` to install cupy. However, when CUDA version is 11.1, acc…

serach24 updated 1 year ago
2
NVIDIA/nccl-tests #101

Profiling all_reduce_perf with Nsight hangs

Hi there, We are trying to run all_reduce_perf with Nsight, to get HBM usage metrics. However, all_reduce_perf will hang after printing “==PROF== Profiling "ncclKernel_AllReduce_RING_LL_..." - 1:”…

caogao updated 2 years ago
1
PKU-YuanGroup/MoE-LLaVA #12

/deepspeed/comm/comm.py", line 341, in all_to_all_single …

I found that this MoE runs on DeepSpeed, but DeepSpeed has issues when running on a server without MPI. Any solution?

lucasjinreal updated 5 months ago
13
NVIDIA/nccl #431

Feature request - using 2 GPU workers on one large GPU (A100…

# Setup - A multi-GPU rig, having top of the line GPUs: - Several 3090 GPUs; - Or several A100 GPUs; - A `pytorch:1.7.0-cuda11.0-cudnn8-devel` container derivative; - Latest `docker`, `nvid…

snakers4 updated 2 years ago
5
