-
I am trying to build PyTorch 1.5.0-rc1 from source and I am seeing this error:
Linking libnccl.so.2.4.8 > /sources/pytorch/build/nccl/lib/libnccl.so.2.4.8
Generating nccl.pc.…
-
### Please check that this issue hasn't been reported before.
- [X] I searched previous [Bug Reports](https://github.com/OpenAccess-AI-Collective/axolotl/labels/bug) and didn't find any similar reports…
-
https://github.com/NVIDIA/nccl/blob/b6d7438d3145a619f924dbbca6c96db21fab716e/src/init.cc#L617
is called in a for loop:
https://github.com/NVIDIA/nccl/blob/b6d7438d3145a619f924dbbca6c96db21fab716e/s…
-
I followed the instructions, but I was unable to run it under Windows 10 because of `nccl`.
-
h2o-3 crashes with the following stack trace when XGBoost is run on BNPParibas as munged by autodl 0.9.1. This is with h2o-3 built from [~accountid:557058:389d9607-5bd8-4611-8c6a-755fe9295223]'s branc…
-
Hi NCCL team,
Why is LL128 enabled by default only on Volta/Ampere/Hopper with NVLink ("Enable LL128 by default only on Volta/Ampere/Hopper+NVLink")? What is the root reason? Thanks.
https://github.com/NVIDIA/nccl/blob/f3d51667838f7542df8ea32ea4e144d812b3ed7c/src/graph/t…
-
**Please describe the bug**
Hi, according to the [alpa installation doc](https://alpa.ai/install.html), we need to `pip3 install cupy-cuda11x` to install cupy. However, when the CUDA version is 11.1, acc…
-
Hi there,
We are trying to run all_reduce_perf with Nsight to get HBM usage metrics.
However, all_reduce_perf hangs after printing “==PROF== Profiling "ncclKernel_AllReduce_RING_LL_..." - 1:”…
-
I found that this MoE runs on DeepSpeed, but DeepSpeed has issues when running on a server without MPI. Is there a solution?
-
# Setup
- A multi-GPU rig with top-of-the-line GPUs:
  - Several 3090 GPUs;
  - Or several A100 GPUs;
- A `pytorch:1.7.0-cuda11.0-cudnn8-devel` container derivative;
- Latest `docker`, `nvid…