-
**Describe the bug**
I installed BytePS with gcc 4.9 and TensorFlow 1.11.0.
When I run `python3 keras_mnist.py`,
the following error occurs:
byteps-0.1.0-py3.6-linux-x86_64.egg/byteps/tensorflow/c_lib.cpython-3…
-
### 🚀 The feature, motivation and pitch
MSCCL++ redefines inter-GPU communication interfaces, offering a highly efficient and customizable communication stack tailored for distributed GPU application…
-
### Your current environment
```text
The output of `python collect_env.py`
```
Differences between Docker and local:
In Docker:
```
CUDA runtime version:Could not collect
cuDNN version: 9.0…
-
**Please describe the bug**
Hi, according to the [Alpa installation doc](https://alpa.ai/install.html), we need to run `pip3 install cupy-cuda11x` to install CuPy. However, when the CUDA version is 11.1, acc…
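A minimal sketch of how one might map a CUDA version to a CuPy wheel name. It assumes the naming scheme from the CuPy installation guide: `cupy-cuda11x` covers CUDA 11.2 and later 11.x releases, `cupy-cuda12x` covers CUDA 12.x, and older releases use per-version wheels such as `cupy-cuda111`. The helper `cupy_package_for_cuda` is hypothetical and is not part of Alpa or CuPy.

```python
def cupy_package_for_cuda(version: str) -> str:
    """Return a CuPy pip package name for a CUDA version string like '11.1'.

    Hypothetical helper. Assumes the CuPy wheel naming scheme described above;
    versions outside CUDA 10.x-12.x are deliberately not handled.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    if major == 12:
        # One wheel covers every CUDA 12.x minor release.
        return "cupy-cuda12x"
    if major == 11 and minor >= 2:
        # One wheel covers CUDA 11.2 and later 11.x releases.
        return "cupy-cuda11x"
    if major in (10, 11):
        # Older releases use one wheel per minor version, e.g. 11.1 -> cupy-cuda111.
        return f"cupy-cuda{major}{minor}"
    raise ValueError(f"CUDA {version} is not covered by this sketch")
```

For example, `cupy_package_for_cuda("11.1")` returns `"cupy-cuda111"`, while `cupy_package_for_cuda("11.4")` returns `"cupy-cuda11x"`.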
-
System: Perlmutter.
Modules / Software we are compiling with: PrgEnv-nvidia/8.2.0 & nvidia/21.7.
We at NERSC run these NCCL tests as part of our ReFrame test suite. Currently, all tests fail if we want to r…
-
### System Info
```
root@6f75f4d87c8b:~/TensorRT-LLM/examples/llama# nvdisasm --version
nvdisasm: NVIDIA (R) CUDA disassembler
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Mon_Apr__3_17:…
-
I am running the full-finetune distributed recipe. When I set `clip_grad_norm: 1.0` and `fsdp_cpu_offload: True`, it raises an error:
`RuntimeError: No backend type associated with device type cpu`
…
-
## 🐛 Bug
Errors occur when performing a broadcast in PyTorch v1.6.1 with NCCL v2.7.8 as the backend and `NCCL_BLOCKING_WAIT=1` set. There are 8 GPUs in one Docker container, and rank 0 broadcasts a tensor to the other ranks. Ra…
-
## 🐛 Bug
Currently, when the `USE_GLOO` CMake option is set to `ON`, the target `gloo_cuda` requires a dependency called `nccl_external`; however, this target is available if and only if `USE_SYSTEM_NCCL` …
-
Hi, when I tried to continue training the conditional flow-matching model on a new dataset (Chinese (zh) audio collected from YouTube), I found that the loss decreased a lot, but the generated audio is totally unintell…