-
### Description
I'm attempting to build `jaxlib` with a local CUDA, cuDNN, and NCCL. I'm running into (different) issues with either `gcc` or `clang`. Any ideas?
## Build command:
```pre
pyt…
-
### Description
```
inputs = flow.ones([4, 10],dtype=flow.float32, placement=placement, sbp=flow.sbp.broadcast)
weight = flow.ones([10, 4],dtype=flow.float32, placement=placement, sbp=flow.…
-
Tried
```
(mambair) brcao@dawn:/eData05/brcao/Repos/MambaIR$ python3 -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 basicsr/train.py -opt options/train/train_MambaIR_SR_x2.yml --l…
-
Hello! If I do not use an NCCL net plugin and only use the internal implementation of isend/irecv, where and how are isend/irecv defined here: https://github.com/NVIDIA/nccl/blob/5d3ab08b69754cb8…
-
Model: Qwen-14B-Chat (QWen2)
Dataset: https://huggingface.co/datasets/Hello-SimpleAI/HC3-Chinese/blob/main/open_qa.jsonl
Environment: 2 A30 GPU
Issue 1:
Error: can't init model correctly. Disab…
-
I think that NCCL is part of PyTorch? I am running Python 3.9, so I had to install torch using `-c=conda-forge`, as specified in the instructions for installing torch. It seemed to install correctl…
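
For reference, whether a given torch build ships with NCCL can be checked directly. The following is a minimal sketch, not a definitive diagnostic: the helper name `nccl_report` is hypothetical, and it assumes only the public calls `torch.distributed.is_available()`, `torch.distributed.is_nccl_available()`, and `torch.cuda.nccl.version()`. If torch is not importable, it reports that instead of failing.

```python
import importlib.util


def nccl_report():
    """Describe whether the installed torch build exposes NCCL.

    `nccl_report` is a hypothetical helper name used for illustration only.
    """
    report = {
        "torch_installed": False,
        "cuda_build": None,      # CUDA version torch was built against, if any
        "nccl_available": None,  # True/False once torch has been imported
        "nccl_version": None,    # tuple or int, depending on the torch version
    }
    if importlib.util.find_spec("torch") is None:
        return report
    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        # torch is present on disk but cannot be imported in this environment
        return report

    report["torch_installed"] = True
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    # is_nccl_available() is only consulted when the distributed package exists
    report["nccl_available"] = bool(dist.is_available() and dist.is_nccl_available())
    if report["nccl_available"]:
        report["nccl_version"] = torch.cuda.nccl.version()
    return report


if __name__ == "__main__":
    print(nccl_report())
```

A CPU-only build typically shows `cuda_build` as `None` and `nccl_available` as `False`, which would suggest that the installed package, rather than a separate NCCL install, is the thing to look at.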
-
### Root Cause
The root cause is a recent transformers update [to resolve high CPU usage for large quantized models](https://github.com/huggingface/transformers/pull/33154).
- what the PR…
-
Hello,

I would like to ask a PyTorch question. I am using Ubuntu 22.04 with an AMD GPU and ZLUDA, and I found that compiling PyTorch did not use zluda/target/release. So how does ZLUDA work? These expor…
-
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1639180594101/work/torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:957, invalid usage, NCCL version 21.0.3
-
In our SLURM cluster, we use dual-attached servers connected by L3 BGP unnumbered (with FRR "BGP to the host") via the lan0 and lan1 interfaces (ECMP; see the routing table, which is really simple, as everythi…