-
I tried to run the test_nccl.py code in [my own repo](https://github.com/MonicaGu/NCCLCommunication). This repo contains code that provides a Python API over NCCL for sending and receiving PyTorch tensors. I …
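For reference, the same send/receive pattern can be sketched with PyTorch's built-in `torch.distributed` API over the NCCL backend (a minimal sketch, not the repo's own API; the rank layout, port, and tensor shape are illustrative):

```python
import os

import torch
import torch.distributed as dist


def send_recv_demo(rank: int, world_size: int) -> None:
    """Rank 0 sends a CUDA tensor to rank 1 over the NCCL backend."""
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")  # illustrative rendezvous port
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)
    if rank == 0:
        t = torch.arange(4, dtype=torch.float32, device="cuda")
        dist.send(t, dst=1)   # blocking point-to-point send
    elif rank == 1:
        t = torch.empty(4, dtype=torch.float32, device="cuda")
        dist.recv(t, src=0)   # blocking point-to-point receive
    dist.destroy_process_group()


if __name__ == "__main__":
    # Requires at least 2 GPUs; spawn one process per rank.
    torch.multiprocessing.spawn(send_recv_demo, args=(2,), nprocs=2)
```

NCCL has supported point-to-point send/recv since PyTorch 1.8, so a thin wrapper like the repo's can also be expressed this way.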
-
The Nvidia HPCG benchmark used to be available only via containers, but it was open-sourced last week and is available at https://github.com/NVIDIA/nvidia-hpcg. It works on both CPU (x86-64 and aarc…
-
Hello,
I would like to ask a PyTorch question. I'm on Ubuntu 22.04 with an AMD GPU and ZLUDA, and I found that the PyTorch build did not use zluda/target/release. So how does ZLUDA work? These expor…
-
Hi folks,
I was trying to understand why NCCL doesn't negotiate ports, i.e., why it isn't NAT-transparent.
Say one instance runs inside Docker with a port translated to 54321, i.e. -p 54321:54321.
On another host…
-
Hi, we recently observed that when running with NCCL_ALGO=Tree,NCCL_PROTO=Simple, NCCL falls back to Ring,LL for broadcast. It seems that NCCL_PROTO is ignored when no valid ALGO/PROTO pair is found fo…
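For what it's worth, this is how the settings in question can be applied (a plain sketch; the values are the ones from the report, and NCCL_DEBUG=INFO is only there so the logs show which algorithm/protocol NCCL actually selects):

```python
import os

# NCCL reads these at communicator creation, so they must be set
# before the first init_process_group / ncclCommInitRank call.
os.environ["NCCL_ALGO"] = "Tree"      # requested algorithm
os.environ["NCCL_PROTO"] = "Simple"   # requested protocol
os.environ["NCCL_DEBUG"] = "INFO"     # log the algo/proto NCCL actually picks

print(os.environ["NCCL_ALGO"], os.environ["NCCL_PROTO"])
```

With NCCL_DEBUG=INFO, the per-collective log lines make the fallback to Ring,LL visible directly.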
-
![fig](https://github.com/user-attachments/assets/80398e7f-975b-4de1-9c9b-ff85633a5d77)
In code/overall/LLM_deepspeed.yaml, train_batch_size and eval_batch_size are both set to 1.
NCCL error on a single GPU, do…
-
When I use the DINO config to test with PyTorch 1.13 + mmcv 2.0.0, I get this error:
-
I'd like to know roughly how much GPU memory this fine-tuning needs. I used six 4090s but still ran out of memory; could you help me look into the problem? My run script looks like this:
data_path='./data_files'
model_name_or_path='/data2/hugo/lin_rany/model/Meta-Llama-3-8B-Instruct'
export NCCL_P2P_DISABLE=1
export NCCL_IB_…
-
**Describe the bug**
Running the Pythia-7B fine-tune script on 4 x A10 (24GB each).
Seems like an issue with the sequence length:
```
Token indices sequence length is longer than the specified maximum seque…
```
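The usual fix for that warning is to truncate or window the encoded ids down to the model's maximum length. A minimal, tokenizer-agnostic sketch (with a Hugging Face tokenizer you would normally pass `truncation=True` and `max_length=` instead; `chunk_token_ids` and the numbers here are illustrative):

```python
def chunk_token_ids(ids, max_length, stride=0):
    """Split an over-long token-id sequence into windows of at most
    `max_length` tokens, overlapping consecutive windows by `stride`."""
    if max_length <= stride:
        raise ValueError("max_length must exceed stride")
    step = max_length - stride
    return [ids[i : i + max_length] for i in range(0, len(ids), step)]


# A 10-token sequence against a 4-token limit:
print(chunk_token_ids(list(range(10)), max_length=4))
# → [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Feeding each chunk through the model separately avoids the "longer than the specified maximum" warning without silently dropping tokens.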
-
### Description
![image](https://github.com/user-attachments/assets/aec7915a-176e-4290-a002-d6e048bcff9a)
A, B, C, and D are different actors, and all data among these four actors is transferr…