-
```
Hi!
It would be nice to have user-defined variables per host in pdsh.
This could be a first implementation of "execute a different command per host"
(I see a wishlist bug here that talks about it):
Like …
-
Thank you for taking the time to submit an issue!
## Background information
### What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
```OMPI Version: v…
-
When I try to train with Baichuan, I pass some args and it returns errors like the ones below.
[real_accelerator.py:191:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/sunmy/a…
-
# Batch_input and elapsed time per iteration slow down during model training
![微信图片编辑_20240629150957](https://github.com/EleutherAI/gpt-neox/assets/140717408/dae875c7-c01f-47e0-8767-aa8fe53cd476)
…
-
- GRIMs mpi task 4, NEMO mpi task 1~3 => error occurs, waits indefinitely
- GRIMs mpi task 4, NEMO mpi task 4~? => runs normally
- A deadlock occurs at MPI_WAITALL inside ~/oasis3-mct/lib/mct/mct/m_Transfer.F90
```
Subroutine waitrecv…
-
nccl-tests hangs at specific message sizes when I test ReduceScatter (reduce_scatter_perf). For example, in the screenshot below, it hangs at the 4G message size. Sometimes it hangs at 1…
-
Thank you for taking the time to submit an issue!
## Background information
### What version of Open MPI are you using? (e.g., v3.0.5, v4.0.2, git branch name and hash, etc.)
v4.0.5
### De…
-
### System Info
```Shell
pip install accelerate.
```
### Information
- [ ] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] One of the scripts in the exam…
-
Hi again: been quite a while, but, no matter:
just tried to run an OpenMPI 5.0.3 mpirun for some noddy testing, and was told:
```
Sorry! You were supposed to get help about:
hostfile:extr…