-
# Machine
NVIDIA-SMI 535.161.08 Driver Version: 535.161.08 CUDA Version: 12.2
# Software
torch 2.1.1
transformer-engine 1.9.0.dev0+56e0b35
# Run Cmd:
deepspeed --hostfile hostfile --maste…
-
```
Greetings.
Are there any upgrade instructions?
I tried it myself, but it didn't work.
I have:
# uname -a
Linux voip.site.ru 2.6.32-042stab094.7 #1 SMP Wed Oct 22 12:43:21 MSK 2014 i686 i686 i386 GNU/Linux
…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
When training with a YAML config file, how do I set the DeepSpeed hostfile?
### Reproduction
When training with a YAML config file, how do I set the DeepSpeed hostfile?
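For reference, DeepSpeed's launcher (e.g. `deepspeed --hostfile hostfile ...`) reads a plain-text hostfile with one `<hostname> slots=<num_gpus>` entry per line. The minimal Python sketch below only illustrates that format; `parse_hostfile` and the hostnames `worker-1`/`worker-2` are hypothetical, and this is not DeepSpeed's own parser. How a YAML training config hands the hostfile to the launcher depends on the training framework and is not shown here.

```python
# Minimal sketch, assuming the DeepSpeed hostfile format of one
# "<hostname> slots=<num_gpus>" entry per line. parse_hostfile is a
# hypothetical helper for illustration, not part of DeepSpeed's API.
import re

_LINE = re.compile(r"^(\S+)\s+slots=(\d+)$")


def parse_hostfile(text: str) -> dict[str, int]:
    """Return an insertion-ordered mapping of hostname -> slot (GPU) count."""
    hosts: dict[str, int] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        match = _LINE.match(line)
        if match is None:
            raise ValueError(f"malformed hostfile line: {raw!r}")
        host, slots = match.group(1), int(match.group(2))
        if host in hosts:
            raise ValueError(f"duplicate host: {host}")
        hosts[host] = slots
    return hosts


example = """\
# hypothetical two-node cluster with 8 GPUs each
worker-1 slots=8
worker-2 slots=8
"""
print(parse_hostfile(example))  # {'worker-1': 8, 'worker-2': 8}
print(sum(parse_hostfile(example).values()))  # 16 processes in total
```

Each `slots` value is the number of processes (typically one per GPU) the launcher may start on that host.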
### Expected …
-
I tried a very simple setup:
``` ruby
# Cloudera Manager node
node /^cm\d+.vagrant.dev$/ {
  class { 'cloudera':
    cm_server_host   => $::hostname,
    install_cmserver => true,
  }
}
```
And the …
-
I added the repo from [here](https://copr.fedorainfracloud.org/coprs/buglloc/Brick/repo/epel-7/buglloc-Brick-epel-7.repo) and installed Brick.
I got this error while launching it:
`/usr/bin/brick: symbol loo…
-
I compiled with MPI=1 and checked that both nodes have the same OpenMPI version.
I compiled OpenMPI with UCX.
When I run the following:
```
rene@puente:~/nccl-tests$ mpirun -x UCX_NET_DEVICES=mlx5_0:1,mlx5_…
-
Hi, it's me again. Training works well, but when it comes to saving the checkpoint, I get this error. Any ideas?
```
[rank0]: File "/workspace/train.py", line 230, in
[rank0]: train…
-
When I run `mpirun --hostfile hosts -np 2 hostname`, both processes are executed on the same host and none on the other. I did verify that the hostfile is being detected (mpirun crashed when I delet…
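One possible explanation for this symptom is the placement policy: if the first host in the hostfile offers at least as many slots as requested ranks, a slot-filling placement puts every rank there before touching the next host. The Python sketch below is a simplified model of that idea (our own illustration, not Open MPI's implementation; the hostnames `nodeA`/`nodeB` and slot counts are hypothetical), contrasting slot-filling placement with round-robin placement of the kind Open MPI's `--map-by node` option requests.

```python
# Simplified model (an assumption, not Open MPI's actual code) of two
# ways to place mpirun ranks on the hosts listed in a hostfile.


def map_by_slot(hosts: list[tuple[str, int]], np: int) -> list[str]:
    """Fill every slot on the first host before moving to the next one."""
    placement: list[str] = []
    for host, slots in hosts:
        for _ in range(slots):
            if len(placement) == np:
                return placement
            placement.append(host)
    if len(placement) < np:
        raise ValueError("not enough slots for the requested ranks")
    return placement


def map_by_node(hosts: list[tuple[str, int]], np: int) -> list[str]:
    """Round-robin ranks across hosts, skipping hosts whose slots are used up."""
    remaining = {host: slots for host, slots in hosts}
    placement: list[str] = []
    while len(placement) < np:
        progressed = False
        for host, _ in hosts:
            if len(placement) == np:
                break
            if remaining[host] > 0:
                remaining[host] -= 1
                placement.append(host)
                progressed = True
        if not progressed:
            raise ValueError("not enough slots for the requested ranks")
    return placement


hosts = [("nodeA", 4), ("nodeB", 4)]
print(map_by_slot(hosts, 2))  # ['nodeA', 'nodeA']
print(map_by_node(hosts, 2))  # ['nodeA', 'nodeB']
```

In this model, spreading two ranks across two hosts needs either round-robin placement or a first host that offers only one slot.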
-
Moved from #9
Hi Sungtae,
Hope you are doing well.
I'm facing an error while installing zmq-manager on CentOS 6.7.
Below are the error reports:
src/res_zmq_manager.c: In function 'zmq_cmd_t…
-