Hi,
The command below, given in the instructions for training with Horovod, is for a single machine with multiple GPUs:
mpiexec -np <num_gpus> python run.py --config_file=... --mode=train_eval --use_horovod=True --enable_logs
I am wondering how to run distributed training with Horovod across multiple machines, each with several GPU cards.
Thanks