DFLyan closed 3 years ago
Horovod distributed training should work. How are you running it?
Thank you for your response. I have solved the problem. I had not used Horovod before, so I did not know that "python train.py" has to be prefixed with "horovodrun -np 4 -H localhost:4".
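For anyone hitting the same problem, here is a minimal sketch of how that launch command is put together for a single machine. The helper name `build_horovodrun_command` is hypothetical and not part of this repository or of Horovod; it only assembles the command quoted above, assuming one process per GPU and a script called `train.py`.

```python
import shlex


def build_horovodrun_command(num_procs, script="train.py", host="localhost"):
    """Assemble a single-host horovodrun command (hypothetical helper).

    Assumes one worker process per GPU, so the process count (-np) and
    the slot count for the host (-H host:slots) are the same number.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    return [
        "horovodrun",
        "-np", str(num_procs),          # total number of worker processes
        "-H", f"{host}:{num_procs}",    # host name and slots on that host
        "python", script,               # the unchanged training script
    ]


if __name__ == "__main__":
    # Print the command for 4 GPUs on the local machine.
    print(shlex.join(build_horovodrun_command(4)))
    # prints: horovodrun -np 4 -H localhost:4 python train.py
```

The printed line can be pasted into a shell as is. Running "python train.py" directly starts only one process, which is likely why only one GPU was used.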
I want to train the network on several GPUs. I have seen the horovod module in the code, but it does not work. Do I need to set another parameter to achieve this, or do I need to write extra distributed training code?