I trained on about 1000 hours of my own speech data on 8x Tesla V100-16GB GPUs, using Horovod and mixed precision. With batch_size_per_gpu set to 32, each training step takes about 3.2 s, so one epoch takes about 3.5 hours. Is this expected? Can I reduce the training time?
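For reference, here is a quick back-of-the-envelope check of those numbers (the per-epoch utterance count and average utterance length are inferred from the figures in the post, not measured):

```python
# Sanity check of the reported throughput (values taken from the post above).
num_gpus = 8
batch_per_gpu = 32
step_time_s = 3.2
epoch_time_s = 3.5 * 3600          # 3.5 h per epoch
dataset_hours = 1000

global_batch = num_gpus * batch_per_gpu              # utterances per step
steps_per_epoch = epoch_time_s / step_time_s         # steps in one epoch
utterances_per_epoch = steps_per_epoch * global_batch

# If the dataset really is ~1000 h, this implies an average utterance length of:
avg_utt_s = dataset_hours * 3600 / utterances_per_epoch

print(global_batch, round(steps_per_epoch), round(avg_utt_s, 1))
# → 256 utterances/step, ~3938 steps/epoch, ~3.6 s per utterance on average
```

So the numbers are internally consistent: ~1M utterances of ~3.6 s each at a global batch of 256 gives roughly 3900 steps, and at 3.2 s/step that is about 3.5 hours per epoch.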