Closed YangJae96 closed 2 years ago
The GPU count, batch size, and learning rate always affect results, because we use DataParallel mode.
I trained with 4 GPUs, so the batch size and learning rate reported in our paper are tuned for 4 GPUs.
If you have enough GPUs, use 4. Otherwise, you may need to search for the optimal batch size and learning rate yourself to reproduce the best results. Hope this helps.
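A common starting point for that search is the linear scaling rule: keep the per-GPU batch size fixed, so the effective batch size shrinks with the GPU count, and scale the learning rate by the same factor. This is a minimal sketch of that heuristic; the helper name and the numeric values are illustrative assumptions, not the paper's actual settings.

```python
def scale_hyperparams(base_lr, per_gpu_batch, base_gpus, target_gpus):
    """Illustrative linear-scaling heuristic (not the paper's recipe).

    Keeps the per-GPU batch size fixed, so the effective (total) batch
    size changes with the GPU count, and scales the learning rate by
    the same factor.  Returns (scaled_lr, effective_batch_size).
    """
    factor = target_gpus / base_gpus
    return base_lr * factor, per_gpu_batch * target_gpus

# Example: hyper-parameters tuned on 4 GPUs, moving to 1 GPU.
# The base values here are made-up placeholders.
lr, batch = scale_hyperparams(base_lr=1e-3, per_gpu_batch=32,
                              base_gpus=4, target_gpus=1)
print(lr, batch)
```

With 1 GPU instead of 4, the effective batch is 4x smaller, so the rule suggests a 4x smaller learning rate as a starting point for tuning; it is a heuristic, not a guarantee, so a small grid search around that value is still advisable.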
Why does the performance degrade when I use only 1 GPU with all the same hyper-parameters?
How should I change the parameters when using 1 GPU?