From the log it seems that you are using 1 GPU. I think the problem may result from unstable gradients. In our experiments, we use 8 GPUs with a batch size of 1 per GPU, so the effective batch size could be too small if only one GPU is used.
Maybe you can try increasing the batch size to 2 and reducing the sequence length from 15 to 8 to see if it is better.
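For reference, here is a minimal sketch of that change in an mmediting-style Python config. The key names (`samples_per_gpu`, `num_input_frames`) are assumptions based on similar configs; check the exact fields in the config file you are actually training with:

```python
# Hypothetical excerpt of an mmediting-style training config.
# Key names are assumptions; adapt them to your actual config file.
data = dict(
    # Raise the per-GPU batch size from 1 to 2 to stabilize gradients
    # when training on a single GPU.
    train_dataloader=dict(samples_per_gpu=2),
    train=dict(
        dataset=dict(
            # Shorten each training sequence from 15 to 8 frames so the
            # larger batch still fits in GPU memory.
            num_input_frames=8)))
```

Note that even with this change the effective batch size on one GPU is 2, still smaller than the 8 GPUs × 1 = 8 used in our experiments, so some gap may remain.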
Hi @ckkelvinchan. You're right. I am using 1 GPU for my training.
Let me try out your suggestion and see if it helps. Thank you!
Does the larger batch size help?
I have followed the instructions listed in the README and completed the training, but I ran into a few issues along the way.
No changes have been made to the source code. The code commit ID used is fa3d3284664b05341867f51149c12e10a002fc0f from Jan 17, 2022.
Could you please let me know how this can be resolved? Also, please let me know if you need any more information from my side to debug this.