I used 4 Titans (Pascal) for roughly 4 days to train the leaderboard model with the default batch size.
EDIT: Double-checked, it's actually 2 Titans; I confused it with the project I am currently working on.
Regarding training speed: if it is slower than expected, check whether your disk I/O is the bottleneck. I recommend storing the data on an SSD, as I found that to be the fastest setup with LMDB.
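For a quick sanity check, here is a minimal sketch for measuring raw LMDB read throughput (`LMDB_PATH` is a placeholder for your dataset path). If the MB/s figure is far below your drive's rated sequential read speed, the data pipeline is likely I/O-bound:

```python
import time
import lmdb

# Placeholder: point this at your actual LMDB dataset.
LMDB_PATH = "/path/to/dataset.lmdb"

env = lmdb.open(LMDB_PATH, readonly=True, lock=False, readahead=False)
with env.begin() as txn:
    start = time.perf_counter()
    n_bytes = 0
    n_records = 0
    for key, value in txn.cursor():
        n_bytes += len(value)
        n_records += 1
        if n_records >= 10_000:  # sample a fixed number of records
            break
    elapsed = time.perf_counter() - start
env.close()

print(f"read {n_records} records ({n_bytes / 1e6:.1f} MB) "
      f"in {elapsed:.2f}s -> {n_bytes / 1e6 / elapsed:.1f} MB/s")
```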
Here is a training log of act_loss for reference:
[act_loss training curve attached in the original issue]
Hi, I was wondering how much time it took to train the main model with 1M frames. I am using 4 GPUs with a batch size of 128, and one epoch is only 35% complete after 20 hours.
Would increasing the batch size speed up training, or would it also affect the training results?
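A common rule of thumb when increasing the batch size is the linear scaling rule (Goyal et al., 2017): scale the learning rate proportionally to keep training dynamics roughly comparable. A minimal sketch, assuming a hypothetical base learning rate of 1e-4 (check this repo's config for the actual default):

```python
# Linear scaling heuristic: multiply the learning rate by the same
# factor as the batch size. Not repo-specific; values are illustrative.
base_batch_size = 128   # batch size from the question above
base_lr = 1e-4          # hypothetical default; use the repo's configured value

new_batch_size = 256
new_lr = base_lr * (new_batch_size / base_batch_size)
print(f"batch {new_batch_size} -> lr {new_lr:.2e}")  # prints: lr 2.00e-04
```

Note that very large batches can still hurt final accuracy even with scaled learning rates, so speed gains are not free.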