dotchen / WorldOnRails

(ICCV 2021, Oral) RL and distillation in CARLA using a factorized world model
https://dotchen.github.io/world_on_rails/
MIT License

Stage 2 training #26

Closed: varunjammula closed this issue 3 years ago

varunjammula commented 3 years ago

Hi, I was wondering how much time it took to train the main model with 1M frames. I am currently using 4 GPUs with a batch size of 128, and after 20 hours the first epoch is only 35% complete.

Would increasing the batch size speed up training, or would it affect the training results as well?
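
For context, a quick back-of-the-envelope check on the numbers above (1M frames, batch size 128, 20 h for 35% of an epoch). The arithmetic below is my own sketch, not something from the repo:

```python
# Rough throughput estimate from the numbers quoted in the question.
frames = 1_000_000          # dataset size mentioned above
batch_size = 128
steps_per_epoch = frames / batch_size          # ~7,813 optimizer steps

hours_elapsed, fraction_done = 20, 0.35
epoch_hours = hours_elapsed / fraction_done    # ~57 h per full epoch
sec_per_step = epoch_hours * 3600 / steps_per_epoch
print(f"~{steps_per_epoch:.0f} steps/epoch, ~{sec_per_step:.0f} s/step")
# ~26 s per step is far slower than GPU compute alone should take for a
# batch of 128, which already hints at a data-loading (I/O) bottleneck.
```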

dotchen commented 3 years ago

It took roughly 4 days to train the leaderboard model on 4 Titans (Pascal) with the default batch size.

EDIT: Double-checked, it was 2 Titans; I got it confused with the project I am currently working on.

Regarding training speed: if it is slower than expected, check whether disk I/O is the bottleneck. I recommend storing the data on SSDs, as I found that to be the fastest with lmdb.
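
One quick way to check is to time raw lmdb reads directly. Here is a minimal sketch, assuming a placeholder dataset path and an arbitrary 10,000-record sample (not a script from this repo):

```python
import time
import lmdb

# Hypothetical path -- point this at one of your lmdb dataset directories.
DATA_PATH = "path/to/your/lmdb"

# readahead=False avoids the OS page cache masking slow disks.
env = lmdb.open(DATA_PATH, readonly=True, lock=False, readahead=False)
with env.begin(buffers=True) as txn:
    t0, n_bytes, n_reads = time.time(), 0, 0
    for key, value in txn.cursor():
        n_bytes += len(value)
        n_reads += 1
        if n_reads >= 10_000:      # sample a fixed number of records
            break
elapsed = time.time() - t0
print(f"{n_reads} reads, {n_bytes / elapsed / 1e6:.1f} MB/s")
# Throughput well below your SSD's rated speed (hundreds of MB/s)
# suggests the disk, not the GPUs, is pacing training.
```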

Here is a training log of act_loss for reference:

[W&B chart: act_loss training curve (screenshot dated 7/27/2021, 1:01 PM)]