The approximate training time for the video generation model

cure-lab / MagicDrive

[ICLR24] Official implementation of the paper “MagicDrive: Street View Generation with Diverse 3D Geometry Control”

https://gaoruiyuan.com/magicdrive/

GNU Affero General Public License v3.0


The approximate training time for the video generation model #52

Closed: SecretMG closed this issue 4 months ago.

SecretMG commented 4 months ago

Hello,

Thank you for your excellent work! I would like to ask about the approximate training time for the video generation model. I followed the instructions from this link and used the command scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.4.3 for training. However, it took me more than 30 hours to train for 5,000 steps. Is this normal? It was mentioned that about 80,000 steps are needed for training, which at this rate would take more than 480 hours (about 20 days).

Thank you very much for your help!

flymin commented 4 months ago

On our V100 servers, it takes about 13 hours for 5,000 steps. To speed things up, please consider using the map cache as described here, and use as many CPU workers as you can.
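To compare the two reported rates, you can linearly extrapolate from the measured time per 5,000 steps to the full 80,000-step schedule mentioned in the question. This is a minimal sketch that assumes a constant per-step cost (same hardware, batch size, and data pipeline) and ignores evaluation or checkpointing overhead. The function name `estimate_hours` is hypothetical and not part of MagicDrive.

```python
def estimate_hours(measured_hours: float, measured_steps: int, total_steps: int) -> float:
    """Linearly extrapolate wall-clock training time from a measured run.

    Assumes every step costs the same, so total time scales with step count.
    """
    return measured_hours / measured_steps * total_steps


TOTAL_STEPS = 80_000  # step count mentioned in the question

# (label, hours measured for 5,000 steps), taken from the thread
runs = [("reported in the question", 30.0), ("V100 reference", 13.0)]

for label, hours in runs:
    total = estimate_hours(hours, 5_000, TOTAL_STEPS)
    print(f"{label}: {total:.0f} h (~{total / 24:.1f} days)")
```

Under this assumption the V100 reference works out to roughly 208 hours (about 8.7 days), while the reported rate gives about 480 hours. Because the question says "more than 30 hours", the second figure is a lower bound; a gap of this size usually points to a data-loading bottleneck rather than GPU compute.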

SecretMG commented 4 months ago

Roughly how long did it take to train the baseline model shown in the picture? Also, was the baseline model trained on 61 frames (with sweeps and generated annotations) or on just 16 frames?

flymin commented 4 months ago

It takes about 10 days to train on 8 V100 GPUs. With more GPUs or more GPU memory, you can shorten training by increasing the batch size and adjusting the learning rate accordingly.
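One common way to adjust the learning rate when the total batch size changes is the linear scaling heuristic: multiply the learning rate by the ratio of new to old total batch size. If you keep the number of training samples seen fixed, you can also reduce the step count by the same ratio. The sketch below shows this under that assumption. The thread does not specify which rule the authors use, and all numbers (base learning rate, per-GPU batch size, GPU counts) are hypothetical placeholders, not MagicDrive's configuration values.

```python
def total_batch(num_gpus: int, per_gpu_batch: int, grad_accum: int = 1) -> int:
    """Effective batch size across all data-parallel workers."""
    return num_gpus * per_gpu_batch * grad_accum


def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling heuristic: LR grows proportionally with total batch size."""
    return base_lr * new_batch / base_batch


def steps_for_same_samples(base_steps: int, base_batch: int, new_batch: int) -> int:
    """Steps needed to process the same number of samples with a new batch size."""
    return base_steps * base_batch // new_batch


# Hypothetical values for illustration only.
BASE_LR = 8e-5
BASE_STEPS = 80_000
base = total_batch(num_gpus=8, per_gpu_batch=1)
new = total_batch(num_gpus=16, per_gpu_batch=1)

print("new total batch:", new)
print("scaled LR:", scale_lr(BASE_LR, base, new))
print("steps for same samples:", steps_for_same_samples(BASE_STEPS, base, new))
```

Linear scaling is a starting point rather than a guarantee. Large jumps in batch size often need a learning-rate warmup, and the final quality should be checked against the original schedule.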

The baseline model is trained only for 16-frame generation. We trained a separate model for 61-frame generation using the new configuration.