OpenDriveLab / ST-P3

[ECCV 2022] ST-P3, an end-to-end vision-based autonomous driving framework via spatial-temporal feature learning.
Apache License 2.0

questions about training time #1

Closed. EcustBoy closed this issue 1 year ago.

EcustBoy commented 1 year ago

Hi author~ I would like to ask how long training takes for you. I used the same GPU configuration (4 × V100) and the same training parameters as in your paper, except for the batch size, which I set to 1 because of GPU limitations. I ran the original code and trained the whole end-to-end model directly, without the pretrained perception model.

I found that one epoch takes about 25 h. In addition, as the epochs go on, the time per iteration also increases. The following picture shows a recording from the training process; it doesn't look normal. Could this be because the pretrained perception model is not loaded, or because batch_size = 1 is too small? [image]

Can you offer some suggestions for improvement? Thanks~

ilnehc commented 1 year ago

@EcustBoy That does seem abnormal. We can train the model in around 3 days at 3-4 s/it, even on a cluster where IO limitations may hamper training speed. Could you try other machines/GPUs to see if the problem still exists? You could also profile the code to see which part takes so long.
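
For the profiling suggestion, a minimal sketch of a first pass might look like the following. It is not part of the ST-P3 code: `profile_loop`, `summarize`, `batches`, and `step_fn` are hypothetical names, where `batches` stands in for the training data loader and `step_fn` for one forward/backward/optimizer step. It only uses the Python standard library and separates the time spent waiting for data from the time spent in the step, then compares early and late iterations to check whether the per-iteration time really grows.

```python
import time
from statistics import mean


def profile_loop(batches, step_fn, max_iters=50):
    """Time up to max_iters iterations, split into data wait and step time.

    batches: any iterable yielding batches (hypothetical stand-in for the
             training data loader).
    step_fn: callable that runs one training step on a batch (hypothetical).
    Returns a list of (data_seconds, step_seconds) tuples.
    """
    records = []
    it = iter(batches)
    for _ in range(max_iters):
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent waiting on data loading / IO
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)  # forward, backward, optimizer step
        # Assumption: on a GPU, synchronize here (e.g. torch.cuda.synchronize())
        # before reading the clock, because CUDA kernels run asynchronously.
        t2 = time.perf_counter()
        records.append((t1 - t0, t2 - t1))
    return records


def summarize(records, window=10):
    """Average the timings and compare the first and last `window` iterations."""
    if not records:
        raise ValueError("no iterations were recorded")
    data = [d for d, _ in records]
    step = [s for _, s in records]
    total = [d + s for d, s in records]
    w = min(window, len(total))
    return {
        "iters": len(records),
        "mean_data_s": mean(data),
        "mean_step_s": mean(step),
        "first_window_s": mean(total[:w]),
        "last_window_s": mean(total[-w:]),
    }
```

If `mean_data_s` dominates, the bottleneck is likely data loading or IO rather than the model. If `last_window_s` is clearly larger than `first_window_s`, something is probably accumulating across iterations, and a finer-grained profiler would be the next step.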