autonomousvision / carla_garage

[ICCV'23] Hidden Biases of End-to-End Driving Models

Training Time #16

Closed: wheltz closed this issue 9 months ago

wheltz commented 10 months ago

Thank you for your team's work. An issue in the transfuser repository mentioned that the model only needs to be trained for 1 day on 8x 2080Ti, so I assumed that training TransFuser++, with its small architectural changes, would take about the same time. However, when I use one A100 (40G) to train the NC model, it seems to take 6 days. I would like to know the specific training time of TransFuser++; it does not seem to be mentioned in the paper.

Kait0 commented 10 months ago

As discussed in Section 3.4 of the paper, TransFuser++ uses a 6x larger training budget than TransFuser (2x due to the two-stage training and 3x due to using 3x more data). An additional compute increase that might not be quite as apparent is that we use the larger camera image from the TCP paper. This increases the forward and backward pass time in the CNN for both the reproduced TransFuser and TransFuser++ (compared to the old repository).
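As a rough back-of-envelope sketch (the 2x and 3x multipliers are the ones stated above; the code itself is just for illustration):

```python
# Rough estimate of the TF++ training budget relative to TransFuser.
two_stage_factor = 2  # perception pre-training stage + end-to-end stage
data_factor = 3       # 3x more training data

budget_multiplier = two_stage_factor * data_factor
print(f"TF++ uses ~{budget_multiplier}x the TransFuser training budget")
# Note: this ignores the extra cost of the larger TCP-style camera image,
# which further increases CNN forward/backward time.
```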

I don't remember exactly how long it took to train TF++, but it was something on the order of 3 days on 4x A100 (40G), so your training time on 1 A100 seems reasonable, assuming you are training in the scaled setting.
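As a quick sanity check in GPU-days (assuming roughly linear scaling with the number of GPUs, which is only an approximation):

```python
# Convert both runs to A100-days to make them comparable.
tfpp_days, tfpp_gpus = 3, 4  # TF++: ~3 days on 4x A100 (40G)
nc_days, nc_gpus = 6, 1      # your NC run: 6 days on 1x A100 (40G)

print(f"TF++:   ~{tfpp_days * tfpp_gpus} A100-days")  # ~12
print(f"NC run: ~{nc_days * nc_gpus} A100-days")      # ~6
```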

wheltz commented 9 months ago

That makes sense, thank you again.