Long Training time for one epoch

duanyiqun / DiffusionDepth

PyTorch Implementation of introducing diffusion approach to 3D depth perception

https://arxiv.org/abs/2303.05021

Apache License 2.0


Long Training time for one epoch #24

Closed YBZh closed 1 year ago

YBZh commented 1 year ago

Thanks for your excellent work. Training the method on the KITTI dataset for one epoch with the provided script takes about 8 hours on 4 A6000 GPUs. Do you think this time cost is reasonable?

duanyiqun commented 1 year ago

Thank you very much for the question.
Although the first epoch may be slower than the following epochs, training a diffusion-based model generally takes longer than traditional methods, especially with the mpvit and swin backbones.

Our previous experiments were mostly conducted on 8x3090 and 8xA100. Normally the first epoch takes about 3h30min on 8x3090, and later epochs are slightly faster, down to about 2h50min per epoch. If you switch to the res50 baseline, training will be 2-3 times faster.
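For a rough comparison, the figures quoted in this thread can be converted to GPU-hours per epoch. This is only a back-of-the-envelope sketch: it assumes perfect scaling across GPUs and treats an A6000 and a 3090 as interchangeable, neither of which holds exactly. The helper name `gpu_hours` is made up for illustration.

```python
def gpu_hours(num_gpus: int, hours_per_epoch: float) -> float:
    """GPU-hours consumed by one epoch, assuming all GPUs are busy throughout."""
    return num_gpus * hours_per_epoch


# Figures quoted in this thread.
reported = gpu_hours(4, 8.0)              # 4x A6000, about 8 h per epoch
first_epoch = gpu_hours(8, 3.5)           # 8x 3090, 3h30min (first epoch)
later_epoch = gpu_hours(8, 2 + 50 / 60)   # 8x 3090, 2h50min (later epochs)

print(f"reported:    {reported:.1f} GPU-hours")
print(f"first epoch: {first_epoch:.1f} GPU-hours")
print(f"later epoch: {later_epoch:.1f} GPU-hours")
```

Under these assumptions the reported time is somewhat above the reference first-epoch figure, so it is worth checking for a bottleneck outside the model itself, such as data loading.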

YBZh commented 1 year ago

Thanks for your quick reply. I found that the depth completion function https://github.com/duanyiqun/DiffusionDepth/blob/089bc5f7824ae962a261bfa0c0da3b5ab9d71184/src/data/kittidc.py#L269 makes the training slow. Since it is not used in the method, I commented it out and now get normal training speed.
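One generic way to locate this kind of bottleneck is to time each per-sample preprocessing step in isolation. The sketch below is not taken from the repository: `cheap_step` and `expensive_step` are hypothetical stand-ins for steps inside a dataset's `__getitem__`, and the `time.sleep` call only simulates slow CPU-side work.

```python
import time


def time_call(fn, *args, repeats=5):
    """Return the mean wall-clock seconds of fn(*args) over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats


def cheap_step(values):
    # Hypothetical lightweight transform (e.g. scaling).
    return [v * 2 for v in values]


def expensive_step(values):
    # Hypothetical stand-in for slow CPU-side preprocessing.
    time.sleep(0.01)
    return values


sample = list(range(1000))
timings = {fn.__name__: time_call(fn, sample) for fn in (cheap_step, expensive_step)}
slowest = max(timings, key=timings.get)

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.2f} ms per call")
print("slowest step:", slowest)
```

If one step dominates the per-sample time and its output is never consumed by the model, it is a candidate for removal.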

duanyiqun commented 1 year ago

Oh, that's a very good point. I didn't notice that this part was left enabled in the release version. It should have been disabled before release. Thank you very much, I will make an update based on your report.
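As an alternative to commenting the call out, an optional preprocessing step can be gated behind a flag that defaults to off. The following is a minimal sketch of that pattern under stated assumptions, not the repository's code: `ToyDepthDataset`, the `fill_depth` flag, and `_dense_fill` are all hypothetical names, and the forward-fill logic is only a placeholder for an expensive depth completion routine.

```python
class ToyDepthDataset:
    """Hypothetical dataset that runs a costly step only when asked to."""

    def __init__(self, samples, fill_depth=False):
        self.samples = samples
        self.fill_depth = fill_depth  # off by default, so training skips the cost
        self.fill_calls = 0           # counts how often the costly step ran

    def _dense_fill(self, depth):
        # Placeholder for an expensive completion routine:
        # replace missing (zero) values with the last valid value seen.
        self.fill_calls += 1
        filled, last = [], 0.0
        for d in depth:
            if d > 0:
                last = d
            filled.append(last)
        return filled

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        depth = self.samples[idx]
        item = {"sparse_depth": depth}
        if self.fill_depth:
            item["filled_depth"] = self._dense_fill(depth)
        return item


data = [[0.0, 2.0, 0.0, 5.0]]

fast = ToyDepthDataset(data)                   # costly step skipped
full = ToyDepthDataset(data, fill_depth=True)  # costly step enabled

fast_item = fast[0]
full_item = full[0]
print(sorted(fast_item), fast.fill_calls)
print(full_item["filled_depth"], full.fill_calls)
```

Keeping the step behind a flag leaves it available for visualization or ablation without paying its cost in every training iteration.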