Alpha-VL / ConvMAE

ConvMAE: Masked Convolution Meets Masked Autoencoders

Time required to train one epoch. #5

Closed. charlesCXK closed this issue 2 years ago.

charlesCXK commented 2 years ago

Dear authors, thank you for sharing this excellent work! May I ask how the time overhead of ConvMAE pre-training compares to that of MAE? Could you provide the time required to train one epoch for the two methods on the same type of GPU?

gaopengpjlab commented 2 years ago

Thanks for your suggestion. I will answer your question in a few days.

gaopengpjlab commented 2 years ago

Hardware setup: 8 × A6000 GPUs, 128 images per GPU.

| Method | Time per iteration | GPU memory (nvidia-smi) |
| --- | --- | --- |
| MAE | 0.4084 s | 17926 MB |
| ConvMAE | 0.8306 s | 27049 MB |
| ConvMAE, skipping masked-region computation in stages 1 and 2 | 0.5480 s | 21250 MB |
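For reference, numbers like these can be collected with a simple measurement loop. The sketch below is hypothetical (not the script used for the figures above): it assumes an MAE-style forward that returns the loss, and it reports PyTorch's own peak-allocation counter, which is typically lower than the reserved total shown by nvidia-smi.

```python
# Hypothetical timing loop (not the script behind the numbers above).
import time
import torch

def benchmark(model, loader, optimizer, warmup=10, iters=50, device="cuda"):
    model.train()
    times = []
    for i, (images, _) in enumerate(loader):
        if i >= warmup + iters:
            break
        images = images.to(device, non_blocking=True)
        torch.cuda.synchronize()
        start = time.perf_counter()

        # MAE-style forward that returns (loss, pred, mask); adapt to your model's API.
        loss, _, _ = model(images, mask_ratio=0.75)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        torch.cuda.synchronize()
        if i >= warmup:  # discard warm-up iterations
            times.append(time.perf_counter() - start)

    print(f"{sum(times) / len(times):.4f} s/iter")
    # PyTorch's peak-allocation counter; nvidia-smi reports the (larger) reserved total.
    print(f"{torch.cuda.max_memory_allocated() / 2**20:.0f} MB peak allocated")
```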

Thank you so much for reminding us about the training speed comparison. We will include a speed/GPU memory/FLOPs comparison in an updated version.
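For readers wondering what "skipping masked region computation" amounts to, the sketch below illustrates the general principle of computing only on visible tokens (MAE-style random masking and gathering). It is a simplified illustration, not the ConvMAE implementation, whose early stages are convolutional and use block-wise masking.

```python
# Illustrates the general principle behind dropping masked regions: gather only the
# visible tokens and run the heavy stages on those, instead of zeroing masked patches
# and processing the full feature map. Simplified MAE-style sketch, not ConvMAE's code.
import torch

def random_masking(x, mask_ratio=0.75):
    """x: (B, N, C) patch tokens -> visible tokens, binary mask, restore indices."""
    B, N, C = x.shape
    len_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N, device=x.device)      # random score per token
    ids_shuffle = torch.argsort(noise, dim=1)      # ascending: smallest scores are kept
    ids_restore = torch.argsort(ids_shuffle, dim=1)
    ids_keep = ids_shuffle[:, :len_keep]
    x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, C))
    mask = torch.ones(B, N, device=x.device)       # 1 = masked, 0 = kept
    mask[:, :len_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)
    return x_visible, mask, ids_restore

# x_visible has only N * (1 - mask_ratio) tokens, so encoder compute and memory drop
# roughly in proportion to the mask ratio; masked positions are re-inserted (e.g. as a
# shared mask token, using ids_restore) only before the lightweight decoder.
```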

charlesCXK commented 2 years ago

Thanks for your detailed reply!

gaopengpjlab commented 2 years ago

By default, ConvMAE refers to ConvMAE with the Multi-scale Decoder proposed in our paper.
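For context, a multi-scale decoder fuses features from the earlier, higher-resolution stages with the final-stage tokens before reconstruction. The sketch below shows one plausible fusion step; the channel widths, strided-convolution downsampling, and summation are illustrative assumptions, not the paper's exact configuration.

```python
# One plausible fusion step: bring the high-resolution stage-1/stage-2 feature maps down
# to the stage-3 resolution and merge them before the decoder. Channel widths, strided
# convolutions and summation are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn

class MultiScaleFusion(nn.Module):
    def __init__(self, c1=256, c2=384, c3=768):
        super().__init__()
        self.down1 = nn.Conv2d(c1, c3, kernel_size=4, stride=4)  # H/4 -> H/16
        self.down2 = nn.Conv2d(c2, c3, kernel_size=2, stride=2)  # H/8 -> H/16

    def forward(self, f1, f2, f3):
        # f1: (B, c1, H/4, W/4), f2: (B, c2, H/8, W/8), f3: (B, c3, H/16, W/16)
        fused = f3 + self.down1(f1) + self.down2(f2)
        return fused.flatten(2).transpose(1, 2)  # (B, H/16 * W/16, c3) tokens for the decoder
```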

gaopengpjlab commented 2 years ago

In a few days we are going to release Fast ConvMAE, which significantly accelerates ConvMAE pretraining: https://github.com/Alpha-VL/FastConvMAE

gaopengpjlab commented 2 years ago

Fast ConvMAE has been released; it halves the pretraining time.