Alpha-VL / ConvMAE

ConvMAE: Masked Convolution Meets Masked Autoencoders
MIT License

Total memory consumption for training with batch size 32. #23

Open IamYourAlpha opened 1 year ago

IamYourAlpha commented 1 year ago

I have tried training the ConvMAE detector (as provided in this repository) on 2 GPUs, each with 32 GB of memory (V100). It looks like I can only train with batch size 2; going beyond batch size 2 raises a CUDA out-of-memory error. Also, with such a small batch size, training does not seem to produce a well-trained model. Could you tell me the recommended memory size for training the model with batch size 32?
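(Aside: when a GPU can only fit a small micro-batch, gradient accumulation is a common generic workaround to approximate the optimization behaviour of a larger batch. The sketch below is not from this repository; it is a minimal PyTorch illustration, and it does not reproduce batch-dependent statistics such as BatchNorm.)

```python
import torch
import torch.nn as nn

def train_step_with_accumulation(model, optimizer, loss_fn, data, targets,
                                 micro_batch=2, accum_steps=16):
    """One optimizer step over an effective batch of micro_batch * accum_steps
    samples, accumulated as several small forward/backward passes."""
    model.train()
    optimizer.zero_grad()
    for step in range(accum_steps):
        lo = step * micro_batch
        x = data[lo:lo + micro_batch]
        y = targets[lo:lo + micro_batch]
        # Scale the loss so accumulated gradients match the full-batch average.
        loss = loss_fn(model(x), y) / accum_steps
        loss.backward()  # gradients sum into .grad across micro-batches
    optimizer.step()
```

With `micro_batch=2` and `accum_steps=16`, the gradients applied at `optimizer.step()` match those of a single batch of 32 (up to floating-point rounding), at the memory cost of a batch of 2.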

Thank you so much.

gaopengpjlab commented 1 year ago

The ConvMAE base detector with batch size 32 is trained on 8 A100 (80 GB) GPUs. GPU memory usage is approximately 40-50 GB per GPU. We recommend using the updated ConvMAE Det at the following GitHub link: https://github.com/OpenGVLab/Official-ConvMAE-Det

IamYourAlpha commented 1 year ago

Thanks for the GitHub link.

IamYourAlpha commented 1 year ago

Would it be possible to share the compiled version of the custom detectron2 (since changes were made in this repository)?

gaopengpjlab commented 1 year ago

Our ConvMAE implementation does not modify the detectron2 library, as shown here: https://github.com/OpenGVLab/Official-ConvMAE-Det/blob/main/projects/ConvMAEDet/modeling/convmae.py The official detectron2 library should support ConvMAE.

IamYourAlpha commented 1 year ago

I see. It seems the source version of detectron2 is well ahead of the pre-compiled version available.

gaopengpjlab commented 1 year ago

Please first set up an environment that supports the official ViTDet.
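(For reference: ViTDet lives in the detectron2 repository, so one way to get a compatible environment is to install detectron2 from source, per its own installation docs. A minimal sketch, assuming a working PyTorch install:)

```shell
# Install detectron2 from source (ViTDet is under projects/ViTDet in this repo).
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
```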