Closed: SamedYalcin closed this issue 4 months ago.
We use A100 80G GPUs to train the Swin-L and ViT-L models. You can reduce the image size to 1333x800 or freeze the backbone during training. In addition, techniques such as FSDP and FP16 can help reduce training memory consumption. Please refer to the latest MMDetection v3 for more details.
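The backbone-freezing and FP16 suggestions above can be sketched as an MMDetection v3 config override. This is only a sketch: the base-config path is a placeholder, and while `frozen_stages` and `AmpOptimWrapper` follow standard mmdet/mmengine conventions, the exact keys for this repo's configs may differ.

```python
# Sketch of memory-saving overrides for an MMDetection v3 config.
# Assumes the base config defines `model.backbone` (Swin-L) and an
# `optim_wrapper`; key names follow mmengine/mmdet v3 conventions.

_base_ = ['./co_dino_5scale_swin_large_1x_coco.py']  # placeholder base path

model = dict(
    backbone=dict(
        frozen_stages=4,  # freeze all four Swin stages (no backbone gradients)
    ),
)

# FP16 mixed-precision training via mmengine's AMP optimizer wrapper.
optim_wrapper = dict(
    type='AmpOptimWrapper',
    loss_scale='dynamic',
)

# A smaller input resolution also cuts activation memory; a resize to
# (1333, 800) would be set on the Resize transform in the train pipeline.
```

Freezing the backbone removes both its gradients and its optimizer state from memory, which is usually the largest single saving for a Swin-L model.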
Reducing the image size helps for a few batches, but after a while it fails again. I will try freezing the backbone. As for your last suggestion, does this repo work with MMDetection v3?
Sure, please refer to https://github.com/open-mmlab/mmdetection/tree/main/projects/CO-DETR
I wasn't able to find it under the Model Zoo. Thanks for pointing it out, and thanks for the help.
The MMDetection repo seems to use `co_dino_5scale_swin_large_16e_o365tococo.pth` instead of `co_dino_5scale_swin_large_1x_coco.pth`. Is this a mistake? `co_dino_5scale_swin_large_16e_o365tococo.pth` seems to use Objects365 labels, whereas `co_dino_5scale_swin_large_1x_coco.pth` uses COCO labels. The config is for COCO.
Edited the comment.
This config is used to finetune the Objects365-pretrained Swin-L on the COCO dataset. If you want to train this model on your custom dataset, I recommend starting from `co_dino_5scale_swin_large_16e_o365tococo.pth` for better performance.
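Fine-tuning from the Objects365-pretrained checkpoint can be sketched as a config override. `load_from` is MMDetection's standard mechanism for initializing training from a checkpoint; the file paths and `data_root` below are placeholders for your own setup, not paths from this repo.

```python
# Sketch: fine-tune CO-DINO Swin-L from the Objects365-pretrained weights.
# `load_from` initializes model weights from a checkpoint without resuming
# its optimizer state; all paths here are placeholders.

_base_ = ['./co_dino_5scale_swin_large_16e_o365tococo.py']  # placeholder

load_from = 'checkpoints/co_dino_5scale_swin_large_16e_o365tococo.pth'

# Point the dataset settings at your custom data; the class names must
# match your annotations, not the Objects365 label set.
data_root = 'data/my_dataset/'  # placeholder
```

Note that `load_from` only loads weights; the classification head will be reinitialized to your dataset's number of classes if it differs from the checkpoint's.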
For newcomers:
Thanks for the help @TempleX98. Feel free to close the issue at your convenience.
Hi,
I'm trying to train your model on Kaggle with a P100 with 16GB VRAM, but I'm running out of memory. Can you share the memory requirements and, if possible, tips to reduce the memory required?
Attached below is the model I'm trying to train. Instead of `train.sh`, I'm using `train.py`.