ashkamath / mdetr


Cuda error: out of memory when finetuning on the refcoco dataset #65

Closed — xuliwalker closed this issue 2 years ago

xuliwalker commented 2 years ago

Hi

Thanks for your great work! When I finetune the MDETR model on the RefCOCO dataset using the pretrained EfficientNet-B3 backbone, I occasionally hit an out-of-memory error during training; it can happen in the 1st or 2nd epoch. Is there any way to reduce the memory cost of finetuning so that the training process is stable? I am using 32 GB V100 GPUs and have already set the related environment variables. Thanks!

ashkamath commented 2 years ago

Hi, when finetuning on the RefCOCO dataset, we used a batch size of 4 per GPU on 2 nodes of RTX 8000 cards, which have 48 GB of memory each. If you're running on V100 GPUs, you can instead use a batch size of 2 and scale up the number of GPUs to keep the same effective batch size. Hope this helps!
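For reference, here is a minimal sketch of the batch-size/GPU trade-off described above. The 8-GPUs-per-node figure is an assumption (the node layout isn't stated in this thread); the point is only that the global batch size is the per-GPU batch size times the total GPU count, so halving the per-GPU batch size means doubling the number of GPUs to match.

```python
# Minimal sketch of the effective (global) batch size arithmetic.
# Assumption: 8 GPUs per node; adjust to your actual cluster layout.

def effective_batch_size(per_gpu_batch: int, gpus_per_node: int, nodes: int) -> int:
    """Global batch size under standard data-parallel training."""
    return per_gpu_batch * gpus_per_node * nodes

# Reported RTX 8000 (48 GB) setup: batch size 4 per GPU on 2 nodes.
rtx8000 = effective_batch_size(per_gpu_batch=4, gpus_per_node=8, nodes=2)  # 64

# Suggested V100 (32 GB) setup: batch size 2 per GPU, twice as many GPUs.
v100 = effective_batch_size(per_gpu_batch=2, gpus_per_node=8, nodes=4)     # 64

assert rtx8000 == v100, "both setups yield the same effective batch size"
```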

Best, Aishwarya