JialeCao001 / SipMask

SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation (ECCV2020)
https://arxiv.org/pdf/2007.14772.pdf
MIT License
334 stars · 54 forks

Sipmask:mmdetection CUDA out of memory error #32

Closed YYlvsy closed 3 years ago

YYlvsy commented 3 years ago

Hi, when training with the original code, I got a RuntimeError as below:

pred_masks = torch.stack([pos_masks00, pos_masks01, pos_masks10, pos_masks11], dim=0)
RuntimeError: CUDA out of memory. Tried to allocate 464.00 MiB (GPU 1; 10.76 GiB total capacity; 7.33 GiB already allocated; 97.19 MiB free; 1.19 GiB cached)

No one else was using the GPU when I trained. At first I thought it might be a batch-size issue, but even after changing the batch size from 16 to 1, the same error still occurred. The only difference was that with batch size 1 the training ran for half an epoch before the error appeared.

I wonder if the code fails to clear the gradients during training.
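For reference, here is a minimal sketch of the standard PyTorch training step (hypothetical model and optimizer, not SipMask's actual code). `optimizer.zero_grad()` clears gradients every iteration, and mmdetection's runner typically handles this in its optimizer hook, so an OOM at `torch.stack` usually points to activation memory rather than leaked gradients:

```python
import torch

# Hypothetical toy model/optimizer to illustrate the gradient-clearing pattern.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 2)

for _ in range(3):
    optimizer.zero_grad()                       # clear gradients from the previous step
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()                             # accumulate fresh gradients
    optimizer.step()

print(model.weight.grad.shape)  # torch.Size([2, 4])
```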

BTW, my GPU is a GeForce RTX 2080 Ti (11019 MiB). I used 2 GPUs during training. My environment is pytorch = 1.1.0, torchvision = 0.3.0, mmcv = 0.4.3.

Please tell me how to deal with it. Thanks a lot!

JialeCao001 commented 3 years ago

Generally, image instance segmentation may need some more memory. If you want to reduce the memory, maybe you can try 2x upsampling of the basis mask.
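As an illustration of this idea (hypothetical tensor shapes, not the actual SipMask code): halving the spatial resolution of the basis masks before the expensive per-proposal stacking step shrinks the downstream mask tensors by roughly 4x.

```python
import torch
import torch.nn.functional as F

# Hypothetical basis-mask tensor: (num_proposals, channels, H, W).
pos_masks = torch.randn(100, 32, 56, 56)

# Downsample by 2x in each spatial dimension; the stacked mask tensors
# built from this take about a quarter of the memory.
pos_masks_small = F.interpolate(pos_masks, scale_factor=0.5,
                                mode='bilinear', align_corners=False)

print(pos_masks_small.shape)  # torch.Size([100, 32, 28, 28])
```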

YYlvsy commented 3 years ago

@JialeCao001 I see. Thanks for your reply! I noticed that you used 8 GPUs during training. Could you tell me the memory size per GPU?

JialeCao001 commented 3 years ago

I remember it was about 20 GB per GPU. If we limit the number of proposals used for mask prediction per GPU, it may be less than 20 GB.
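A minimal sketch of what capping the proposals might look like (the limit `MAX_MASK_PROPOSALS` and the index list are hypothetical, not from the repo): fewer positive proposals kept for mask prediction means smaller stacked mask tensors.

```python
import random

MAX_MASK_PROPOSALS = 100          # assumed per-GPU cap, not from the repo

# Dummy indices standing in for the positive samples selected for masks.
pos_inds = list(range(500))

# Randomly subsample when too many positives survive matching.
if len(pos_inds) > MAX_MASK_PROPOSALS:
    pos_inds = random.sample(pos_inds, MAX_MASK_PROPOSALS)

print(len(pos_inds))  # 100
```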

YYlvsy commented 3 years ago

> I remember about maybe 20G per GPU. If we limit the number of proposals for mask prediction per GPU, it maybe less than 20G.

Thank you very much for your prompt reply! It really helps me a lot.