Yuxiang1995 / ICDAR2021_MFD

1st Solution For ICDAR 2021 Competition on Mathematical Formula Detection (champion solution for formula detection)
Apache License 2.0

RuntimeError: CUDA OOM / Bad Training Performance after reducing crop_size #9

Closed AaaJrwp4 closed 1 year ago

AaaJrwp4 commented 2 years ago

Hello @Yuxiang1995 ,

I am attempting to train your network on a single GPU, but I get this error:

RuntimeError: CUDA out of memory. Tried to allocate 1.86 GiB
(GPU 0; 4.00 GiB total capacity; 1.48 GiB already allocated; 319.91 MiB free; 1.88 GiB reserved in total by PyTorch)

I also tried a GPU with 10 GiB of memory and got the same error. I am able to train when I reduce the crop_size in configs/_base_/datasets/formula_detection.py, but the model does not seem to learn anything, since the loss does not decrease.
I also saw in your presentation that the large crop_size is a feature of your model.
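
For anyone weighing the same trade-off: as a rough rule of thumb, the activation memory of a convolutional detector grows with the number of input pixels, so halving both sides of the crop cuts the pixel count to about a quarter. The sketch below only does that arithmetic. The crop sizes in it are hypothetical placeholders, not the values from the repository's config, and the function name `crop_area_ratio` is made up for illustration.

```python
def crop_area_ratio(original, reduced):
    """Return the reduced crop's pixel count as a fraction of the original's."""
    orig_w, orig_h = original
    red_w, red_h = reduced
    return (red_w * red_h) / (orig_w * orig_h)


# Hypothetical crop sizes (width, height); substitute the values from your config.
original_crop = (1600, 1600)
reduced_crop = (800, 800)

ratio = crop_area_ratio(original_crop, reduced_crop)
print(f"Reduced crop covers {ratio:.0%} of the original pixel count")
```

This is only an estimate of how much the input shrinks. It says nothing about whether the smaller crop still contains enough context for the model to learn from.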

Can you give me a hint on how to successfully train the model on a single GPU, i.e. get rid of the CUDA OOM error?

siriasadeddin commented 1 year ago

Hi, I tried reducing the input image size, which makes it run, but now the loss does not converge (as others have reported in several issues).
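
One thing that may be worth checking when the loss does not converge on a single GPU is the learning rate. If the published config was tuned for a larger total batch size (an assumption, since this thread does not state the original setup), a common heuristic is to scale the learning rate linearly with the batch size. A minimal sketch of that heuristic follows; the base learning rate and batch sizes are hypothetical placeholders, and `scale_lr` is an illustrative helper, not part of the repository.

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling heuristic: keep the learning rate proportional to batch size."""
    return base_lr * new_batch_size / base_batch_size


# Hypothetical values; read the real ones from the config you are training with.
base_lr = 0.02          # learning rate assumed to be tuned for the original setup
base_batch_size = 16    # e.g. 8 GPUs x 2 images per GPU (assumed)
new_batch_size = 2      # e.g. 1 GPU x 2 images per GPU

new_lr = scale_lr(base_lr, base_batch_size, new_batch_size)
print(f"Suggested learning rate: {new_lr:.4f}")
```

This is a starting point rather than a guaranteed fix. Convergence problems can have other causes, and the heuristic itself is only approximate at very small batch sizes.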

Yuxiang1995 commented 1 year ago

11