WXinlong / SOLO

SOLO and SOLOv2 for instance segmentation, ECCV 2020 & NeurIPS 2020.

RuntimeError: CUDA out of memory occurred when testing #179

Closed zhuaiyi closed 2 years ago

zhuaiyi commented 3 years ago

command

python tools/test_ins.py configs/solov2/solov2_light_448_r34_fpn_8gpu_3x.py work_dirs/solov2_light_release_r34_fpn_8gpu_3x/epoch_36.pth --show --out results_solo.pkl --eval segm

bug

[>>>>>>>>>>>>> ] 20/76, 0.3 task/s, elapsed: 59s, ETA: 165s
Traceback (most recent call last):
...
RuntimeError: CUDA out of memory. Tried to allocate 3.30 GiB (GPU 0; 8.00 GiB total capacity; 973.14 MiB already allocated; 2.13 GiB free; 3.74 GiB reserved in total by PyTorch)

I then shrank my test set to 14 images, and the same error occurred at [>> ] 2/14.

Environment

Python 3.7
CUDA 11.1
PyTorch 1.7.0+cu110

Supplement

The epoch_36.pth file was generated by training on my own dataset. It performs well when tested on single images with inference_demo.py, but fails with this batch-test command.

WXinlong commented 3 years ago

@zhuaiyi You can reduce the number of objects handled in post-processing, e.g., by setting a smaller MODEL.SOLOV2.NMS_PRE. Alternatively, move the sort-and-select step earlier, right after the model prediction. For example, move lines 440~448 to line 410 of solov2.py with minimal modifications; make sure you update the variable names and don't miss any variables.
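
To illustrate why keeping fewer candidates before the mask step lowers memory use, here is a minimal sketch in plain Python. It is not the code from solov2.py: the function names, the scores, and the 1000x1000 mask size are all hypothetical, and the size estimate assumes dense float32 masks. In an mmdet-style config the comparable setting is probably an `nms_pre` entry in `test_cfg`, but that is an assumption to check against your own config.

```python
def select_top_candidates(scores, nms_pre):
    """Return indices of the `nms_pre` highest-scoring candidates, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:nms_pre]


def mask_tensor_bytes(num_masks, height, width, bytes_per_value=4):
    """Rough size of a dense mask stack of shape (num_masks, height, width).

    bytes_per_value=4 assumes float32 masks.
    """
    return num_masks * height * width * bytes_per_value


# Made-up candidate scores, for illustration only.
scores = [0.12, 0.91, 0.45, 0.78, 0.30]

# Sort and select first, so later per-mask work only sees the kept candidates.
keep = select_top_candidates(scores, nms_pre=3)

bytes_all = mask_tensor_bytes(len(scores), 1000, 1000)
bytes_kept = mask_tensor_bytes(len(keep), 1000, 1000)
print(keep, bytes_all, bytes_kept)
```

The mask stack grows linearly with the number of candidates and with the mask area, so a lower candidate limit and a smaller image size both reduce the peak allocation.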

zhuaiyi commented 3 years ago

Thanks very much! I'll get to work on it.

xuhao-anhe commented 3 years ago

Hello, did you manage to solve this?

zhuaiyi commented 3 years ago

I went back and took a look. The solution the author gave is based on the AdelaiDet framework, whereas I am using mmdet. I later got the test to run successfully by changing img_scale under test_pipeline in the config file.
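
A sketch of what an img_scale change under test_pipeline might look like in an mmdet-style config file. The pipeline layout follows the usual mmdet 1.x test pipeline. The (512, 320) scale is an illustrative assumption, not the value actually used in this thread, and the normalization constants are the common ImageNet defaults, so compare both against your own config.

```python
# Common ImageNet normalization settings (assumed; use the ones from your config).
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        # Smaller test-time input scale (illustrative value only).
        # Lowering it shrinks the feature maps and the predicted mask tensors.
        img_scale=(512, 320),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```

A smaller test scale can reduce accuracy on small objects, so it is worth re-running the evaluation after the change.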

kizoooh commented 2 years ago

Hello, I'd like to ask: when you use inference_demo.py to run inference on a batch of images, GPU memory usage is very high. Do you have any way to solve that?