WongKinYiu / yolor

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks (https://arxiv.org/abs/2105.04206)
GNU General Public License v3.0

non_max_suppression causes CUDA out of memory. #41

Open ixez opened 2 years ago

ixez commented 2 years ago

I am training yolor on the CrowdHuman dataset. Even with both the train and test batch sizes set to 1, the program still runs out of memory during testing (I have a 12 GB GPU). I traced it to non_max_suppression. Does anyone have the same problem?

Traceback (most recent call last):
  File "train.py", line 537, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 336, in train
    results, maps, times = test.test(opt.data,
  File "/workplace/Codes/yolor/test.py", line 134, in test
    output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres)
  File "/workplace/Codes/yolor/utils/general.py", line 341, in non_max_suppression
    i = torch.ops.torchvision.nms(boxes, scores, iou_thres)
RuntimeError: CUDA out of memory. Tried to allocate 10.49 GiB (GPU 0; 10.91 GiB total capacity; 618.68 MiB already allocated; 6.45 GiB free; 3.78 GiB reserved in total by PyTorch)
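
For context, the size of that allocation lines up with how torchvision's CUDA NMS kernel works: it builds a pairwise suppression bitmask over all candidate boxes, so memory grows roughly quadratically with the number of detections that survive conf_thres. A back-of-the-envelope sketch (the N * ceil(N/64) int64 mask layout is my reading of the torchvision CUDA kernel, not something stated in this thread):

# Rough memory estimate for the bitmask torchvision's CUDA NMS allocates:
# each of the N boxes gets ceil(N / 64) int64 words marking which boxes it suppresses.
def nms_mask_gib(n_boxes: int) -> float:
    col_blocks = (n_boxes + 63) // 64
    return n_boxes * col_blocks * 8 / 1024**3  # 8 bytes per int64

for n in (10_000, 100_000, 300_000):
    print(f"{n:>7} candidate boxes -> ~{nms_mask_gib(n):.2f} GiB")

Around 300k candidate boxes already needs on the order of the 10.49 GiB seen in the traceback, which is easy to reach on a crowded dataset like CrowdHuman with a very low confidence threshold, so the suggestions below all aim at reducing how many boxes reach NMS.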
trungpham2606 commented 2 years ago

@ixez I have the same problem as you, but I was able to train the model by changing the rect argument's action here to store_false. The problem is that after nearly 100 epochs, the model outputs nothing at all during inference.
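
For readers without the screenshot, the change being described appears to be to the argparse definition of --rect in train.py; a sketch of it (the exact line in the repo may differ slightly):

import argparse

parser = argparse.ArgumentParser()
# train.py originally declares --rect with action='store_true' (off by default);
# the workaround described above flips it to store_false, i.e. rectangular training
# becomes the default even when the flag is not passed:
parser.add_argument('--rect', action='store_false', help='rectangular training')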

WongKinYiu commented 2 years ago

You could increase the epoch at which testing starts: https://github.com/WongKinYiu/yolor/blob/main/train.py#L335

Or pass conf_thres=0.1 into the test function: https://github.com/WongKinYiu/yolor/blob/main/train.py#L336
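
To make the second suggestion concrete, here is a simplified stand-in for the repo's non_max_suppression (not the actual code in utils/general.py): the confidence filter runs before the torchvision NMS call, so raising conf_thres directly shrinks the candidate set that the expensive pairwise step sees.

import torch
import torchvision

def simplified_nms(pred: torch.Tensor, conf_thres: float = 0.1, iou_thres: float = 0.5) -> torch.Tensor:
    # pred: (N, 6) candidates for one image as [x1, y1, x2, y2, confidence, class].
    # Step 1: confidence filter -- this is where conf_thres=0.1 (vs. the much lower
    # training-time default) decides how many boxes survive.
    pred = pred[pred[:, 4] > conf_thres]
    # Step 2: pairwise NMS on the (much smaller) surviving set.
    keep = torchvision.ops.nms(pred[:, :4], pred[:, 4], iou_thres)
    return pred[keep]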

trungpham2606 commented 2 years ago

Dear @WongKinYiu, I have a question: the rect option is set to False by default, so what is the difference between setting it to False or True? When it is set to False, I can't train yolor because of OOM. I set conf_thres=0.0001, but the results look random. The loss during training was around 0.02.

WongKinYiu commented 2 years ago

--rect is not recommended because it does not shuffle the data by default. With --rect there is no mosaic augmentation, so the number of ground-truth boxes per image is roughly 1/4, and the memory used to compute the loss is reduced to about 1/4.

trungpham2606 commented 2 years ago

@WongKinYiu So with an 11 GB card, what can I do to be able to train yolor?

ixez commented 2 years ago

@WongKinYiu Thanks for the reply. Starting testing at a later epoch doesn't solve the problem, it only delays it.

Any other insights about the cause?

ixez commented 2 years ago

@WongKinYiu Regarding the second suggestion, is the problem caused by too many bboxes during testing?

janchk commented 2 years ago

@WongKinYiu Regarding the second suggestion, is the problem caused by too many bboxes during testing?

Yes, I figured out that it is. To fix it, you could uncomment this line: https://github.com/WongKinYiu/yolor/blob/b168a4dd0fe22068bb6f43724e22013705413afb/utils/general.py#L336 and add some clipping to a reasonable number of predictions, for example x = x[0:10000, :].
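
A sketch of that change, assuming the commented-out line at the link sorts candidates by confidence (I have not checked the exact line at that commit); the 10000 cap is the example value from the comment above:

import torch

def cap_candidates(x: torch.Tensor, max_before_nms: int = 10000) -> torch.Tensor:
    # x: (N, 6) per-image candidates [x1, y1, x2, y2, conf, cls], as assembled
    # inside non_max_suppression() before the torchvision NMS call.
    # Sorting by confidence first means the clip keeps the most promising boxes;
    # the cap then bounds the size of the pairwise NMS bitmask.
    x = x[x[:, 4].argsort(descending=True)]
    return x[:max_before_nms]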