WongKinYiu / yolor

implementation of paper - You Only Learn One Representation: Unified Network for Multiple Tasks (https://arxiv.org/abs/2105.04206)
GNU General Public License v3.0

--augment causes image to give error on resizing #221

Open w013nad opened 2 years ago

w013nad commented 2 years ago

Solution at bottom

I am trying to run detect.py with the --augment option. However, whenever it resizes the images, I get an error saying that the tensor sizes do not match:

Traceback (most recent call last):
  File "detect.py", line 176, in <module>
    detect()
  File "detect.py", line 71, in detect
    pred = model(img, augment=opt.augment)[0]
  File "C:\Anaconda3\envs\yolov5\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\*****\code\yolor-paper\models\yolo.py", line 167, in forward
    yi = self.forward_once(xi)[0]  # forward
  File "C:\*****\code\yolor-paper\models\yolo.py", line 193, in forward_once
    x = m(x)  # run
  File "C:\Anaconda3\envs\yolov5\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\*****\code\yolor-paper\models\common.py", line 774, in forward
    return torch.cat(x, self.d)
RuntimeError: Sizes of tensors must match except in dimension 2. Got 108 and 107 (The offending index is 0)

However, this is easily remedied by adjusting the "scale_img" function in utils.torch_utils. Currently, the grid size there is set to 32, but yolor uses a grid size of 64. For some image sizes, the scaled image is therefore padded to a multiple of 32 that is not a multiple of 64, which is the wrong size for the model and gives this error. To fix it, set "gs = 64" and the models will run properly with the --augment option.
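The effect of the grid size can be checked with arithmetic alone, without loading a model. The sketch below reuses only the rounding expression from scale_img; the helper name padded_size, the 1280x1280 input, and the 0.67 ratio are assumptions chosen for illustration, not values taken from the repository.

```python
import math

def padded_size(h, w, ratio, gs):
    # Same rounding as the pad/crop branch of scale_img: round each scaled
    # side up to the next multiple of the grid size gs.
    return tuple(math.ceil(x * ratio / gs) * gs for x in (h, w))

# Hypothetical input: a 1280x1280 image and a 0.67 scale ratio (assumed values).
for gs in (32, 64):
    h, w = padded_size(1280, 1280, 0.67, gs)
    ok = h % 64 == 0 and w % 64 == 0
    print(f"gs={gs}: padded to {h}x{w}, divisible by 64: {ok}")
# gs=32: padded to 864x864, divisible by 64: False
# gs=64: padded to 896x896, divisible by 64: True
```

With gs = 32 the padded size can land on an odd multiple of 32, which is the kind of size that leads to the mismatch in torch.cat shown in the traceback.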

This is not normally a problem, because other checks catch an improperly set --imgsz, but those same checks are not applied to the images rescaled by the --augment option.
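As a rough illustration of the kind of size check meant here, the sketch below rounds a requested size up to the next multiple of the stride and warns when it had to change it. Both function names are hypothetical and the logic is an assumption about how such a check behaves, not a copy of the repository's code.

```python
import math

def round_up_to_stride(size, stride=64):
    # Hypothetical helper: smallest multiple of stride that is >= size.
    return math.ceil(size / stride) * stride

def check_size(size, stride=64):
    # Hypothetical check: warn and correct when size is not a multiple of stride.
    new_size = round_up_to_stride(size, stride)
    if new_size != size:
        print(f"WARNING: size {size} is not a multiple of {stride}, using {new_size}")
    return new_size

print(check_size(1280))  # already a multiple of 64, returned unchanged
print(check_size(864))   # rounded up to the next multiple of 64
```

A guard like this on the output of scale_img would catch the bad size before it reaches the model, though changing gs as described above removes the cause directly.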

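import math  # needed for math.ceil if this snippet is run on its own
import torch.nn.functional as F  # provides F.interpolate and F.pad

def scale_img(img, ratio=1.0, same_shape=False):  # img(16,3,256,416), r=ratio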
    # scales img(bs,3,y,x) by ratio
    if ratio == 1.0:
        return img
    else:
        h, w = img.shape[2:]
        s = (int(h * ratio), int(w * ratio))  # new size
        img = F.interpolate(img, size=s, mode='bilinear', align_corners=False)  # resize
        if not same_shape:  # pad/crop img
            gs = 64  # (pixels) grid size THIS IS THE ONLY LINE I CHANGED
            h, w = [math.ceil(x * ratio / gs) * gs for x in (h, w)]
        return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447)  # value = imagenet mean