WindVChen / DRENet

The official implementation of DRENet (Degraded Reconstruction Enhancement Network) for tiny ship detection in remote sensing images
GNU General Public License v3.0

Adapt for yolov5l6 #11

Open · ramdhan1989 opened this issue 1 year ago

ramdhan1989 commented 1 year ago

Hi, would you mind suggesting how to adapt yolov5l6 to use DRENet? Below is the configuration for yolov5l6.

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
anchors:
  - [19,27,  44,40,  38,94]  # P3/8
  - [96,68,  86,152,  180,137]  # P4/16
  - [140,301,  303,264,  238,542]  # P5/32
  - [436,615,  739,380,  925,792]  # P6/64

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [768, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [768]],
   [-1, 1, Conv, [1024, 3, 2]],  # 9-P6/64
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 11
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [768, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 8], 1, Concat, [1]],  # cat backbone P5
   [-1, 3, C3, [768, False]],  # 15

   [-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 19

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 23 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 20], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 26 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 16], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [768, False]],  # 29 (P5/32-large)

   [-1, 1, Conv, [768, 3, 2]],
   [[-1, 12], 1, Concat, [1]],  # cat head P6
   [-1, 3, C3, [1024, False]],  # 32 (P6/64-xlarge)

   [[23, 26, 29, 32], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6)
  ]

Thank you

WindVChen commented 1 year ago

Maybe you can try this.

(Just refer to the current DRENet.yaml template: remove the PANet part, replace C3 with C3ResAtnMHSA, add RCAN, and update the layer indices.)

By the way, modify the feature-map size argument of C3ResAtnMHSA according to your actual input resolution.
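For example, with a 512×512 input the feature maps are 512/8 = 64 at P3/8, 32 at P4/16, 16 at P5/32, and 512/64 = 8 at P6/64; those are the size values (64/32/16/8) assumed in the config below.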

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
anchors:
  - [19,27,  44,40,  38,94]  # P3/8
  - [96,68,  86,152,  180,137]  # P4/16
  - [140,301,  303,264,  238,542]  # P5/32
  - [436,615,  739,380,  925,792]  # P6/64

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [768, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [768]],
   [-1, 1, Conv, [1024, 3, 2]],  # 9-P6/64
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 11
   [-1, 3, C3ResAtnMHSA, [1024, 8, False]],  # 12
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [768, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 8], 1, Concat, [1]],  # cat backbone P5
   [-1, 3, C3ResAtnMHSA, [768, 16, False]],  # 16

   [-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3ResAtnMHSA, [512, 32, False]],  # 20

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3ResAtnMHSA, [256, 64, False]],  # 24 (P3/8-small)

   [21, 3, C3ResAtnMHSA, [512, 32, False]],  # 25 (P4/16-medium)

   [17, 3, C3ResAtnMHSA, [1024, 16, False]],  # 26 (P5/32-large)

   [13, 3, C3ResAtnMHSA, [1536, 8, False]],  # 27 (P6/64-xlarge)

   [4, 1, RCAN, []],
   [[24, 25, 26, 27], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6)
  ]
ramdhan1989 commented 1 year ago

Hi, I created the exact YAML file you mentioned above, then added the SPPF class to common.py. My input size is 512. I got this error:

                from  n    params  module                                  arguments
  0                -1  1      7040  models.common.Conv                      [3, 64, 6, 2, 2]
  1                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]
  2                -1  1    156928  models.common.C3                        [128, 128, 3]
  3                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]
  4                -1  1   1118208  models.common.C3                        [256, 256, 6]
  5                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]
  6                -1  1   6433792  models.common.C3                        [512, 512, 9]
  7                -1  1   3540480  models.common.Conv                      [512, 768, 3, 2]
  8                -1  1   5611008  models.common.C3                        [768, 768, 3]
  9                -1  1   7079936  models.common.Conv                      [768, 1024, 3, 2]
 10                -1  1   9971712  models.common.C3                        [1024, 1024, 3]
 11                -1  1    535562  models.common.SPPF                      [1024, 5]
 12                -1  1   3496704  models.common.C3ResAtnMHSA              [1024, 1024, 3, 8, False]
 13                -1  1    787968  models.common.Conv                      [1024, 768, 1, 1]
 14                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 15           [-1, 8]  1         0  models.common.Concat                    [1]
 16                -1  1   2570304  models.common.C3ResAtnMHSA              [1536, 768, 3, 16, False]
 17                -1  1    394240  models.common.Conv                      [768, 512, 1, 1]
 18                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 19           [-1, 6]  1         0  models.common.Concat                    [1]
 20                -1  1   1160576  models.common.C3ResAtnMHSA              [1024, 512, 3, 32, False]
 21                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]
 22                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 23           [-1, 4]  1         0  models.common.Concat                    [1]
 24                -1  1    309952  models.common.C3ResAtnMHSA              [512, 256, 3, 64, False]
 25                21  1    767360  models.common.C3ResAtnMHSA              [256, 512, 3, 32, False]
 26                17  1   2984704  models.common.C3ResAtnMHSA              [512, 1024, 3, 16, False]
 27                13  1   6670464  models.common.C3ResAtnMHSA              [768, 1536, 3, 8, False]
 28                 4  1   4738070  models.common.RCAN                      [256]
 29  [24, 25, 26, 27]  1     59976  models.yolo.Detect                      [1, [[19, 27, 44, 40, 38, 94], [96, 68, 86, 152, 180, 137], [140, 301, 303, 264, 238, 542], [436, 615, 739, 380, 925, 792]], [256, 512, 1024, 1536]]
Traceback (most recent call last):
  File "train.py", line 515, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "train.py", line 84, in train
    model = Model(opt.cfg, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)  # create
  File "D:\DRENet\models\yolo.py", line 99, in __init__
    m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))[0]])  # forward
  File "D:\DRENet\models\yolo.py", line 131, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "D:\DRENet\models\yolo.py", line 148, in forward_once
    x = m(x)  # run
  File "C:\Users\Owner\anaconda3\envs\slick\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\DRENet\models\common.py", line 193, in forward
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
  File "C:\Users\Owner\anaconda3\envs\slick\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\DRENet\models\common.py", line 37, in forward
    return self.act(self.bn(self.conv(x)))
  File "C:\Users\Owner\anaconda3\envs\slick\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Users\Owner\anaconda3\envs\slick\lib\site-packages\torch\nn\modules\conv.py", line 447, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Users\Owner\anaconda3\envs\slick\lib\site-packages\torch\nn\modules\conv.py", line 444, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [512, 1024, 1, 1], expected input[1, 5, 8, 8] to have 1024 channels, but got 5 channels instead

Please advise,

thanks

WindVChen commented 1 year ago

The error seems to come from an inconsistency between the output dimension of one module and the input dimension of the next. However, based on the current information, I cannot tell exactly which part causes it. You may add some breakpoints at the modules' outputs to check that the input/output sizes are consistent.
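As a concrete way to do this, here is a minimal sketch (not from the repo) that prints every layer's output shape via forward hooks, assuming a YOLOv5-style Model whose layers live in model.model:

    import torch

    def add_shape_hooks(model):
        # Print layer index, class name, and output shape on every forward pass.
        for i, layer in enumerate(model.model):
            def hook(module, inputs, output, idx=i):
                if isinstance(output, torch.Tensor):
                    print(f"{idx:3d} {type(module).__name__:>20s} -> {tuple(output.shape)}")
            layer.register_forward_hook(hook)

    # add_shape_hooks(model)
    # model(torch.zeros(1, 3, 512, 512))  # mirrors the 512x512 input used here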

ramdhan1989 commented 1 year ago

This is the flow of data through the layers: [image: layer-by-layer data-flow diagram]. I think the modules marked in red are incompatible in size.

Please advise, thanks

WindVChen commented 1 year ago

Have you modified the parse_model() function in yolo.py? If you add a new SPPF class, you also need to register the class name there, the same as here.
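For reference, here is a minimal sketch of the relevant branch of parse_model() in models/yolo.py, assuming the DRENet fork keeps the upstream YOLOv5 structure (the exact module list in the repo may differ). Note that in the model summary above, layer 11 prints SPPF with arguments [1024, 5] instead of [1024, 1024, 5]: because SPPF was not registered in this list, the input channels were never prepended, so the module was likely built as SPPF(1024, 5) and produced a 5-channel output, which matches the RuntimeError.

    # models/yolo.py, inside parse_model() -- sketch; the module list is illustrative
    if m in [Conv, Bottleneck, SPP, SPPF, DWConv, Focus, BottleneckCSP, C3,
             C3ResAtnMHSA]:                 # SPPF must be added to this list
        c1, c2 = ch[f], args[0]             # in-channels from previous layer, out-channels from YAML
        if c2 != no:                        # not a Detect output
            c2 = make_divisible(c2 * gw, 8)
        args = [c1, c2, *args[1:]]          # SPPF is now built as SPPF(1024, 1024, 5)
        if m in [BottleneckCSP, C3, C3ResAtnMHSA]:
            args.insert(2, n)               # number of repeats
            n = 1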

ramdhan1989 commented 1 year ago

Thanks a lot, it works!