cv516Buaa / tph-yolov5

GNU General Public License v3.0

Training with nano size #21

Open maarten0912 opened 2 years ago

maarten0912 commented 2 years ago

I am trying to train the same model with a smaller network. I use yolov5n.pt from the public repo, and I created a yolov5n-xs-tph.yaml similar to yolov5l-xs-tph.yaml. It looks like this (note: I only changed the depth and width multiples):

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.25  # layer channel multiple
anchors: 4
  # - [10,13, 16,30, 33,23]  # P3/8
  # - [30,61, 62,45, 59,119]  # P4/16
  # - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [ -1, 1, Conv, [ 128, 1, 1 ] ],
   [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],
   [ [ -1, 2 ], 1, Concat, [ 1 ] ],  # cat backbone P2
   [ -1, 2, C3STR, [ 128, False ] ],  # 21 (P2/4-xsmall)

   [ -1, 1, Conv, [ 128, 3, 2 ] ],
   [ [ -1, 18, 4], 1, Concat, [ 1 ] ],  # cat head P3
   [ -1, 2, C3STR, [ 256, False ] ],  # 24 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14, 6], 1, Concat, [1]],  # cat head P4
   [-1, 2, C3STR, [512, False]],  # 27 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 2, C3STR, [1024, False]],  # 30 (P5/32-large)

   [[21, 24, 27, 30], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]

Doing exactly this for the yolov5s model worked for me and it trained fine, but with the yolov5n model I get this error:

Traceback (most recent call last):
  File "train.py", line 631, in <module>
    main(opt)
  File "train.py", line 528, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 119, in train
    model = Model(cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)  # create
  File "~/tph-yolov5/models/yolo.py", line 104, in __init__
    self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch])  # model, savelist
  File "~/tph-yolov5/models/yolo.py", line 291, in parse_model
    m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
  File "~/tph-yolov5/models/common.py", line 493, in __init__
    self.m = SwinTransformerBlock(c_, c_, c_//32, n)
  File "~/tph-yolov5/models/common.py", line 426, in __init__
    self.tr = nn.Sequential(*(SwinTransformerLayer(c2, num_heads=num_heads, window_size=window_size,  shift_size=0 if (i % 2 == 0) else self.shift_size ) for i in range(num_layers)))
  File "~/tph-yolov5/models/common.py", line 426, in <genexpr>
    self.tr = nn.Sequential(*(SwinTransformerLayer(c2, num_heads=num_heads, window_size=window_size,  shift_size=0 if (i % 2 == 0) else self.shift_size ) for i in range(num_layers)))
  File "/~/tph-yolov5/models/common.py", line 338, in __init__
    self.attn = WindowAttention(
  File "~/tph-yolov5/models/common.py", line 249, in __init__
    head_dim = dim // num_heads
ZeroDivisionError: integer division or modulo by zero

The error occurs when trying to create the C3STR block (# 21). I added these debug prints; at the failing layer the values are:

c1=64
c2=32
n=1
shortcut=False
g=1
e=0.5
c_=16
num_heads of SwinTransformerBlock that will be created would be: 0

I know the problem has to do with my yolov5n-xs-tph.yaml file, but I don't understand what I should change. Again, for yolov5s-xs-tph.yaml it worked fine, with depth 0.33 and width 0.5... Any ideas?
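For what it's worth, the printed values can be reproduced from the scaling rules alone. This is a minimal sketch of the arithmetic, assuming YOLOv5's usual `make_divisible(c2 * width_multiple, 8)` channel scaling and the `c_ // 32` head count from `C3STR` in `models/common.py` (helper names here are illustrative, not from the repo):

```python
import math

def make_divisible(x, divisor=8):
    # Round channel count up to the nearest multiple of `divisor` (YOLOv5-style helper).
    return math.ceil(x / divisor) * divisor

def c3str_num_heads(c2_yaml, width_multiple, e=0.5):
    c2 = make_divisible(c2_yaml * width_multiple, 8)  # scaled output channels
    c_ = int(c2 * e)                                  # hidden channels inside C3STR
    return c_ // 32                                   # heads passed to SwinTransformerBlock

# yolov5s (width 0.5): the P2 C3STR asks for 128 channels in the yaml
print(c3str_num_heads(128, 0.5))   # 1 head -> trains fine
# yolov5n (width 0.25): the same layer
print(c3str_num_heads(128, 0.25))  # 0 heads -> ZeroDivisionError in WindowAttention
```

This matches the prints above (c2=32, c_=16, num_heads=0), so the crash is purely a consequence of `width_multiple: 0.25` shrinking the hidden channels below the 32-per-head threshold.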

cv516Buaa commented 2 years ago

This error comes from the multi-head attention mechanism. The scaled-down model has too few channels, so multi-head attention cannot generate any heads. Small models cannot use the C3STR module; using the C3TR module at the end of the backbone instead may help.
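Besides swapping C3STR for C3TR, one possible workaround (an untested assumption on my part, not a maintainer-endorsed fix) is to clamp the head count where `C3STR` currently computes `SwinTransformerBlock(c_, c_, c_ // 32, n)`, so scaled-down models still get at least one attention head:

```python
def swin_num_heads(c_, head_dim=32):
    """At least one head; with fewer than `head_dim` channels,
    the single head simply uses all `c_` channels as its dimension."""
    return max(1, c_ // head_dim)

# The yolov5n case that crashed (c_ = 16) now yields a single head:
print(swin_num_heads(16))  # 1
print(swin_num_heads(32))  # 1
print(swin_num_heads(64))  # 2
```

With `c_ = 16` and one head, `head_dim = dim // num_heads = 16`, so `WindowAttention` no longer divides by zero; whether a single wide head still gives the accuracy benefit TPH was designed for is a separate question.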