I also ran into this problem: very poor results on small objects.
Try:
[-1, 1, nn.Upsample, [None, 4, 'nearest']],
instead of https://github.com/WongKinYiu/yolov7/blob/main/cfg/training/yolov7.yaml#L78
[24, 1, Conv, [256, 1, 1]],
instead of https://github.com/WongKinYiu/yolov7/blob/main/cfg/training/yolov7.yaml#L79
[11, 1, Conv, [128, 1, 1]],
instead of https://github.com/WongKinYiu/yolov7/blob/main/cfg/training/yolov7.yaml#L93
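For context: the point of these replacements is to upsample more aggressively (4x instead of 2x) and to route shallower, higher-resolution backbone layers into the head, so the finest detection scale sees stride-4 features instead of stride-8 ones. A rough sketch of how such a fragment reads in a yolov7-style head; the surrounding lines and ordering are illustrative assumptions, not the exact file:
[-1, 1, Conv, [128, 1, 1]],
[-1, 1, nn.Upsample, [None, 4, 'nearest']],  # 4x instead of the default 2x
[11, 1, Conv, [128, 1, 1]],                  # route an early, high-resolution backbone layer
[[-1, -2], 1, Concat, [1]],                  # only valid if both inputs share the same stride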
Thank you for your reply! I tried this, but got an error:
Sizes of tensors must match except in dimension 1. Expected size 16 but got size 8 for tensor number 2 in the list.
File "D:\Code\DLProject\pytorch-yolov7-WongKinYiu\train.py", line 88, in train
File "D:\Code\DLProject\pytorch-yolov7-WongKinYiu\models\yolo.py", line 532, in init
    m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])  # forward
File "D:\Code\DLProject\pytorch-yolov7-WongKinYiu\models\yolo.py", line 587, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
File "D:\Code\DLProject\pytorch-yolov7-WongKinYiu\models\yolo.py", line 613, in forward_once
    x = m(x)  # run
File "D:\Code\DLProject\pytorch-yolov7-WongKinYiu\models\common.py", line 64, in forward
    return torch.cat(x, dim=self.d)
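If it helps to decode this: the failure is in Concat (torch.cat), which requires all inputs to match in every dimension except channels. The stride dry-run in yolo.py forwards a small square dummy image (the torch.zeros(1, ch, s, s) call in the trace, with s = 256 in the stock code), so "expected size 16 but got size 8" means one input of the Concat is a stride-16 feature map and the other a stride-32 one: after changing an Upsample factor, a routed layer ends up one downsampling step away from its partner. Illustratively (indices and strides are assumptions, not the exact layers):
[-1, 1, nn.Upsample, [None, 4, 'nearest']],  # e.g. stride 16 -> stride 4
[24, 1, Conv, [128, 1, 1]],                  # if this route is still at stride 8, Concat fails
[11, 1, Conv, [128, 1, 1]],                  # re-pointing to a stride-4 layer makes the sizes match
[[-1, -2], 1, Concat, [1]],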
Try this cfg file: yolov7_so.zip
I used
[-1, 1, nn.Upsample, [None, 4, 'nearest']],
[11, 1, Conv, [128, 1, 1]], # route backbone P3
and
[-1, 1, Conv, [128, 3, 4]],
instead of
https://github.com/WongKinYiu/yolov7/blob/c14ba0c297b3b5fc0374c917db798c88f9dd226c/cfg/training/yolov7.yaml#L108
Hi @loveq007,
I did the same for yolov7-tiny and got better results:
[-1, 1, nn.Upsample, [None, 4, 'nearest']], # Upsample 4 instead of 2
[7, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 7 instead of 14
Instead of:
https://github.com/WongKinYiu/yolov7/blob/892b603718cda9f118368a07dab700b5309c2ac5/cfg/training/yolov7-tiny.yaml#L76 https://github.com/WongKinYiu/yolov7/blob/892b603718cda9f118368a07dab700b5309c2ac5/cfg/training/yolov7-tiny.yaml#L77
and
[-1, 1, Conv, [128, 3, 4, None, 1, nn.LeakyReLU(0.1)]], # stride 2 to 4
instead of the original stride-2 Conv line.
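Pieced together, the modified yolov7-tiny fragments would read roughly like this (a sketch; apart from the quoted lines, the neighbours and ordering are assumptions, not the shipped yolov7-tiny.yaml):
[-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],
[-1, 1, nn.Upsample, [None, 4, 'nearest']],              # upsample 4x instead of 2x
[7, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],    # route layer 7 instead of 14
[[-1, -2], 1, Concat, [1]],
# ... and on the downsampling path back out:
[-1, 1, Conv, [128, 3, 4, None, 1, nn.LeakyReLU(0.1)]],  # stride 4 instead of 2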
When I verify yolov7_so.yaml, I get some errors:
python models/yolo.py --cfg ./cfg/training/yolov7_so.yaml
from n params module arguments
0 -1 1 928 models.common.Conv [3, 32, 3, 1]
1 -1 1 18560 models.common.Conv [32, 64, 3, 2]
2 -1 1 36992 models.common.Conv [64, 64, 3, 1]
3 -1 1 73984 models.common.Conv [64, 128, 3, 2]
4 -1 1 8320 models.common.Conv [128, 64, 1, 1]
5 -2 1 8320 models.common.Conv [128, 64, 1, 1]
6 -1 1 36992 models.common.Conv [64, 64, 3, 1]
7 -1 1 36992 models.common.Conv [64, 64, 3, 1]
8 -1 1 36992 models.common.Conv [64, 64, 3, 1]
9 -1 1 36992 models.common.Conv [64, 64, 3, 1]
10 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
11 -1 1 66048 models.common.Conv [256, 256, 1, 1]
12 -1 1 0 models.common.MP []
13 -1 1 33024 models.common.Conv [256, 128, 1, 1]
14 -3 1 33024 models.common.Conv [256, 128, 1, 1]
15 -1 1 147712 models.common.Conv [128, 128, 3, 2]
16 [-1, -3] 1 0 models.common.Concat [1]
17 -1 1 33024 models.common.Conv [256, 128, 1, 1]
18 -2 1 33024 models.common.Conv [256, 128, 1, 1]
19 -1 1 147712 models.common.Conv [128, 128, 3, 1]
20 -1 1 147712 models.common.Conv [128, 128, 3, 1]
21 -1 1 147712 models.common.Conv [128, 128, 3, 1]
22 -1 1 147712 models.common.Conv [128, 128, 3, 1]
23 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
24 -1 1 263168 models.common.Conv [512, 512, 1, 1]
25 -1 1 0 models.common.MP []
26 -1 1 131584 models.common.Conv [512, 256, 1, 1]
27 -3 1 131584 models.common.Conv [512, 256, 1, 1]
28 -1 1 590336 models.common.Conv [256, 256, 3, 2]
29 [-1, -3] 1 0 models.common.Concat [1]
30 -1 1 131584 models.common.Conv [512, 256, 1, 1]
31 -2 1 131584 models.common.Conv [512, 256, 1, 1]
32 -1 1 590336 models.common.Conv [256, 256, 3, 1]
33 -1 1 590336 models.common.Conv [256, 256, 3, 1]
34 -1 1 590336 models.common.Conv [256, 256, 3, 1]
35 -1 1 590336 models.common.Conv [256, 256, 3, 1]
36 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
37 -1 1 1050624 models.common.Conv [1024, 1024, 1, 1]
38 -1 1 0 models.common.MP []
39 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
40 -3 1 525312 models.common.Conv [1024, 512, 1, 1]
41 -1 1 2360320 models.common.Conv [512, 512, 3, 2]
42 [-1, -3] 1 0 models.common.Concat [1]
43 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
44 -2 1 262656 models.common.Conv [1024, 256, 1, 1]
45 -1 1 590336 models.common.Conv [256, 256, 3, 1]
46 -1 1 590336 models.common.Conv [256, 256, 3, 1]
47 -1 1 590336 models.common.Conv [256, 256, 3, 1]
48 -1 1 590336 models.common.Conv [256, 256, 3, 1]
49 [-1, -3, -5, -6] 1 0 models.common.Concat [1]
50 -1 1 1050624 models.common.Conv [1024, 1024, 1, 1]
51 -1 1 7609344 models.common.SPPCSPC [1024, 512, 1]
52 -1 1 131584 models.common.Conv [512, 256, 1, 1]
53 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
54 37 1 262656 models.common.Conv [1024, 256, 1, 1]
55 [-1, -2] 1 0 models.common.Concat [1]
56 -1 1 131584 models.common.Conv [512, 256, 1, 1]
57 -2 1 131584 models.common.Conv [512, 256, 1, 1]
58 -1 1 295168 models.common.Conv [256, 128, 3, 1]
59 -1 1 147712 models.common.Conv [128, 128, 3, 1]
60 -1 1 147712 models.common.Conv [128, 128, 3, 1]
61 -1 1 147712 models.common.Conv [128, 128, 3, 1]
62 [-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
63 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
64 -1 1 33024 models.common.Conv [256, 128, 1, 1]
65 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 4, 'nearest']
66 11 1 33024 models.common.Conv [256, 128, 1, 1]
67 [-1, -2] 1 0 models.common.Concat [1]
68 -1 1 33024 models.common.Conv [256, 128, 1, 1]
69 -2 1 33024 models.common.Conv [256, 128, 1, 1]
70 -1 1 73856 models.common.Conv [128, 64, 3, 1]
71 -1 1 36992 models.common.Conv [64, 64, 3, 1]
72 -1 1 36992 models.common.Conv [64, 64, 3, 1]
73 -1 1 36992 models.common.Conv [64, 64, 3, 1]
74 [-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
75 -1 1 65792 models.common.Conv [512, 128, 1, 1]
76 -1 1 0 models.common.MP []
77 -1 1 16640 models.common.Conv [128, 128, 1, 1]
78 -3 1 16640 models.common.Conv [128, 128, 1, 1]
79 -1 1 147712 models.common.Conv [128, 128, 3, 4]
80 [-1, -3, 63] 1 0 models.common.Concat [1]
81 -1 1 131584 models.common.Conv [512, 256, 1, 1]
82 -2 1 131584 models.common.Conv [512, 256, 1, 1]
83 -1 1 295168 models.common.Conv [256, 128, 3, 1]
84 -1 1 147712 models.common.Conv [128, 128, 3, 1]
85 -1 1 147712 models.common.Conv [128, 128, 3, 1]
86 -1 1 147712 models.common.Conv [128, 128, 3, 1]
87 [-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
88 -1 1 262656 models.common.Conv [1024, 256, 1, 1]
89 -1 1 0 models.common.MP []
90 -1 1 66048 models.common.Conv [256, 256, 1, 1]
91 -3 1 66048 models.common.Conv [256, 256, 1, 1]
92 -1 1 590336 models.common.Conv [256, 256, 3, 2]
93 [-1, -3, 51] 1 0 models.common.Concat [1]
94 -1 1 525312 models.common.Conv [1024, 512, 1, 1]
95 -2 1 525312 models.common.Conv [1024, 512, 1, 1]
96 -1 1 1180160 models.common.Conv [512, 256, 3, 1]
97 -1 1 590336 models.common.Conv [256, 256, 3, 1]
98 -1 1 590336 models.common.Conv [256, 256, 3, 1]
99 -1 1 590336 models.common.Conv [256, 256, 3, 1]
100 [-1, -2, -3, -4, -5, -6] 1 0 models.common.Concat [1]
101 -1 1 1049600 models.common.Conv [2048, 512, 1, 1]
102 75 1 328704 models.common.RepConv [128, 256, 3, 1]
103 88 1 1312768 models.common.RepConv [256, 512, 3, 1]
104 101 1 5246976 models.common.RepConv [512, 1024, 3, 1]
105 [102, 103, 104] 1 460282 IDetect [80, [[12, 16, 19, 36, 40, 28], [36, 75, 76, 55, 72, 146], [142, 110, 192, 243, 459, 401]], [256, 512, 1024]]
Traceback (most recent call last):
File "/home/yao/project/detection/yolov7/models/yolo.py", line 827, in <module>
model = Model(opt.cfg).to(device)
File "/home/yao/project/detection/yolov7/models/yolo.py", line 544, in __init__
m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))]) # forward
File "/home/yao/project/detection/yolov7/models/yolo.py", line 599, in forward
return self.forward_once(x, profile) # single-scale inference, train
File "/home/yao/project/detection/yolov7/models/yolo.py", line 625, in forward_once
x = m(x) # run
File "/home/yao/miniconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/yao/project/detection/yolov7/models/common.py", line 62, in forward
return torch.cat(x, self.d)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 32 for tensor number 1 in the list.
Thank you! With the yolov7-tiny modification I get better results too, but still not better than yolov5s6.
I fixed and tested it. Try this cfg file: yolov7_so.zip
I used
[-1, 1, nn.Upsample, [None, 4, 'nearest']],
[11, 1, Conv, [128, 1, 1]], # route backbone P3
and
[-1, 1, Conv, [128, 1, 2]],
instead of
https://github.com/WongKinYiu/yolov7/blob/c14ba0c297b3b5fc0374c917db798c88f9dd226c/cfg/training/yolov7.yaml#L106
and
[-1, 1, Conv, [128, 3, 4]],
instead of
https://github.com/WongKinYiu/yolov7/blob/c14ba0c297b3b5fc0374c917db798c88f9dd226c/cfg/training/yolov7.yaml#L108
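For anyone puzzling over why the extra [128, 1, 2] is needed: the downsample block has two parallel branches feeding one Concat, and both must shrink the feature map by the same total factor. The MaxPool (MP) halves once, so its following 1x1 Conv now also strides 2 (2 x 2 = 4x total), matching the 3x3 Conv that strides 4 on the other branch. A sketch of the resulting block (route indices illustrative):
[-1, 1, MP, []],                 # 2x down
[-1, 1, Conv, [128, 1, 2]],      # 2x more -> 4x total on this branch
[-3, 1, Conv, [128, 1, 1]],
[-1, 1, Conv, [128, 3, 4]],      # 4x down on this branch
[[-1, -3, 63], 1, Concat, [1]],  # all inputs now at the same stride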
Or just try yolov7-w6
May I ask why you made this modification to yolov7-tiny, and is there any basis for it?
@AlexeyAB
Hi, I just need to detect vehicles and persons. With 4 MP video resized to the YOLOv7 input size of 640 or 1280, small vehicles are not detected.
What is the best way to detect smaller objects while keeping accuracy/speed?
Is "yolov7-w6" meant for this?
There is a --img-size option for detect.py. Is it useful to increase it to 2560 or so, and what does it affect?
Best
@AlexeyAB Hi Alexey, how do I implement this config for YOLOv7 Tiny?
Are there any models trained for small objects, e.g. VisDrone?
I used the same method for my head model training, but it still isn't giving proper bounding boxes.
For yolov4, @AlexeyAB suggests the following modification in order to detect objects smaller than 16 px: "for training for small objects (smaller than 16x16 after the image is resized to 416x416) - set layers = 23 instead of ...", which made a significant improvement for my use case (single class & small objects).
Is there a way to do the same trick with yolov7 (and even yolov7-tiny)?
I train with : python3 train.py --weights yolov7_training.pt --data "custom.yaml" --workers 4 --batch-size 4 --img 1024 --cfg cfg/training/yolov7.yaml --name yolov7 --hyp data/hyp.scratch.p5.yaml --epochs 200
For now I don't reach the same mAP@0.5 as yolov4. I use:
- auto anchors (3,3, 5,4, 7,5, 6,6, 8,7, 10,10, 13,12, 18,18, 27,37)
- Adam
- learning rate 0.001, warmup at 0
- mosaic & mixup disabled
- scale set to 0.1
Thanks in advance.
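For reference, most of those settings map onto keys that already exist in data/hyp.scratch.p5.yaml (Adam is enabled with train.py's --adam flag, and anchors live in the model cfg); a sketch of the overrides, with values taken from the post above:
lr0: 0.001        # initial learning rate
warmup_epochs: 0  # warmup at 0
scale: 0.1        # image scale augmentation gain
mosaic: 0.0       # mosaic disabled
mixup: 0.0        # mixup disabled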