AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Calculated anchors work worse on custom small-object dataset #2960

Open lizyn opened 5 years ago

lizyn commented 5 years ago

problem

My dataset consists of images with a consistent size of 1024x1024, and the objects are small, mainly between 5 and 30 pixels, so I cropped the original images into overlapping 416x416 tiles. There is only one class of objects.

With the default anchors, I achieve mAP@0.5 around 99% (well, this is a relatively easy task). But with calculated anchors, I only get about 98%. What could be the reason?

Issue #1200 describes a similar problem, but the reason given there is not really relevant to my situation.

steps to reproduce

I used the following command to calculate anchors:

./darknet detector calc_anchors data/obj_1/obj.data -num_of_clusters 9 -width 416 -height 416 -show

the output is

 num_of_clusters = 9, width = 416, height = 416
 read labels from 1958 images
 loaded      image: 1958     box: 3232
 all loaded.

 calculating k-means++ ...

 iterations = 7

 avg IoU = 82.56 %

Saving anchors to the file: anchors.txt
anchors =   7,  7,   9,  8,  16,  7,   9, 17,  24,  7,  14, 13,  22,  9,  20, 13,  18, 17
^C

The generated cloud.png:

[image: cloud.png]
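For the report, the clustering step can be sketched in plain Python. This is a hypothetical re-implementation for illustration only (darknet's actual `calc_anchors` is C and uses k-means++ initialization): cluster the (width, height) of all ground-truth boxes with k-means, using IoU instead of Euclidean distance.

```python
# Hypothetical sketch of anchor clustering, for illustration only.
# Darknet's calc_anchors does this in C with k-means++ init; a plain
# random init keeps the sketch short.
import random

def iou_wh(a, b):
    """IoU of two (w, h) boxes aligned at the same corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """boxes: list of (w, h) in pixels. Returns (anchors, avg IoU)."""
    random.seed(seed)
    centroids = random.sample(boxes, k)
    for _ in range(iters):
        # assign every box to the centroid it overlaps most
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centroids[i]))
            clusters[best].append(b)
        # move each centroid to the mean (w, h) of its cluster
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = (sum(b[0] for b in c) / len(c),
                                sum(b[1] for b in c) / len(c))
    avg_iou = sum(max(iou_wh(b, c) for c in centroids)
                  for b in boxes) / len(boxes)
    return sorted(centroids, key=lambda c: c[0] * c[1]), avg_iou
```

The "avg IoU = 82.56 %" in the output above corresponds to `avg_iou` here: the mean IoU between each ground-truth box and its best-matching anchor.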

And here is the config file. I copied it from yolov3.cfg and changed almost nothing except the filters and anchors, plus the stride-related changes for small objects described in #how-to-improve-object-detection.

➜ diff cfg/yolov3.cfg cfg/obj.cfg
# omitting the [net] parts
...
603c603
< filters=255
---
> filters=18
609,610c609,610
< anchors = 10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326
< classes=80
---
> anchors = 7,  7,  14,  7,   8, 17,  23,  7,  15, 12,  11, 17,  22,  9,  21, 12,  18, 16
> classes=1
...  # omitting repetitions
717c717
    # I also made the following changes, as described in
    # #how-to-improve-object-detection for small objects.
    # They give me a minor mAP boost of about 0.2%, perhaps because
    # this is an easy task? Or the objects are not small enough?
< stride=2
---
> stride=4
720c720
< layers = -1, 36
---
> layers = -1, 11
...

Sorry to bother you, but I really need your help, @AlexeyAB.

You may wonder why I'm so obsessed with a 1% mAP difference: I'm writing a project report, and it's important that I understand this properly.

AlexeyAB commented 5 years ago

@lizyn Hi,

The reason is:

  1. 1st yolo layer should be used for anchors (and objects) with sizes > 60x60

  2. 2nd yolo layer should be used for anchors (and objects) with: 60x60 > sizes > 30x30

  3. 3rd yolo layer should be used for anchors (and objects) with sizes < 30x30
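This rule can be written down as a small helper (my own sketch for illustration, not darknet code). One assumption: the rule only says "sizes", so I use min(w, h) of each anchor as its size when comparing against the 30 and 60 thresholds.

```python
def split_anchor_masks(anchors, big=60, mid=30):
    """Partition anchor indices into the three [yolo] layers.

    anchors: list of (w, h), sorted from small to large.
    Layer 1 (coarsest grid): size >= big
    Layer 2:                 mid <= size < big
    Layer 3 (finest grid):   size < mid
    The size of an anchor is taken as min(w, h) -- an assumption,
    since the rule of thumb only speaks of "sizes".
    """
    masks = {1: [], 2: [], 3: []}
    for i, (w, h) in enumerate(anchors):
        size = min(w, h)
        if size >= big:
            masks[1].append(i)
        elif size >= mid:
            masks[2].append(i)
        else:
            masks[3].append(i)
    return masks
```

Applied to the 11 anchors in the config fragments below, this reproduces mask=10 for the 1st [yolo] layer, mask=9 for the 2nd, and mask=0,...,8 for the 3rd.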

So in your case better to use default anchors or these anchors, masks and filters:

filters = (classes+5)*1

[yolo]
mask=10
anchors = 7,  7,  14,  7,   8, 17,  23,  7,  15, 12,  11, 17,  22,  9,  21, 12,  18, 16,   30,30,  60,60
num=11
...

filters = (classes+5)*1

[yolo]
mask=9
anchors = 7,  7,  14,  7,   8, 17,  23,  7,  15, 12,  11, 17,  22,  9,  21, 12,  18, 16,   30,30,  60,60
num=11
...

filters = (classes+5)*9

[yolo]
mask=0,1,2,3,4,5,6,7,8
anchors = 7,  7,  14,  7,   8, 17,  23,  7,  15, 12,  11, 17,  22,  9,  21, 12,  18, 16,   30,30,  60,60
num=11
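The filters values before each [yolo] layer follow mechanically from how many anchors its mask selects. A quick sanity check (my own helper, with classes=1 as in this thread):

```python
def yolo_filters(classes, anchors_in_mask):
    # per anchor: 4 box coordinates + 1 objectness score + class scores
    return (classes + 5) * anchors_in_mask

print(yolo_filters(1, 1))   # layers with mask=10 or mask=9: filters=6
print(yolo_filters(1, 9))   # layer with mask=0,...,8: filters=54
print(yolo_filters(80, 3))  # stock yolov3 (80 classes, 3 anchors): 255
```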

https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

recalculate anchors for your dataset for the width and height from your cfg-file: darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416, then set the same 9 anchors in each of the 3 [yolo]-layers in your cfg-file. But you should change the indexes of the anchor masks= for each [yolo]-layer, so that the 1st [yolo]-layer has anchors larger than 60x60, the 2nd larger than 30x30, and the 3rd the remaining ones. Also you should change the filters=(classes + 5)* before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers, then just try using all the default anchors.


>   # I also did the following changes as said in  
>   # #how-to-improve-object-detection for small objects, 
>   # and they give me a minor boost in mAP at about 0.2%,
>   # partly because this's an easy task? Or objects not small enough?
> < stride=2
> ---
> > stride=4
> 720c720
> < layers = -1, 36
> ---
> > layers = -1, 11

I think your objects are small enough, but the closer mAP is to 99%, the harder it is to improve.

You can also try the yolov3-5l.cfg cfg-file.

lizyn commented 5 years ago

Sorry, I didn't read the README carefully enough!

I'll try your suggested anchors. But may I ask why the rule is like this (>60, 30-60, ...)? If I want to explain this rule in a report, how do I justify it? A brief pointer would be really helpful, thanks!

AlexeyAB commented 5 years ago

Just tested experimentally. This is related to the following calculations.

Just count the subsampling layers (maxpool or convolutional layers with stride=2):

- the 1st yolo layer has 5 subsampling layers, 2^5 = 32, so the size of objects should be >64x64 (why not 32x32? I don't know ;)
- the 2nd yolo layer has 4 subsampling layers, 2^4 = 16, so the size of objects should be >32x32
- the 3rd yolo layer has 3 subsampling layers, 2^3 = 8, so the size of objects should be >16x16

With these changes

< stride=2
---
> stride=4
720c720
< layers = -1, 36
---
> layers = -1, 11

3rd yolo layer has 2 subsampling layers, 2^2 = 4, so size of objects should be >8x8
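The counting above reduces to one line of arithmetic. A sketch (the factor of 2 on top of the cumulative stride is the rule of thumb from this comment, found experimentally, not something the code enforces):

```python
def min_object_size(num_subsampling_layers):
    """Minimum object side for a yolo layer preceded by n subsampling
    layers: the cumulative stride is 2**n, and the rule of thumb asks
    for objects at least twice that."""
    return 2 * 2 ** num_subsampling_layers

print(min_object_size(5))  # 1st yolo layer: objects > 64x64
print(min_object_size(4))  # 2nd yolo layer: objects > 32x32
print(min_object_size(3))  # 3rd yolo layer: objects > 16x16
print(min_object_size(2))  # after the stride=4 change: objects > 8x8
```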

AlexeyAB commented 5 years ago

@lizyn After training with new anchors - show the Loss & mAP chart.

lizyn commented 5 years ago

default anchors:

[chart: obj.png]

calculated anchors:

[chart: anchor.png]

Training got killed by Colab; I restarted it later, and mAP settles at 98% in the last iterations.

calculated and reassigned anchors (i.e. your suggested ones):

[chart: new-anchor.png]

lizyn commented 5 years ago

@AlexeyAB, I have posted the charts training with new anchors.

AlexeyAB commented 5 years ago

@lizyn It seems that the calculated and reassigned anchors give the best mAP (slightly better than the default anchors), so just use them.

lizyn commented 5 years ago

Yes, I think so. Thanks for all your help! May I mention you in the acknowledgements of my report, as long as it doesn't bother you?

AlexeyAB commented 5 years ago

@lizyn Yes, you can. In your way or something like this: https://github.com/AlexeyAB/darknet/issues/2782#issuecomment-478615007

bogdan42k commented 5 years ago

After I calculated anchors for -num_of_clusters 9 -width 1024 -height 1024, I have anchors = 29,19, 57,21, 55,33, 85,60, 201,28, 353,43, 259,86, 191,359, 270,547

In yolov3-spp.cfg I modified the stride-related settings as described in #how-to-improve-object-detection for small objects (stride=4, layers = -1, 11). Do I also have to change the mask= and num= lines to fit <30, 30-60 and >60?

AlexeyAB commented 5 years ago

Do I also have to change mask= and num= lines to fit <30, 30-60 and >60?

Yes.

NickiBD commented 5 years ago

Hi @AlexeyAB,

In the beginning, I was just using recalculated anchors for yolov3 for small-object detection: 4, 12, 5, 23, 7, 33, 11, 22, 10, 50, 19, 34, 14, 61, 26, 65, 50, 69. Then I realized I need to change the mask indices based on the guidance: "1st yolo layer should be used for anchors (and objects) with sizes > 60x60

2nd yolo layer should be used for anchors (and objects) with: 60x60 > sizes > 30x30

3rd yolo layer should be used for anchors (and objects) with sizes < 30x30"

The following is the change :

filters = 6 (I have one class)

[yolo]
mask=9
anchors = 4, 12, 5, 23, 7, 33, 11, 22, 10, 50, 19, 34, 14, 61, 26, 65, 50, 69, 60,60
num=10
...

filters = 6

[yolo]
mask=8
anchors = 4, 12, 5, 23, 7, 33, 11, 22, 10, 50, 19, 34, 14, 61, 26, 65, 50, 69, 60,60
num=10
...

filters = 48

[yolo]
mask=0,1,2,3,4,5,6,7
anchors = 4, 12, 5, 23, 7, 33, 11, 22, 10, 50, 19, 34, 14, 61, 26, 65, 50, 69, 60,60
num=10

However, after training I got the same or worse results than just using 4, 12, 5, 23, 7, 33, 11, 22, 10, 50, 19, 34, 14, 61, 26, 65, 50, 69.

I was wondering whether I did it right.

Secondly, if I want to use the guidance for tiny-yolo v3 with 6 anchors, does it still apply? I recalculated the anchors as 6, 21, 9, 40, 16, 28, 14, 59, 34, 56, 48, 105 and then changed the config as follows:

[yolo]
mask=6 (I have one class)
anchors = 6, 21, 9, 40, 16, 28, 14, 59, 34, 56, 48, 105, 60,60
num=7
...

filters = 36

[yolo]
mask=0,1,2,3,4,5
anchors = 6, 21, 9, 40, 16, 28, 14, 59, 34, 56, 48, 105, 60,60
num=7
...

Is this the right approach, or should it be different?

I would be really grateful if you could assist me with this .

Many thanks in advance .

NickiBD commented 5 years ago

Hi again @AlexeyAB, sorry, I forgot to ask one more thing.
Can we adjust the mask indices as below?

mask=0,1,2,3,5 for the 3rd yolo layer (less than 30x30),

if we have recalculated anchors of 8, 27, 12, 52, 22, 43, 17, 79, 38, 66, 25,112, 36,129, 61,133, 109,130?

I truly appreciate all your guidance. Many thanks.

AlexeyAB commented 5 years ago

8, 27, 12, 52, 22, 43, 17, 79, 38, 66, 25,112, 36,129, 61,133, 109,130

There is no single correct answer; just try both cases and use the one with the higher mAP.

NickiBD commented 5 years ago

@AlexeyAB Thank you for your response, and sorry for bothering you again. I wanted to know whether my anchor recalculations (quoted in my comment above) are correct for yolov3 based on your guidance, since detection did not improve, and whether the guidance also applies to my tiny yolov3 recalculation. Thanks again.

AlexeyAB commented 5 years ago

Try to use:

[yolo]
mask=5,6
anchors = 6, 21, 9, 40, 16, 28, 14, 59, 34, 56, 48, 105, 60,60
num=7
...

[yolo]
mask=0,1,2,3,4
anchors = 6, 21, 9, 40, 16, 28, 14, 59, 34, 56, 48, 105, 60,60
num=7

NickiBD commented 5 years ago

Thank you so much. I will try that.

AlexeyAB commented 5 years ago

@ggolkar Also try to train this yolov3-tiny-pan-3l cfg-file with default anchors: yolov3-tiny-pan.cfg.txt

This cfg-file is much less sensitive to the exact assignment of anchor masks.

And show Loss & mAP chart.


You should use exactly this repo https://github.com/AlexeyAB/darknet since it has the fixed [reorg] layer that is used in this cfg-file.

NickiBD commented 5 years ago

Thanks a lot. I truly appreciate your help.

pakdigymon commented 5 years ago

Just tested experimentally. This is related to the following calculations.

Just count the subsampling layers (maxpool or convolutional layers with stride=2):

- the 1st yolo layer has 5 subsampling layers, 2^5 = 32, so the size of objects should be >64x64 (why not 32x32? I don't know ;)
- the 2nd yolo layer has 4 subsampling layers, 2^4 = 16, so the size of objects should be >32x32
- the 3rd yolo layer has 3 subsampling layers, 2^3 = 8, so the size of objects should be >16x16

With these changes

< stride=2
---
> stride=4
720c720
< layers = -1, 36
---
> layers = -1, 11

3rd yolo layer has 2 subsampling layers, 2^2 = 4, so size of objects should be >8x8

Hello, could you explain this in more detail? I don't understand the subsampling in the yolo layers. By "subsampling", do you mean downsampling or upsampling?

nyj-ocean commented 4 years ago

@AlexeyAB As shown in https://github.com/AlexeyAB/darknet/issues/2960#issuecomment-484595788

Just count the subsampling layers (maxpool or convolutional layers with stride=2):

- the 1st yolo layer has 5 subsampling layers, 2^5 = 32, so the size of objects should be >64x64 (why not 32x32? I don't know ;)
- the 2nd yolo layer has 4 subsampling layers, 2^4 = 16, so the size of objects should be >32x32
- the 3rd yolo layer has 3 subsampling layers, 2^3 = 8, so the size of objects should be >16x16

This is in the condition with 3 [yolo] layers like yolov3.cfg

But what about a config with 5 [yolo] layers, like yolov3_5l.cfg? Should it be like the following?

1st yolo layer: >64x64
2nd yolo layer: >32x32 and <64x64
3rd yolo layer: >16x16 and <32x32
4th yolo layer: >8x8 and <16x16
5th yolo layer: >4x4 and <8x8

AlexeyAB commented 4 years ago

@nyj-ocean Yes, if you use a width/height of about 416x416 to 608x608 in the cfg.
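For reference, the same rule of thumb generalizes to any number of [yolo] layers. This sketch (my generalization, for illustration only) computes the (lower, upper) object-size bounds per layer from the subsampling counts:

```python
def layer_size_ranges(subsampling_counts):
    """subsampling_counts: number of subsampling layers in front of
    each [yolo] layer, deepest layer first (e.g. [5, 4, 3, 2, 1] for a
    5-layer cfg like yolov3_5l.cfg). Returns (low, high) per layer,
    where low = 2 * 2**n (the rule of thumb from this thread) and high
    is the previous, deeper layer's low (None means unbounded)."""
    ranges, prev_low = [], None
    for n in subsampling_counts:
        low = 2 * 2 ** n
        ranges.append((low, prev_low))
        prev_low = low
    return ranges

print(layer_size_ranges([5, 4, 3, 2, 1]))
# [(64, None), (32, 64), (16, 32), (8, 16), (4, 8)]
```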