AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Help/Clarification Needed - calculate anchors #7744

Closed holger-prause closed 3 years ago

holger-prause commented 3 years ago

Hello, I have a dataset consisting of many small objects, and from what I have read about how bounding box detection works, it would make sense to provide an initial set of anchor boxes.

I know there is documentation on how to do this here: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection To be honest, I find it pretty confusing, as it uses a lot of terms which are not explained anywhere. It mentions "Only if you are an expert in neural detection networks". I consider myself to be one but still can't work with the documentation :-(

Here is my current understanding, followed by my questions. I would be very happy if someone could help or comment. To make it easier to comment, I have organized my questions into sections labelled "Confusion Nr".

- recalculate anchors for your dataset for width and height from cfg-file: darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416

-num_of_clusters: the number of anchors; corresponds to the value of the num parameter in the [yolo] layer. Each anchor is a comma-separated pair of values (width, height).
-width: the network width
-height: the network height

This would mean that for yolov4-tiny (not the 3l variant), -num_of_clusters should be 6, correct? Let's go on:

...then set the same 9 anchors in each of 3 [yolo]-layers in your cfg-file. But you should change indexes of anchors masks= for each [yolo]-layer, so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30, 2nd smaller than 60x60, 3rd remaining, and vice versa for YOLOv3.

Let's work through an example to understand this. Say I end up with the following anchors (only an example!) and I have 3 yolo layers: 29,29, 59,59, 89,89

1st yolo layer, < (30x30): would fit the 29,29 pair, mask=0

2nd yolo layer, < (60x60): would fit 59,59 AND 29,29, mask=0,1

3rd yolo layer, >= (60x60): would fit the 89,89 pair, mask=2
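The size rule from the README can be sketched in a few lines of Python. This is only an illustration, not code from darknet: the function name `assign_layers` is made up, and comparing by area (width x height) is an assumption, since the README does not say how "smaller than 30x30" is measured. Unlike the literal reading above, the sketch puts each anchor in exactly one layer.

```python
def assign_layers(anchors, limits=(30, 60)):
    """Return one mask (list of anchor indices) per [yolo] layer.

    An anchor goes to the first layer whose limit it stays under,
    comparing areas (assumption: "smaller than 30x30" means area < 900).
    Anchors at or above the last limit go to the final layer.
    """
    masks = [[] for _ in range(len(limits) + 1)]
    for index, (width, height) in enumerate(anchors):
        # Count how many limits this anchor meets or exceeds.
        layer = sum(1 for limit in limits if width * height >= limit * limit)
        masks[layer].append(index)
    return masks


example_anchors = [(29, 29), (59, 59), (89, 89)]
example_masks = assign_layers(example_anchors)
for number, mask in enumerate(example_masks, start=1):
    print(f"yolo layer {number}: mask = {','.join(map(str, mask))}")
```

With the three example anchors, each layer receives exactly one index.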

Confusion Nr1 - Mask indices value ranges. Let's take a look at the predefined configs: yolov4 full https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4.cfg

1st yolo layer
mask = 0,1,2
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
This selects the pairs 12,16, 19,36 and 40,28 (width x height = 192, 684 and 1120).

So you see that on the first layer the selected pair 40,28 is somewhat bigger than the stated 30x30. So when selecting values, do they just have to be somewhat close to the recommended value? I think that is the case, but I am not sure, and the documentation says "smaller".
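To make the lookup concrete, here is a small Python sketch of how a mask picks pairs out of the anchors line. The helper names `parse_anchors` and `select` are hypothetical and only mirror the reading of the cfg described in this post; they are not taken from the darknet source.

```python
def parse_anchors(text):
    """Turn a cfg-style 'w,h, w,h, ...' string into a list of (w, h) pairs."""
    values = [int(value) for value in text.split(",")]
    return list(zip(values[0::2], values[1::2]))


def select(anchors, mask):
    """Return the anchor pairs whose indices are listed in the mask."""
    return [anchors[index] for index in mask]


# Anchors line quoted above from yolov4.cfg.
YOLOV4_ANCHORS = ("12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, "
                  "142, 110, 192, 243, 459, 401")

anchors = parse_anchors(YOLOV4_ANCHORS)
first_layer = select(anchors, [0, 1, 2])
areas = [width * height for width, height in first_layer]

for (width, height), area in zip(first_layer, areas):
    note = "under 30x30" if area < 30 * 30 else "over 30x30"
    print(f"{width},{height}: area {area} ({note})")
```

Measured by area, 40,28 (1120) exceeds 30x30 (900), which is the mismatch this question is about.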

Confusion Nr2 - Mask indices selection. yolov4 full https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4.cfg

1st yolo layer
mask = 0,1,2
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
This selects the pairs 12,16, 19,36 and 40,28 (width x height = 192, 684 and 1120).

2nd yolo layer
mask = 3,4,5
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
This selects the pairs 36,75, 76,55 and 72,146.

The 2nd layer does not select indices 0,1,2, even though those anchors are clearly smaller than 60x60 ("2nd smaller than 60x60").

So, using my basic human pattern-matching skills, is this how it works?! These are not called clusters for no reason. A point usually does not belong to two clusters at the same time (at least this is how k-means works, which I believe is what yolo uses for anchor calculation).

All boxes around 30x30 go to layer 1, all boxes around 60x60 go to layer 2, and the rest go to layer 3. Then select the matching anchor box indices. Is this understanding correct?
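The clustering intuition above can be illustrated with a minimal k-means over (width, height) pairs. This is a sketch under stated assumptions, not darknet's calc_anchors code: the distance is 1 - IoU as proposed in the YOLOv2 paper, the evenly spaced initialization is chosen here only to keep the result deterministic, and the names `iou_wh` and `kmeans_anchors` are hypothetical.

```python
def iou_wh(a, b):
    """IoU of two boxes that share the same top-left corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, k, iterations=100):
    """Cluster (w, h) pairs into k anchors, sorted by area."""
    if k > len(boxes):
        raise ValueError("need at least as many boxes as clusters")
    # Deterministic start: k boxes evenly spaced in the area-sorted list.
    ordered = sorted(boxes, key=lambda box: box[0] * box[1])
    step = len(ordered) / k
    centers = [ordered[int(i * step + step / 2)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Each box joins exactly one cluster: the center with the best IoU.
            best = max(range(k), key=lambda i: iou_wh(box, centers[i]))
            clusters[best].append(box)
        new_centers = []
        for i, members in enumerate(clusters):
            if members:
                mean_w = sum(m[0] for m in members) / len(members)
                mean_h = sum(m[1] for m in members) / len(members)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(centers[i])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers, key=lambda c: c[0] * c[1])


sample_boxes = [(10, 10), (12, 12), (100, 100), (104, 96)]
sample_anchors = kmeans_anchors(sample_boxes, k=2)
print(sample_anchors)
```

Because every box is assigned to a single center, each resulting anchor naturally belongs to one group, which matches the "one cluster per point" reasoning above.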

Confusion Nr3 - Different mask indices selection methods. OK, to whoever has read this far: thank you a lot! While everything above can be understood and fixed by using common sense, here is something very nasty:

yolov4-tiny, 1st yolo layer
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
This selects the pairs 81,82, 135,169 and 344,319 (width x height = 6642, 22815 and 109736).

yolov4-full, 1st yolo layer
mask = 0,1,2
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
This selects the pairs 12,16, 19,36 and 40,28 (width x height = 192, 684 and 1120).

So you see that for the first layer, yolov4-tiny selects the biggest anchors while yolov4 full selects the smallest ones. Is this intended? Both are just configs describing a neural network, or is there some "is tiny" flag in the source code with special logic attached to it? To me this seems inconsistent, and I just don't know anymore what is right and what is not. Please help.

Thank you very much. I tried to describe this as best I can; sorry for the long text. Greetings, Holger

holger-prause commented 3 years ago

In the end, the difference between tiny and full regarding the anchors (big anchors first vs. small anchors first) probably does not matter. Closing ticket.

holger-prause commented 3 years ago

OK, after doing some testing:

The anchor selection order (small anchors first in yolov4, big anchors first in yolov4-tiny) DOES matter. So just keep the existing pattern intact.

And calculating custom anchors DID improve the mAP for my custom dataset.
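Keeping the pattern intact when swapping in custom anchors could look like the following Python sketch. The helpers `split_masks` and `mask_line` are hypothetical, and the even split is an assumption: it presumes the anchors are sorted from smallest to largest, as in the anchor lists quoted in this thread, and the masks already present in the cfg you start from take precedence over whatever this produces.

```python
def split_masks(num_anchors, num_layers, smallest_first=True):
    """Split anchor indices evenly across [yolo] layers.

    smallest_first=True  -> first layer gets the smallest anchors (yolov4 pattern).
    smallest_first=False -> first layer gets the largest anchors (yolov4-tiny pattern).
    """
    if num_anchors % num_layers != 0:
        raise ValueError("anchors must divide evenly across layers")
    per_layer = num_anchors // num_layers
    groups = [list(range(i * per_layer, (i + 1) * per_layer))
              for i in range(num_layers)]
    return groups if smallest_first else groups[::-1]


def mask_line(mask):
    """Format one mask as it appears in a cfg file."""
    return "mask = " + ",".join(str(index) for index in mask)


full_masks = split_masks(9, 3, smallest_first=True)
tiny_masks = split_masks(6, 2, smallest_first=False)

print("yolov4, 1st yolo layer:     ", mask_line(full_masks[0]))
print("yolov4-tiny, 1st yolo layer:", mask_line(tiny_masks[0]))
```

The first-layer masks printed here match the ones quoted earlier in the thread: 0,1,2 for yolov4 and 3,4,5 for yolov4-tiny.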

ElHouas commented 2 years ago

@holger-prause Thanks, this issue has been really useful for understanding the anchor boxes better. I'd like to ask you about -num_of_clusters when calculating the anchor boxes. From what I read, the anchor boxes should cover almost all the data in the point cloud, so if I keep increasing the number of clusters, for example from 9 to 30, I'll get a higher avg IoU, but I'm not sure this is a wise decision when training. Did you still use 9 as the number of clusters?

Thanks for the help!
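The effect described in this question can be reproduced with a short Python sketch of the average best IoU between labelled boxes and a set of anchors. This is an illustration only, with hypothetical names (`iou_wh`, `avg_best_iou`) and toy boxes; it is not how darknet reports avg IoU internally.

```python
def iou_wh(box, anchor):
    """IoU of two (w, h) boxes aligned at the same corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def avg_best_iou(boxes, anchors):
    """Mean, over all boxes, of the IoU with the best-matching anchor."""
    return sum(max(iou_wh(box, anchor) for anchor in anchors)
               for box in boxes) / len(boxes)


# Toy data: three box sizes, scored against one anchor and against three.
toy_boxes = [(10, 10), (20, 20), (40, 40)]
coarse = avg_best_iou(toy_boxes, [(20, 20)])
fine = avg_best_iou(toy_boxes, [(10, 10), (20, 20), (40, 40)])

print(f"1 anchor:  avg IoU = {coarse:.2f}")
print(f"3 anchors: avg IoU = {fine:.2f}")
```

Avg IoU can only go up as anchors are added and reaches 1.0 once every distinct box size has its own anchor, so a higher value from more clusters does not by itself show that the extra anchors help training.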