AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

How to use gen_anchors.py #418

Open sayanmutd opened 6 years ago

sayanmutd commented 6 years ago

@AlexeyAB How can I use gen_anchors.py, and how do I decide on the number of clusters for my custom dataset? My custom dataset contains 10 distinct objects which are non-overlapping in nature.

AlexeyAB commented 6 years ago

@sayanmutd

MyVanitar commented 6 years ago

@AlexeyAB

Have you tested the effect of anchors? I mean keeping everything identical and just replacing the anchors with the generated ones.

sivagnanamn commented 6 years ago

@VanitarNordic Instead of using the default VOC anchors, generating custom anchors for your own dataset does help to get better mAP. I've seen a considerable 4-5% increase. Again, it depends on the dataset and the types of objects in it.

MyVanitar commented 6 years ago

@sivagnanamn

You mean sometimes it increases the performance and sometimes it doesn't? In my case it did not; it decreased the mAP.

sivagnanamn commented 6 years ago

@VanitarNordic Yup, it's not always guaranteed. But intuitively it makes sense to generate custom anchors for custom datasets instead of using the default VOC anchors. In my case, my objects were a lot smaller than the VOC anchors. You can visualize the anchors on an image to get a better idea of whether the default anchors fit your objects.

I used the script below to check:

https://github.com/Jumabek/darknet_scripts/blob/master/visualize_anchors.py

I'm not sure whether @AlexeyAB has added anchor visualization script to his repo.
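
If the script isn't handy, below is a minimal matplotlib sketch of the same idea (not the linked script). It assumes the anchors are in 13x13-grid units (the Darknet region-layer convention) and a 416x416 network input, and it draws everything at the image center, so it is only a rough visual check; the anchor values are the yolo-voc.cfg defaults.

# Rough sketch: overlay anchor boxes on an image to judge whether they match
# typical object sizes. Anchors are assumed to be in 13x13-grid units.
import matplotlib.pyplot as plt
import matplotlib.patches as patches

anchors = [(1.32, 1.73), (3.19, 4.01), (5.06, 8.10), (9.47, 4.84), (11.24, 10.01)]  # yolo-voc.cfg defaults
net_size, grid = 416, 13
cell = net_size / grid                      # 32 pixels per cell at 416x416

img = plt.imread("sample.jpg")              # any representative training image, ideally 416x416
fig, ax = plt.subplots()
ax.imshow(img)
cx, cy = img.shape[1] / 2.0, img.shape[0] / 2.0
for w, h in anchors:
    pw, ph = w * cell, h * cell             # anchor size in network-input pixels
    ax.add_patch(patches.Rectangle((cx - pw / 2, cy - ph / 2), pw, ph, fill=False))
plt.show()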

Edit 1: You can even try increasing the number of anchors from 5 to 7 or 9 clusters and check the performance. It may work better than the default VOC anchors.

MyVanitar commented 6 years ago

@sivagnanamn

Actually, I had generated the new anchors with this script as well, and it led to lower performance. In theory, yes, we should calculate anchors for each dataset, but I could not find the best script to do so.

sivagnanamn commented 6 years ago

@VanitarNordic Just to check the effect of custom anchors, I used the script below a while back. You can give it a try if you find time. I haven't checked the difference between this script and @AlexeyAB's. Maybe, if you do try it, please share the generated anchors from both scripts; ideally both should produce very similar anchors.

https://github.com/Jumabek/darknet_scripts/blob/master/gen_anchors.py
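
From memory, that script is run roughly like this (the flag names may differ between versions of the repo, so treat them as an assumption and check python gen_anchors.py -h):

python gen_anchors.py -filelist train.txt -output_dir generated_anchors/ -num_clusters 5

Here train.txt is the same list of image paths used for training, with Darknet-format label txt files next to the images.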

MyVanitar commented 6 years ago

@sivagnanamn

Yes, I meant I used the script you linked to generate the new anchors. I think @AlexeyAB's one is the same.

AlexeyAB commented 6 years ago

@sayanmutd @VanitarNordic @sivagnanamn

I just copied this script: https://github.com/AlexeyAB/darknet/blob/master/scripts/gen_anchors.py from: https://github.com/Jumabek/darknet_scripts/blob/master/gen_anchors.py so these scripts are the same.

  1. gen_anchors.py gives slightly different anchors than kmeansiou.c, which is used by Joseph: https://github.com/AlexeyAB/darknet/blob/master/scripts/kmeansiou.c
  2. Maybe k-means is not the best way to get anchors, especially for getting a good mAP: https://en.wikipedia.org/wiki/K-means_clustering

Two main problems of K-means:

  1. K-means is not guaranteed to reach the global minimum of the total quadratic deviation V, only a local minimum.
  2. The result depends on the choice of the initial cluster centers, and their optimal choice is unknown.

And as you can see, a local minimum isn't the optimal result (right image): the red center covers two clusters, while the brown and violet centers share one cluster.

(image: k-means converging to a local minimum)


For example, SSD uses the same anchors as Faster R-CNN: https://arxiv.org/pdf/1512.02325.pdf

Our default boxes are similar to the anchor boxes used in Faster R-CNN

SSD and Faster R-CNN use hand-made anchors instead of k-means - 9 anchors in total: 3 aspect ratios (1:1, 2:1, 1:2) at 3 different scales (128x128, 256x256, 512x512): https://medium.com/@smallfishbigsea/faster-r-cnn-explained-864d4fb7e3f8

  1. Three colors represent three scales or sizes: 128x128, 256x256, 512x512.
  2. Let’s single out the red boxes/anchors. The three boxes have height width ratios 1:1, 1:2 and 2:1 respectively.

(image: the 9 anchor boxes - 3 scales x 3 aspect ratios)
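
For reference, those 9 hand-made anchors fall out of the 3 scales and 3 aspect ratios if each anchor keeps the area of its scale squared (the convention in the Faster R-CNN paper); a small sketch:

# Sketch: build the 9 Faster R-CNN style anchors from 3 scales x 3 aspect ratios.
# For aspect ratio r = h/w at scale s, keep the area s*s: w = s/sqrt(r), h = s*sqrt(r).
from math import sqrt

scales = [128, 256, 512]
ratios = [1.0, 2.0, 0.5]    # h:w of 1:1, 2:1, 1:2

for s in scales:
    for r in ratios:
        w, h = s / sqrt(r), s * sqrt(r)
        print(f"{w:.0f} x {h:.0f}")
# 128 x 128, 91 x 181, 181 x 91, 256 x 256, 181 x 362, 362 x 181, 512 x 512, ...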

MyVanitar commented 6 years ago

@AlexeyAB

In the process of generating anchors, the relative width and height (Darknet format) are used. Do you think it should be the absolute width and height instead of the relative ones (which can be retrieved from the Pascal XMLs)?

AlexeyAB commented 6 years ago

Where might that be?

I think that in general k-means isn't the best approach. At the very least there are k-means++, Kohonen maps (networks), ...

MyVanitar commented 6 years ago

@AlexeyAB

Before we convert the VOC annotations to the Darknet/YOLO format, the annotations are in XML format: Xmin, Ymin, Xmax, Ymax. From these, the absolute width and height of each bounding box can be calculated, but in this script the Darknet relative width and height are used.

K-means is mentioned in the paper. Do you think they did not want to disclose the details? Besides, the YOLOv1 and YOLOv2 anchors are different for VOC.

AlexeyAB commented 6 years ago

@VanitarNordic Who are they and in which article? I really do not understand the question.

gen_anchors.py calculates anchors that are relative to width & height. kmeansiou.c calculates anchors that aren't relative to width & height, but Joseph multiplied them by the desired ratio.
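
To make that concrete: a box covering 0.42 of the image width and 0.65 of its height corresponds to 0.42 * 13 ≈ 5.5 by 0.65 * 13 ≈ 8.5 cells on the 13x13 grid, which is the unit the final anchors are expressed in.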

Yolo v1 uses 2 hand-made anchors, while SSD and Faster RCNN use 9 hand-made anchors.

MyVanitar commented 6 years ago

@AlexeyAB

I mean that k-means is mentioned in the paper, without the extra information you mentioned (Kohonen maps and so on).

Relative width and height is my assumption, because I see the relative width and height of the bounding boxes being used in the calculations in gen_anchors.py, which might not be correct.

Joseph multiplied them by the desired ratio

Did he multiply them by 13?

AlexeyAB commented 6 years ago

Joseph multiplied them by the desired ratio

Did he multiply them by 13?

Yes

AlexeyAB commented 6 years ago

I just added a calc_anchors() function to Darknet as an experimental feature. It uses k-means++, which is a little bit better than k-means: https://en.wikipedia.org/wiki/K-means%2B%2B I used the OpenCV implementation: https://docs.opencv.org/2.4/modules/core/doc/clustering.html#kmeans Usage:

darknet.exe detector calc_anchors data/voc.data -num_of_clusters 5 -final_width 13 -final_heigh 13

I got these anchors for the Pascal VOC dataset (2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt):

D:\Darknet2\darknet\build\darknet\x64>darknet.exe detector calc_anchors data/voc
.data -num_of_clusters 5 -final_width 13 -final_heigh 13

 num_of_clusters = 5, final_width = 13, final_height = 13
 read labels from 16551 images
 loaded          image: 16551    box: 40058
 all loaded.

 calculating k-means++ ...
anchors = 9.38,6.01, 3.40,5.35, 10.95,11.20, 5.02,9.83, 1.50,2.16,
D:\Darknet2\darknet\build\darknet\x64>pause
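
Roughly the same computation can be sketched in Python with the same OpenCV k-means++ call (this is only a sketch, not the Darknet source; it assumes Darknet-format label files, one "class x y w h" line per object with relative values, stored next to the images listed in train.txt):

# Sketch: cluster relative box sizes with OpenCV k-means++ and scale to a 13x13 grid.
import cv2
import numpy as np

def load_wh(list_file):
    # Collect (w, h) from Darknet label files: "class x y w h", all values relative.
    wh = []
    for img_path in open(list_file):
        label = img_path.strip().rsplit(".", 1)[0] + ".txt"
        for line in open(label):
            if line.strip():
                _, _, _, w, h = map(float, line.split())
                wh.append((w, h))
    return np.array(wh, dtype=np.float32)

data = np.float32(load_wh("train.txt") * 13.0)   # express box sizes in grid cells
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 1000, 1e-4)
_, _, centers = cv2.kmeans(data, 5, None, criteria, 10, cv2.KMEANS_PP_CENTERS)
print("anchors = " + ", ".join(f"{w:.2f},{h:.2f}" for w, h in centers))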

One of the reasons for the difference in the results is that the original version used distance(box, centroid) = 1 - IoU(box, centroid) instead of the usual Euclidean distance, to get better IoU. See page 3: https://arxiv.org/pdf/1612.08242v1.pdf

MyVanitar commented 6 years ago

@AlexeyAB

Very nice work, Alexey. Congratulations.

May I ask you to share your .txt file which contains all of the VOC-2007 and 2012 annotations in one file? I think you concatenated them into a single txt file. I want to do some tests.

AlexeyAB commented 6 years ago

@VanitarNordic No, I made one txt file which contains the paths to all images, but the annotations for each image are in separate txt files. Just follow steps 2-5 here to get the same: https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data

And then run: darknet.exe detector calc_anchors data/voc.data -num_of_clusters 5 -final_width 13 -final_heigh 13

MyVanitar commented 6 years ago

@AlexeyAB

Did you see anywhere whether the width and height of the bounding boxes must be relative or absolute?

AlexeyAB commented 6 years ago

@VanitarNordic No.


https://arxiv.org/pdf/1612.08242v1.pdf

(images: excerpts from the paper)

MyVanitar commented 6 years ago

@AlexeyAB

Thank you, really good job. IoU must be used to calculate the distance for the k-means, as you did, but everyone reaches about 60% and not 61%, and nobody knows how that extra 1% can be gained.

IoU = 66.37% for 9 anchors

Can we calculate 9 anchors and start training? I think some other parameters must also be changed.

AlexeyAB commented 6 years ago

and train.


I didn't use IoU for the distance, because to use distance = 1 - IoU(box, centroid) I would have to implement k-means by hand, and in that case I cannot use the implementation from OpenCV. If there is time, I will do it.
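
For what it's worth, a minimal hand-rolled sketch of that idea (not code from this repo): k-means over the (w, h) pairs, where each box is assigned to the centroid with the highest IoU when both are centered at the origin, i.e. distance = 1 - IoU as in the YOLOv2 paper.

# Sketch: k-means on (w, h) pairs with distance = 1 - IoU (boxes share a common center).
import numpy as np

def iou_wh(wh, centroids):
    # IoU between every box (N, 2) and every centroid (k, 2), all centered at the origin.
    inter = np.minimum(wh[:, None, 0], centroids[None, :, 0]) * \
            np.minimum(wh[:, None, 1], centroids[None, :, 1])
    areas = wh[:, 0] * wh[:, 1]
    c_areas = centroids[:, 0] * centroids[:, 1]
    return inter / (areas[:, None] + c_areas[None, :] - inter)

def kmeans_iou(wh, k, iters=1000, seed=0):
    rng = np.random.default_rng(seed)
    centroids = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, centroids), axis=1)    # nearest = highest IoU
        new = np.array([wh[assign == j].mean(axis=0) if np.any(assign == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

# Example: anchors = kmeans_iou(box_wh * 13.0, k=5), where box_wh holds the
# relative (w, h) pairs collected from the label files.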

MyVanitar commented 6 years ago

change num=9:

if we change num, then filters should be changed too, shouldn't it? Because filters = (classes + coords + 1)*num
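
For example, with coords = 4 and 10 classes, num = 5 gives filters = (10 + 4 + 1) * 5 = 75, while num = 9 gives (10 + 4 + 1) * 9 = 135 in the convolutional layer just before the region layer.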

AlexeyAB commented 6 years ago

@VanitarNordic Yes, of course, I corrected the answer above.

MyVanitar commented 6 years ago

@AlexeyAB

Thank you. What about anchor calculation for a custom dataset? I think it should be no different from the original VOC; my only concern is background-only images with empty txt files.

AlexeyAB commented 6 years ago

@VanitarNordic There is no problem with background-only images with empty txt files. I tested it on my dataset, where half of the images have no objects, and it works well. Images without objects are simply skipped.

MyVanitar commented 6 years ago

@AlexeyAB

I trained with 9 anchors. The mAP remained the same, but the average IoU decreased by about 2%. I still don't know why the original anchors work better, because the new ones produce much better clustering (with whatever method).

Below are the anchors:

(screenshot of the generated anchors)

MyVanitar commented 6 years ago

Besides, offline video processing using this command causes darknet to crash:

darknet.exe detector demo data/obj.data yolo-obj.cfg backup/yolo-obj_final.weights test.mp4 -out_filename res.avi

It happens with the latest commit. The reason could be the new anchors or a different bug. I have an older commit which I downloaded on 2018-02-24 that uses the original anchors. Video processing works there, but the FPS reading is wrong and just alternates between two numbers, 32 FPS or 64 FPS. There are some weird characters there as well.

(screenshot of the demo output with the wrong FPS reading)

AlexeyAB commented 6 years ago

@VanitarNordic Thank you, I fixed it.


I trained with 9 anchors. The mAP remained the same, but the average IoU decreased by about 2%. I still don't know why the original anchors work better, because the new ones produce much better clustering (with whatever method).

I think there should be cleaner and more representative tests, with 20-80 classes and 2000 images per class, using 100% identical source code, dataset and model (except for the different anchors).

MyVanitar commented 6 years ago

@AlexeyAB

Will increasing the number of anchors decrease the detection speed? If yes, can you estimate by how much?

AlexeyAB commented 6 years ago

@VanitarNordic I think less than 1%.