Open sayanmutd opened 6 years ago
@sayanmutd
Download and install: https://www.python.org/downloads/release/python-2714/
run C:\Python27\Scripts\pip install numpy
run C:\Python27\python.exe gen_anchors.py -filelist data/train.txt -output_dir data/anchors -num_clusters 5
Put the anchors that you got here: https://github.com/AlexeyAB/darknet/blob/e16c9011631c7762d21baa02a34ad8db73dbdd3f/cfg/yolo-voc.2.0.cfg#L228. The number of anchors (the number of pairs of values) is here: https://github.com/AlexeyAB/darknet/blob/e16c9011631c7762d21baa02a34ad8db73dbdd3f/cfg/yolo-voc.2.0.cfg#L232
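For context, a minimal sketch of what an anchor-generation script of this kind does: run k-means over the relative (width, height) of all labeled boxes, then scale the centroids by the final 13x13 grid. This is an illustration with made-up box sizes, not gen_anchors.py itself.

```python
# Sketch: k-means over relative (w, h) pairs, Euclidean distance.
# The box list is made up for illustration.
import random

def kmeans_wh(boxes, k, iters=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iters):
        # assign each box to its nearest centroid
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            i = min(range(k),
                    key=lambda c: (w - centroids[c][0])**2 + (h - centroids[c][1])**2)
            clusters[i].append((w, h))
        # move each centroid to the mean of its cluster (keep it if empty)
        centroids = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

# Relative box sizes (Darknet format, 0..1), scaled by 13 as in yolo-voc.cfg
boxes = [(0.1, 0.15), (0.12, 0.14), (0.4, 0.5), (0.45, 0.55), (0.8, 0.9)]
anchors = sorted((w * 13, h * 13) for w, h in kmeans_wh(boxes, k=2))
print(", ".join("%.2f,%.2f" % a for a in anchors))
```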
@AlexeyAB
Have you tested the effect of Anchors? I mean you keep everything identical and just change the anchors by the generated ones.
@VanitarNordic Instead of using default VOC anchors, generating custom anchors for our own dataset does help to get better mAP. I've seen a considerable 4-5% increase. Again it depends on the dataset and types of objects in them.
@sivagnanamn
You mean sometimes it increases the performance and sometimes it doesn't? In my case it did not; it decreased the mAP.
@VanitarNordic Yup, it's not always guaranteed. But intuitively it makes sense to generate custom anchors for custom datasets instead of using the default VOC anchors. In my case, my objects were a lot smaller than the VOC anchors. You can visualize the anchors on an image to get a better idea of whether the default anchors fit your objects.
I used the script below to check:
https://github.com/Jumabek/darknet_scripts/blob/master/visualize_anchors.py
I'm not sure whether @AlexeyAB has added anchor visualization script to his repo.
Edit 1: You can also try increasing the number of anchors from 5 to 7 or 9 clusters and check the performance. It may work better than the default VOC anchors.
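One rough way to put a number on whether an anchor set "fits" a dataset (not from this thread, just a hedged sketch): compute the average best IoU between each ground-truth (w, h) and its closest anchor, with box centers aligned so only widths and heights matter. All box and anchor values below are illustrative.

```python
# Sketch: compare anchor sets by average best IoU against ground-truth
# (w, h) pairs. Centers are aligned, so IoU depends only on w and h.
# All numbers are made up for illustration.

def wh_iou(a, b):
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def avg_best_iou(boxes, anchors):
    return sum(max(wh_iou(b, a) for a in anchors) for b in boxes) / len(boxes)

boxes = [(0.10, 0.15), (0.12, 0.14), (0.40, 0.50), (0.45, 0.55)]
# a few default yolo-voc anchors rescaled from the 13x13 grid to 0..1
default_anchors = [(1.32/13, 1.73/13), (3.19/13, 4.00/13), (5.05/13, 8.09/13)]
# hypothetical custom anchors tuned to the boxes above
custom_anchors = [(0.11, 0.145), (0.42, 0.52)]

print("default:", avg_best_iou(boxes, default_anchors))
print("custom: ", avg_best_iou(boxes, custom_anchors))
```

A higher average best IoU suggests the anchors cover the dataset's box shapes better, though as noted above it does not guarantee a higher mAP.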
@sivagnanamn
Actually, I had generated the new anchors with that script as well, and it led to lower performance. In theory, yes, we should calculate anchors for each dataset, but I could not find the best script to do so.
@VanitarNordic Just to check the effect of custom anchors, I used the script below a long time ago. You can give it a try if you find the time. I haven't checked the difference between this script and @AlexeyAB's. If you try, please share the generated anchors from both scripts; ideally both should produce very similar anchors.
https://github.com/Jumabek/darknet_scripts/blob/master/gen_anchors.py
@sivagnanamn
Yes, I meant that I used the script you mentioned to generate the new anchors. I think @AlexeyAB's one is the same.
@sayanmutd @VanitarNordic @sivagnanamn
I just copied this script: https://github.com/AlexeyAB/darknet/blob/master/scripts/gen_anchors.py from: https://github.com/Jumabek/darknet_scripts/blob/master/gen_anchors.py so these scripts are the same.
gen_anchors.py gives slightly different anchors than kmeansiou.c, which is used by Joseph: https://github.com/AlexeyAB/darknet/blob/master/scripts/kmeansiou.c

Two main problems of K-means:
And as you can see, a local minimum isn't an optimal result (right image): the red center includes two clusters, while the brown and violet centers share one cluster.
For example SSD uses the same anchors as in the Faster RCNN : https://arxiv.org/pdf/1512.02325.pdf
Our default boxes are similar to the anchor boxes used in Faster R-CNN
SSD and Faster RCNN use just hand-made anchors instead of k-means: 9 anchors in total, for 3 aspect ratios (1:1, 2:1, 1:2) and 3 different resolutions (128x128, 256x256, 512x512): https://medium.com/@smallfishbigsea/faster-r-cnn-explained-864d4fb7e3f8
- Three colors represent three scales or sizes: 128x128, 256x256, 512x512.
- Let’s single out the red boxes/anchors. The three boxes have height width ratios 1:1, 1:2 and 2:1 respectively.
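The hand-made scheme described in that quote can be sketched as follows. This assumes the usual construction, where each anchor's area is kept equal to scale squared while the aspect ratio varies:

```python
# Sketch of the hand-made Faster R-CNN / SSD style anchors:
# 3 scales x 3 aspect ratios = 9 (width, height) pairs,
# with each anchor's area fixed at scale * scale.
import math

scales = [128, 256, 512]
ratios = [(1, 1), (1, 2), (2, 1)]  # width : height

anchors = []
for s in scales:
    for rw, rh in ratios:
        area = s * s
        w = math.sqrt(area * rw / rh)  # width that preserves the area
        h = area / w
        anchors.append((round(w), round(h)))

print(anchors)
```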
@AlexeyAB
In the process of generating anchors, the relative width and height (Darknet format) are used. Do you think it should perhaps be the absolute width and height instead of the relative ones (which can be retrieved from the Pascal XMLs)?
Where might it be?
I think that in general k-means isn't the best approach. At least there are k-means++, Kohonen map or network, ...
@AlexeyAB
Before we convert the VOC annotations to the Darknet YOLO format, the VOC annotations are in XML format: Xmin, Ymin, Xmax, Ymax. The width and height of each bounding box can be calculated from these, but in this script the Darknet relative width and height are used.
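The conversion under discussion (VOC corner coordinates to Darknet relative values) can be sketched like this; voc_to_darknet is a hypothetical helper name, not a function from either repository:

```python
# Sketch of the standard VOC -> Darknet label conversion: absolute pixel
# corners (xmin, ymin, xmax, ymax) become a relative center and size
# divided by the image dimensions. Only the relative w, h feed k-means.

def voc_to_darknet(xmin, ymin, xmax, ymax, img_w, img_h):
    cx = (xmin + xmax) / 2.0 / img_w   # relative box center x
    cy = (ymin + ymax) / 2.0 / img_h   # relative box center y
    w = (xmax - xmin) / float(img_w)   # relative width
    h = (ymax - ymin) / float(img_h)   # relative height
    return cx, cy, w, h

# A 100x150 pixel box in a 500x375 image (a typical VOC image size)
print(voc_to_darknet(50, 60, 150, 210, 500, 375))  # (0.2, 0.36, 0.2, 0.4)
```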
K-means is mentioned in the paper. Do you think they did not want to disclose it? Besides, the YOLOv1 and YOLOv2 anchors are different for VOC.
@VanitarNordic Who are they, and in which article? I really do not understand the question.
gen_anchors.py calculates anchors that are relative to width & height. kmeansiou.c calculates anchors that aren't relative to width & height, but Joseph multiplied them by the desired ratio.
Yolo v1 uses 2 hand-made anchors, while SSD and Faster RCNN use 9 hand-made anchors.
@AlexeyAB
I mean that k-means is mentioned in the paper, without the extra information that you mentioned (Kohonen maps and so on). Relative width and height is my assumption, because I see that the relative width and height of the bounding boxes are used in the calculations in gen_anchors.py, which might not be correct.
> Joseph multiplied them by the desired ratio

Did he multiply them by 13?
> Did he multiply them with 13?

Yes.
I just added a calc_anchors() function to Darknet as an experimental feature. It uses k-means++, which is a little better than plain k-means: https://en.wikipedia.org/wiki/K-means%2B%2B
Used OpenCV implementation: https://docs.opencv.org/2.4/modules/core/doc/clustering.html#kmeans
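For illustration, the k-means++ seeding idea that OpenCV's KMEANS_PP_CENTERS flag implements can be sketched in pure Python: each new centroid is sampled with probability proportional to its squared distance from the nearest already-chosen centroid. This is a sketch of the idea, not the OpenCV code.

```python
# Sketch of k-means++ seeding on (w, h) pairs: points far from the
# existing centroids are more likely to become the next centroid.
import random

def kmeans_pp_seeds(points, k, seed=0):
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniform at random
    while len(centroids) < k:
        # squared distance from each point to its nearest chosen centroid
        d2 = [min((px - cx)**2 + (py - cy)**2 for cx, cy in centroids)
              for px, py in points]
        # sample the next centroid proportionally to d2
        r = rng.random() * sum(d2)
        acc = 0.0
        for p, d in zip(points, d2):
            acc += d
            if acc >= r:
                centroids.append(p)
                break
    return centroids

# illustrative (w, h) data on a 13x13 grid scale
points = [(1.5, 2.2), (3.4, 5.3), (5.0, 9.8), (9.4, 6.0), (10.9, 11.2)]
print(kmeans_pp_seeds(points, k=3))
```

After seeding, the usual k-means assignment/update iterations run as normal; the better spread of initial centroids is what reduces the chance of a poor local minimum.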
Usage:
darknet.exe detector calc_anchors data/voc.data -num_of_clusters 5 -final_width 13 -final_heigh 13
I got these anchors for the Pascal VOC dataset (2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt):
D:\Darknet2\darknet\build\darknet\x64>darknet.exe detector calc_anchors data/voc.data -num_of_clusters 5 -final_width 13 -final_heigh 13
num_of_clusters = 5, final_width = 13, final_height = 13
read labels from 16551 images
loaded image: 16551 box: 40058
all loaded.
calculating k-means++ ...
anchors = 9.38,6.01, 3.40,5.35, 10.95,11.20, 5.02,9.83, 1.50,2.16,
D:\Darknet2\darknet\build\darknet\x64>pause
Sorted anchors: anchors = 1.50,2.16, 3.40,5.35, 5.02,9.83, 9.38,6.01, 10.95,11.20
In the yolo-voc.cfg: anchors = 1.32, 1.73, 3.19, 4.00, 5.05, 8.09, 9.47, 4.84, 11.23, 10.00
One of the reasons for the difference in the results is that the original version used this type of distance (distance(box, centroid) = 1 - IoU(box, centroid)) for better IoU, instead of the usual Euclidean distance. Page 3: https://arxiv.org/pdf/1612.08242v1.pdf
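A minimal sketch of k-means with the paper's distance = 1 - IoU(box, centroid), clustering (w, h) pairs with centers aligned. The seeding and data are simplified and illustrative; this is not kmeansiou.c.

```python
# Sketch: k-means where the assignment step uses 1 - IoU instead of
# Euclidean distance, as in the YOLOv2 paper. Data is illustrative.

def iou_wh(a, b):
    # IoU of two boxes with the same center, so only (w, h) matter
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_iou(boxes, k, iters=50):
    centroids = boxes[:k]  # naive seeding, enough for the sketch
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for b in boxes:
            i = min(range(k), key=lambda c: 1 - iou_wh(b, centroids[c]))
            clusters[i].append(b)
        centroids = [
            (sum(w for w, _ in cl) / len(cl), sum(h for _, h in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

# illustrative (w, h) pairs on a 13x13 grid scale
boxes = [(1.5, 2.2), (1.4, 2.0), (3.4, 5.3), (3.5, 5.4), (10.9, 11.2)]
anchors = sorted(kmeans_iou(boxes, k=3))
print(["%.2f,%.2f" % a for a in anchors])
```

The IoU distance is scale-aware: a fixed Euclidean error matters much more for a small box than a large one, which is why the paper prefers it for anchor clustering.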
@AlexeyAB
A very good practice, Alexey. Congratulations.
May I ask you to share your .txt file which contains all VOC-2007 and 2012 annotations in one file? I think you concatenated them into a single txt file. I want to do some tests.
@VanitarNordic No, I made one txt file which contains the paths to all images, but the annotations for each image are in separate txt files. Just do points 2-5 to get the same: https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data
And then do darknet.exe detector calc_anchors data/voc.data -num_of_clusters 5 -final_width 13 -final_heigh 13
@AlexeyAB
Did you see anywhere that the width and height of the bounding boxes must be relative or absolute?
@VanitarNordic No.
The anchors are saved to anchors.txt. Using distance = 1 - IoU(box, centroid) we can get 61.0%: https://arxiv.org/pdf/1612.08242v1.pdf

darknet.exe detector calc_anchors data/voc.data -num_of_clusters 5 -final_width 13 -final_heigh 13 -show
@AlexeyAB
Thank you. A really good job, man. The IoU must be used to calculate the distance for the k-means, as you did, but everyone reached 60%, not 61%, and nobody knows where that extra 1% comes from.
IoU = 66.37% for 9 anchors
Can we calculate 9 anchors and start training? I think some other parameters must be changed as well.
Just set new anchors: https://github.com/AlexeyAB/darknet/blob/48586c8d4db5c00d3d4b9dabcc9a5d2294c5b15d/cfg/yolo-voc.cfg#L242
change num=9: https://github.com/AlexeyAB/darknet/blob/48586c8d4db5c00d3d4b9dabcc9a5d2294c5b15d/cfg/yolo-voc.cfg#L246
then change filters = (classes + coords + 1)*num = (classes+5)*9
in the last conv-layer https://github.com/AlexeyAB/darknet/blob/48586c8d4db5c00d3d4b9dabcc9a5d2294c5b15d/cfg/yolo-voc.cfg#L237
and train.
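The filters arithmetic from the steps above, as a quick check (yolo_v2_filters is a hypothetical helper name, not a Darknet function):

```python
# filters = (classes + coords + 1) * num for the last conv layer,
# with coords = 4 (x, y, w, h) plus 1 objectness score per anchor.

def yolo_v2_filters(classes, num_anchors, coords=4):
    return (classes + coords + 1) * num_anchors

print(yolo_v2_filters(20, 5))  # yolo-voc.cfg default (20 classes, 5 anchors): 125
print(yolo_v2_filters(20, 9))  # with 9 anchors: 225
```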
I didn't use IoU for the distance, because to use distance = 1 - IoU(box, centroid) I would have to implement k-means myself; in that case I can't use the implementation from OpenCV. If there is time, I will do it.
> change num=9

If we change num, then filters should be changed too, shouldn't they? Because filters = (classes + coords + 1)*num.
@VanitarNordic Yes, of course, I corrected the answer above.
@AlexeyAB
Thank you. What about anchor calculation for a custom dataset? I think it should be no different from the original VOC; my only concern is background-only images with an empty txt file.
@VanitarNordic There is no problem with background-only images with empty txt files. I tested it on my dataset, where half of the images have no objects, and it works well. Images without objects are simply skipped.
@AlexeyAB
I trained with 9 anchors. mAP remained intact, but average IoU decreased by about 2%. I still don't know why the original anchors work better, because the new ones produce much better clustering (with whatever method).
Below are the anchors:
Besides, offline video processing using this command causes darknet to crash:
darknet.exe detector demo data/obj.data yolo-obj.cfg backup/yolo-obj_final.weights test.mp4 -out_filename res.avi
It happens with the latest commit. The reason could be the new anchors or a different bug. I have an older commit, which I downloaded on 2018-02-24, that uses the original anchors. Video processing works there, but the FPS is wrong and just alternates between two numbers, 32 FPS and 64 FPS. There are some weird characters there as well.
@VanitarNordic Thank you, I fixed it.
> I trained with 9 anchors. mAP remained intact but average IoU decreased about 2%. Still I don't know why original anchors work better because new ones create much better clustering (with whatever method).

I think there should be cleaner and more representative tests, with 20-80 classes and 2000 images per class, using a 100% identical source code, dataset, and model (everything the same except the anchors).
@AlexeyAB
Will increasing the number of anchors decrease the detection speed? If yes, can you estimate by how much?
@VanitarNordic I think less than 1%.
@AlexeyAB How can I use gen_anchors.py, and how do I decide the number of clusters for my custom dataset? My custom dataset contains 10 distinct objects, which are non-overlapping in nature.