iGio90 opened this issue 5 years ago
Since you were so kind to reply on the wrong issue, could you add to the copy-pasted reply a few lines about the right way (if there is one) to determine the number of anchors to generate?
@iGio90 I would start by reading Joseph Redmon's original papers on YOLO, YOLO9000, and YOLOv3.
As for the YOLO with five layers, I see you only have nine anchors. The mask parameter in a [yolo] layer refers to which anchors that layer uses from the anchors parameter on the following line. Since you have nine anchors but mask values up to fifteen, your largest two layers, which refer to anchors 10-15 (well, 9-14, since they are zero-indexed), probably aren't doing anything. When you calculate anchors, you should use a multiple of five. Or you can give one layer mask=8, then 7, then 6, then 4,5, then 2,3, then 0,1. Basically, you can choose any number of anchors, but you need to divide them appropriately between your yolo layers' masks.
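To make the mask/anchors relationship concrete, here is an illustrative cfg fragment (the anchor values are from the stock yolov3.cfg; the point is only how each layer's mask indexes into the shared anchors list):

```ini
# Each [yolo] layer lists ALL anchors, but its mask selects
# which (zero-indexed) anchor pairs that layer actually predicts.
[yolo]
mask = 6,7,8        # the three largest anchors -> coarsest grid
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
num = 9             # total number of anchor pairs across all layers

[yolo]
mask = 3,4,5        # medium anchors
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
num = 9

[yolo]
mask = 0,1,2        # the three smallest anchors -> finest grid
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
num = 9
```

With num=15 and five layers, the same pattern would run from mask=12,13,14 down to mask=0,1,2.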
Different yolo layers typically handle detections of different sizes, as long as your shortcut and convolutional layers are set up properly. As the net goes through the convolutional layers, the image gets downsampled by a factor of 32 in the original YOLO. I'm not sure if that's the same in your cfg with five layers, but you should figure it out, because height and width should be multiples of that number.
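A quick sanity check for this (a hypothetical helper, assuming the usual cumulative stride of 32):

```python
def valid_net_size(width, height, stride=32):
    """True if the input dims downsample to whole-number grid sizes.

    stride is the cumulative downsampling factor of the network
    (32 for the original yolov3; count your stride-2 layers to be sure).
    """
    return width % stride == 0 and height % stride == 0

print(valid_net_size(416, 416))  # 416/32 = 13, a valid 13x13 grid
print(valid_net_size(600, 600))  # 600/32 is not whole, so invalid
```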
When I use darknet.exe detector calc_anchors, I get float values, not integers.
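That is expected: calc_anchors clusters the labelled box sizes with k-means, and cluster centroids are averages, so they come out as floats. A minimal, simplified sketch of the idea (using plain Euclidean distance and a naive first-k initialization; darknet's actual implementation uses an IoU-based distance, as in the YOLOv2 paper):

```python
def kmeans_anchors(boxes, k, iters=100):
    """Cluster (w, h) box sizes into k anchor centroids.

    boxes: list of (w, h) pairs in pixels. Returns float centroids,
    which is why calc_anchors prints floats - they are averages.
    """
    centroids = [(float(w), float(h)) for w, h in boxes[:k]]  # naive init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w, h in boxes:
            # assign each box to its nearest centroid (squared Euclidean)
            i = min(range(k),
                    key=lambda j: (w - centroids[j][0]) ** 2
                                  + (h - centroids[j][1]) ** 2)
            clusters[i].append((w, h))
        new = [(sum(w for w, _ in c) / len(c), sum(h for _, h in c) / len(c))
               if c else centroids[i]
               for i, c in enumerate(clusters)]
        if new == centroids:  # converged
            break
        centroids = new
    # list anchors smallest-area first, so masks can index small -> large
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])
```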
@iGio90 Hi,
https://gist.github.com/iGio90/138800a70bef5e4e8e7b0fcef6814c34
num=15 is the number of anchors, but you set only 9 anchors (18 values).
Use the default anchors in the yolo cfg with 5 yolo layers.
In all repos anchors are used in the same way - these are initial object sizes.
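For a concrete check of the mismatch above (a hypothetical helper, not part of darknet): num must equal the number of (w, h) pairs in the anchors line, i.e. half the number of comma-separated values:

```python
def count_anchor_pairs(anchors_line):
    """Count (w, h) anchor pairs in a cfg-style anchors string."""
    values = [v for v in anchors_line.replace(" ", "").split(",") if v]
    if len(values) % 2 != 0:
        raise ValueError("anchors must come in (w, h) pairs")
    return len(values) // 2

line = ("15.02,8.98, 20.12,10.68, 24.67,16.03, 29.68,10.46, 33.04,16.71, "
        "37.80,22.61, 52.86,26.36, 54.83,34.33, 71.06,33.72")
print(count_anchor_pairs(line))  # 9 pairs -> num should be 9, not 15
```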
2. learning_rate=0.001 (this field is very different on the various configs)
In all yolo cfg files, learning_rate=0.001 is typically used.
Read about learning rate: https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10
Last one: I've been training with the 5-layer configuration for 6 hours now, and the object loss still bounces between >6.0 and <15.0, so I think there is an issue somewhere. Here are my configs, if you don't mind giving them a 10-second check to see if something is wrong:
Avg Loss is only a very rough indicator.
Train with the -map flag - mAP is a much more important indicator, and it will be calculated every 4 epochs: https://github.com/AlexeyAB/darknet#when-should-i-stop-training
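For reference, the flag just goes on the end of the usual training command (the paths here are placeholders for your own data/cfg/weights files):

```shell
# mAP is computed periodically against the valid= set listed in obj.data
./darknet detector train data/obj.data cfg/yolov3_5l.cfg darknet53.conv.74 -map
```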
Second question: using the 5-layer config, I can only run with batch=32 and subdivision=32. If I play with those, or increase width/height by 32, I hit a CUDA out-of-memory error every time (running on a laptop with an i9 and an NVIDIA RTX 2060, Ubuntu 18.10). With tiny yolo, by contrast, I can run with batch=64, subdivision=16, and width/height > 600.
So you should find what is better for your dataset.
Full-model: 5 yolo layers: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3_5l.cfg - for small and big objects
Tiny-model: 3 yolo layers: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-tiny_3l.cfg - for small objects
Spatial-full-model: 3 yolo layers: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-spp.cfg - for medium objects
Hi, and thanks for the great work. I've been exploring YOLO and darknet for a week (not that long), and I tried to research the following questions through the other issues and your awesome readme, but some of them are really hard (for me) to understand.
So this is my scenario: I started training with the original darknet (not your fork) using the stock VOC configuration, edited to make it work properly, and I got to 0.08. That didn't make me that happy, mostly because of the small number of images in my dataset, but the results were acceptable... I'm still iterating detect -> fix -> add to dataset to improve.
These past days I moved to your fork of darknet and read the whole readme to see everything else it can do, especially the part where you give suggestions on how to improve training. I'm training a model to recognize mixed big/small objects in games, so I took and edited the 3 configs you suggest for mixed-size items (yolo with 5 layers, tiny yolo, and yolo with 3 spatial layers).
First question: I gisted them one by one to compare against the original configs, trying to understand what was different and, most of all, which one I should go with and why. (Please consider adding some lines to the readme explaining the differences between those 3 configurations.) OK, one has 5 layers, but what pros/cons do you get? The one with 5 layers uses a smaller resolution, while the tiny one goes up to 600.
Second question: using the 5-layer config, I can only run with batch=32 and subdivision=32. If I play with those, or increase width/height by 32, I hit a CUDA out-of-memory error every time (running on a laptop with an i9 and an NVIDIA RTX 2060, Ubuntu 18.10). With tiny yolo, by contrast, I can run with batch=64, subdivision=16, and width/height > 600.
Third question (which is not really a question but a request for explanation): can you give me some quick hints, or online references to read, about the pros/cons of altering these fields in the configs:
anchors: what are these exactly and, more importantly, is there any difference between how your fork reads them and how the original darknet does? I'm using the labelImg tool to label images, and the last field in the txt is sometimes a number < 0, marked as a bad result. I'm using some other Python code to generate anchors, which produces floats instead of ints: anchors = 15.02,8.98, 20.12,10.68, 24.67,16.03, 29.68,10.46, 33.04,16.71, 37.80,22.61, 52.86,26.36, 54.83,34.33, 71.06,33.72. The result is very similar to yours, by the way... just trying to figure out whether this makes any difference.
learning_rate=0.001 (this field is very different on the various configs)
steps=400000,450000 (this as well)
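For context on those two fields: they live in the [net] section and work together with policy and scales. An illustrative yolov3-style schedule (values from the stock cfg, not a recommendation for any particular dataset):

```ini
[net]
learning_rate=0.001
burn_in=1000            # ramp the LR up over the first 1000 iterations
policy=steps
steps=400000,450000     # at these iterations, multiply the LR by...
scales=.1,.1            # ...these factors: 0.001 -> 0.0001 -> 0.00001
```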
Last one: I've been training with the 5-layer configuration for 6 hours now, and the object loss still bounces between >6.0 and <15.0, so I think there is an issue somewhere. Here are my configs, if you don't mind giving them a 10-second check to see if something is wrong:
https://gist.github.com/iGio90/138800a70bef5e4e8e7b0fcef6814c34
thanks <3