Clarification for some yolo configurations

iGio90 commented 5 years ago

Hi, and thanks for the great work. I've been exploring yolo and darknet for about a week (not long), and I tried to answer the following questions myself through the other issues and your awesome readme, but some of them are really hard for me to understand.

So this is my scenario: I started training with the original darknet (not your fork), using the stock voc configuration edited to make it work properly, and I got a weight of 0.08, which did not make me that happy, mostly because of the small number of images in my dataset. The results were acceptable, though... I'm still doing detection -> fix -> add to dataset to improve.

These days I moved to your fork of darknet and read the whole readme to see everything else it can do, mostly the part where you give suggestions on how to improve training. I'm training a model to recognize mixed big/small objects in games, so I took and edited the 3 configs you suggest for mixed-size items (yolo with 5 layers, tiny yolo, and yolo with 3 spatial layers).

First question: I gisted them one by one to compare them with the original configs, trying to understand what is different and, most of all, which one I should go with and why. (Please consider adding some lines to the readme explaining the differences between those 3 configurations.) OK, one has 5 layers: what pros/cons do you get? The one with 5 layers has a smaller resolution, while the tiny one goes up to 600.

Second question: with the 5-layer one, I can only run with batch=32 and subdivisions=32. If I change those, or increase the width/height by 32, I get CUDA out of memory every time (running a laptop with an i9 and an NVIDIA RTX 2060, Ubuntu 18.10). With tiny yolo, on the other hand, I can run with batch=64, subdivisions=16 and width/height > 600.

Third question, which is not really a question but a request for an explanation: can you give me some quick hints, or references online, about the pros/cons of altering these fields in the configs?

1) anchors. What are they exactly and, more importantly, is there any difference between how your fork reads them and how the original darknet does? I'm using the labelImg tool to label images, and the last field in the txt is a number which is sometimes < 0 and marked as a bad result in the txt. I'm using some other py code to generate anchors, which produces floats instead of ints:

anchors = 15.02,8.98, 20.12,10.68, 24.67,16.03, 29.68,10.46, 33.04,16.71, 37.80,22.61, 52.86,26.36, 54.83,34.33, 71.06,33.72

The result is very similar to yours... I'm just trying to figure out whether this makes any difference.

2) learning_rate=0.001 (this field is very different across the various configs)

3) steps=400000,450000 (this one as well)

Last one: I've been training with the 5-layer configuration for 6 hours now and the object loss still ranges between >6.0 and <15.0, so I think there is an issue somewhere... Here are my configs, if you don't mind giving them a 10-second check to see whether something is wrong:

https://gist.github.com/iGio90/138800a70bef5e4e8e7b0fcef6814c34

thanks <3

iGio90 commented 5 years ago

Closed. I just realized I posted in the wrong repo. Opening one in darknet.

AlexeyAB commented 5 years ago

@iGio90 Hi,

https://gist.github.com/iGio90/138800a70bef5e4e8e7b0fcef6814c34

num=15 is the number of anchors, but you set only 9 anchors (18 values). Use the default anchors in yolo with 5 yolo layers.

In all repos anchors are used in the same way: they are the initial object sizes.
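The mismatch between num and the anchors line can be caught before training with a quick check. Below is a minimal sketch, not part of darknet: the helper names `parse_anchors` and `check_anchor_count` are made up for illustration, and it only assumes that the anchors value is a comma-separated list of width,height pairs, as in the cfg line quoted in the question.

```python
def parse_anchors(anchors_line):
    """Parse the value of an `anchors=` cfg line into (width, height) pairs."""
    values = [float(v) for v in anchors_line.split(",") if v.strip()]
    if len(values) % 2 != 0:
        raise ValueError("anchors must contain an even number of values")
    return list(zip(values[0::2], values[1::2]))


def check_anchor_count(anchors_line, num):
    """Return True when the cfg's `num` equals the number of anchor pairs."""
    return len(parse_anchors(anchors_line)) == num


# Anchors quoted in the question: 9 pairs (18 values).
anchors = ("15.02,8.98, 20.12,10.68, 24.67,16.03, 29.68,10.46, 33.04,16.71, "
           "37.80,22.61, 52.86,26.36, 54.83,34.33, 71.06,33.72")

pairs = parse_anchors(anchors)
print(len(pairs))                       # 9
print(check_anchor_count(anchors, 15))  # False: num=15 needs 15 pairs
print(check_anchor_count(anchors, 9))   # True
```

Float anchors parse the same way as integer ones here, so the check itself does not care which form the generator script produced.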

2. learning_rate=0.001 (this field is very different on the various configs)

Most yolo cfg files use learning_rate=0.001.

Read about learning rate: https://towardsdatascience.com/understanding-learning-rates-and-how-it-improves-performance-in-deep-learning-d0d4059c1c10
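To see how learning_rate and steps interact, here is a rough sketch of a step-based schedule with a warm-up phase. It is an illustration in Python, not darknet's code. The values burn_in=1000, power=4 and scales=0.1,0.1 are assumptions taken from the stock yolov3.cfg and do not appear in this thread, so check them against your own cfg.

```python
def learning_rate_at(iteration, base_lr=0.001, burn_in=1000, power=4.0,
                     steps=(400000, 450000), scales=(0.1, 0.1)):
    """Approximate learning rate at a given iteration for a `steps` policy.

    burn_in, power and scales are assumed defaults, not values from the thread.
    """
    # Warm-up: the rate ramps from 0 up to base_lr over the first burn_in iterations.
    if iteration < burn_in:
        return base_lr * (iteration / burn_in) ** power
    # After warm-up: multiply by each scale whose step has already been reached.
    lr = base_lr
    for step, scale in zip(steps, scales):
        if iteration < step:
            break
        lr *= scale
    return lr


for it in (500, 1000, 399999, 400000, 450000):
    print(it, learning_rate_at(it))
```

With these assumed values the rate stays at 0.001 until iteration 400000, drops to about 0.0001 there, and to about 0.00001 at 450000. If a custom dataset is trained for far fewer iterations than that, the steps are never reached and the rate never decays.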

Last one: I've been training with the 5-layer configuration for 6 hours now and the object loss still ranges between >6.0 and <15.0, so I think there is an issue somewhere... Here are my configs, if you don't mind giving them a 10-second check to see whether something is wrong:

Avg loss is only a very rough indicator. Train with the -map flag: mAP is a much more important indicator. mAP will then be calculated every 4 epochs: https://github.com/AlexeyAB/darknet#when-should-i-stop-training
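For a sense of how often that is in iterations: the linked README defines 1 epoch as the number of images in train.txt divided by batch. A small sketch of the arithmetic follows. The dataset size of 2000 images is hypothetical, and the real implementation may round the result or apply a lower bound, so treat the numbers as approximate.

```python
def iterations_per_epoch(num_train_images, batch):
    """One epoch, in iterations: images in train.txt divided by batch."""
    return num_train_images / batch


def iterations_between_map(num_train_images, batch, epochs=4):
    """Approximate number of iterations between two mAP calculations."""
    return epochs * iterations_per_epoch(num_train_images, batch)


# Hypothetical dataset of 2000 training images with batch=64.
print(iterations_per_epoch(2000, 64))    # 31.25
print(iterations_between_map(2000, 64))  # 125.0
```

With a small dataset, mAP is therefore recomputed quite frequently, which makes it a more useful signal than watching the average loss over 6 hours.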

Second question: with the 5-layer one, I can only run with batch=32 and subdivisions=32. If I change those, or increase the width/height by 32, I get CUDA out of memory every time (running a laptop with an i9 and an NVIDIA RTX 2060, Ubuntu 18.10). With tiny yolo, on the other hand, I can run with batch=64, subdivisions=16 and width/height > 600.

So you should find what is better for your dataset.

Full-model: 5 yolo layers: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3_5l.cfg - for small and big objects

Tiny-model: 3 yolo layers: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-tiny_3l.cfg - for small objects

Spatial-full-model: 3 yolo layers: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-spp.cfg - for medium objects
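On the out-of-memory part of the second question: darknet splits each batch into batch / subdivisions images per forward/backward pass, so the two setups in the question put very different loads on the GPU. The sketch below just does that arithmetic and checks that a resolution is a multiple of 32. The helper names are made up for illustration, and actual memory use also depends on the network itself.

```python
def mini_batch(batch, subdivisions):
    """Images processed on the GPU at once: batch split into subdivisions."""
    return batch // subdivisions


def is_multiple_of_32(width, height):
    """Network width and height are expected to be multiples of 32."""
    return width % 32 == 0 and height % 32 == 0


# 5-layer config from the question: one image at a time is already the minimum.
print(mini_batch(32, 32))           # 1
# Tiny config from the question: four images at a time still fit in memory.
print(mini_batch(64, 16))           # 4
print(is_multiple_of_32(608, 608))  # True
print(is_multiple_of_32(600, 600))  # False
```

With batch=32 and subdivisions=32 the 5-layer model is already processing a single image per pass, so raising subdivisions further cannot reduce memory. Only a smaller width/height or a smaller model can.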

iGio90 commented 5 years ago

I opened this in the darknet repo and copy-pasted it there, so maybe people will find it useful.

AlexeyAB / Yolo_mark

Clarification for some yolo configurations #119