AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

howto: calculating custom anchors for YOLOv4-tiny #7856

Closed stephanecharette closed 3 years ago

stephanecharette commented 3 years ago

@AlexeyAB I know about this line in the readme:

Only if you are an expert in neural detection networks [...]

I've avoided the topic of re-calculating anchors for the past few years. But people ask on the Discord server, and truth is, I'd like to know how to do it as well! :) Every time I try to do it, the results are worse than the default anchors, so I assume that I'm doing it wrong and I'm not enough of an expert.

Say we use this license plate project as an example: https://github.com/stephanecharette/DarkPlate

The default anchors in YOLOv4-tiny are:

anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319

I know the anchor-calculating code has a bit of randomness in it, so every time I run it I get slightly different results. For my 416x416 YOLOv4-tiny config file, I run this command:

darknet detector calc_anchors DarkPlate.data -num_of_clusters 6 -width 416 -height 416

The results I get look like one of these lines:

anchors =  12, 19,  24, 38,  42, 76,  98, 39, 154, 91, 256,155
anchors =  12, 19,  24, 39,  42, 76,  94, 37, 152, 84, 240,152
anchors =  12, 19,  24, 37,  41, 75,  95, 38, 148, 85, 245,151
anchors =  11, 19,  26, 34,  31, 62,  53, 83, 128, 53, 214,132
anchors =  10, 18,  23, 32,  30, 59,  56, 72, 144, 61, 222,140

First thing I do is pick one of the anchor lines I list above. (They're all very similar, off by just a few pixels.)

Let's say we use this line for our example: anchors = 12, 19, 24, 37, 41, 75, 95, 38, 148, 85, 245, 151

Then I look for each [yolo] section in the .cfg file and replace the anchors = ... line with the one we selected above.

Lastly come these instructions:

But you should change indexes of anchors masks= for each [yolo]-layer, so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30, 2nd smaller than 60x60, 3rd remaining, and vice versa for YOLOv3.

Considering the default anchors are anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319, the default YOLOv4-tiny.cfg has 2 YOLO sections with these masks and anchors:

Lines 226-228:

[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319

And lines 277-279:

[yolo]
mask = 1,2,3
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319

Are the anchors zero-based, or one-based? I assume zero-based, so line 227 (mask = 3,4,5) refers to:

81,82,  135,169,  344,319

And line 278 (mask = 1,2,3) refers to:

23,27,  37,58,  81,82

Is it intentional that mask index #3 (81,82) is referenced in both YOLO sections, or is that a typo? Should the mask be 1,2,3 and 4,5,6 or 0,1,2 and 3,4,5?

And just as importantly, how do we reconcile this statement:

so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30, 2nd smaller than 60x60, 3rd remaining, and vice versa for YOLOv3.

From what I can see, the 1st [YOLO] section has anchors 81,82, 135,169, and 344,319, all of which are larger than 30x30, not smaller.

And even in the 2nd [YOLO] section, only the very first anchor of 23,27 would be smaller than 30x30, so I'm very confused.

But even without understanding all of this, I went ahead and trained 2 networks, one with the default anchors and the other with some new custom anchors. I did not change the mask, only the anchors = ... line. (See attached .cfg file.)

This is the chart.png when I train with the default YOLOv4-tiny anchors:

image

And this is the chart.png file when I use the custom anchors:

image

Can you help clear up the confusion and various questions?

DarkPlate.cfg.txt

AlexeyAB commented 3 years ago

@stephanecharette Hi,

I know the anchor-calculating code has a bit of randomness in it, so every time I run it I get slightly different results.

Yes, there is random initialization in the k-means++ approach https://en.wikipedia.org/wiki/K-means%2B%2B

When you run command darknet detector calc_anchors DarkPlate.data -num_of_clusters 6 -width 416 -height 416

You see:

* the anchors

* the average IoU

So you can run it several times and choose the anchors with the highest IoU.
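Under the hood, calc_anchors is essentially k-means over the (w,h) of every training box, using IoU as the similarity measure; the random choice of initial centers is exactly what makes each run differ. A minimal Python sketch of the idea (not the darknet source; all names here are illustrative):

```python
import random

def iou_wh(box, anchor):
    # IoU of two boxes aligned at the top-left corner (only w,h matter)
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union

def calc_anchors(boxes, k, iters=100, seed=None):
    # boxes: list of (w, h) tuples already scaled to the network size
    rng = random.Random(seed)
    centers = rng.sample(boxes, k)  # random init -> slightly different anchors each run
    for _ in range(iters):
        # assign every box to the anchor it overlaps best
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            clusters[best].append(b)
        # move each anchor to the mean w,h of its cluster
        for i, c in enumerate(clusters):
            if c:
                centers[i] = (sum(b[0] for b in c) / len(c),
                              sum(b[1] for b in c) / len(c))
    avg_iou = sum(max(iou_wh(b, a) for a in centers) for b in boxes) / len(boxes)
    # darknet prints anchors sorted by area, smallest first
    return sorted(centers, key=lambda a: a[0] * a[1]), avg_iou
```

Running it several times and keeping the result with the highest average IoU mirrors the advice above.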


darknet detector calc_anchors DarkPlate.data -num_of_clusters 6 -width 416 -height 416

Try using this command with the -show flag (you should do it on an OS with a GUI: Windows / Linux+Gnome/KDE/...): darknet detector calc_anchors DarkPlate.data -num_of_clusters 6 -width 416 -height 416 -show

* You will see the point cloud, where each pixel is the relative size of an object in the training dataset (x,y-coord of a pixel in the cloud == w,h-size of an object in the training dataset)

* And you will see the anchors; they look like bounding boxes with their top-left corner at (0,0).

Check that the anchors cover most of the points evenly. If not, then

* try to add some additional anchors manually, or

* try to use more anchors, e.g. 9: -num_of_clusters 9


Are the anchors zero-based, or one-based? I assume zero-based

Yes, masks of anchors are 0-based.

So try

[yolo]
mask = 0,1,2

instead of

[yolo]
mask = 1,2,3

It was a mistake in the yolov3-tiny and yolov4-tiny versions, which we shouldn't fix in the default models because we have many pre-trained models with these masks.
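Applied to the DarkPlate example, the two [yolo] sections of a custom yolov4-tiny cfg would then look like this (a sketch, using the example anchors picked above; everything else in the sections stays as in the stock cfg):

```ini
# first [yolo] section (low-resolution head) - mask unchanged
[yolo]
mask = 3,4,5
anchors = 12, 19,  24, 37,  41, 75,  95, 38,  148, 85,  245, 151

# second [yolo] section (high-resolution head) - mask corrected from 1,2,3
[yolo]
mask = 0,1,2
anchors = 12, 19,  24, 37,  41, 75,  95, 38,  148, 85,  245, 151
```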


And just as importantly, how do we reconcile this statement:

so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30, 2nd smaller than 60x60, 3rd remaining, and vice versa for YOLOv3.

From what I can see, the 1st [YOLO] section has anchors 81,82, 135,169, and 344,319, all of which are larger than 30x30, not smaller.

And even in the 2nd [YOLO] section, only the very first anchor of 23,27 would be smaller than 30x30, so I'm very confused.

Actually you should use:

* small anchors for the [yolo] layer with high resolution

* big anchors for the [yolo] layer with low resolution

The order of the [yolo] layers is different in different models.

image

image


AlexeyAB commented 3 years ago

It is not related to anchors, but it can slightly improve accuracy if you use pre-trained weights for training. You can try to add the line stopbackward=800 to https://github.com/AlexeyAB/darknet/blob/005513a9db14878579adfbb61083962c99bb0a89/cfg/yolov4-tiny.cfg#L198 It will freeze all layers before this one for the first 800 iterations, so the randomly initialized layers will not produce random gradients and will not destroy the information in the pre-trained weights. After 800 iterations these layers will already be trained and will no longer contain random weights.
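In the cfg this would look roughly as follows (a sketch; the real layer is the one at the linked line, and the parameters around stopbackward here are illustrative):

```ini
[convolutional]
stopbackward=800     # freeze all layers before this one for the first 800 iterations
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
```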


Or for very large models you can try to train yolov4-p5-frozen.cfg with stopbackward=1 https://github.com/AlexeyAB/darknet/blob/005513a9db14878579adfbb61083962c99bb0a89/cfg/yolov4-p5-frozen.cfg#L1698 with the pre-trained weights https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-p5.conv.232

In this case stopbackward=1 will freeze all previous layers throughout the whole training, so training will be 2x-3x faster and will consume less memory, so you can fit a larger mini-batch (lower subdivisions= in the cfg) or a higher resolution on the GPU. More details about large models: https://github.com/AlexeyAB/darknet/issues/7838#issue-930834186

2MinuteWarning commented 3 years ago

@AlexeyAB - thanks for posting these details. I too am trying to customize anchors for a custom yolov4-tiny model.

I'm still a little confused... what should @stephanecharette set the masks to for lines 227 and 278 above?

arnaud-nt2i commented 3 years ago

@AlexeyAB sorry, but you have not answered the most critical (and mysterious) part of Stephane's question:

And just as importantly, how do we reconcile this statement:

so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30, 2nd smaller than 60x60, 3rd remaining, and vice versa for YOLOv3.

From what I can see, the 1st [YOLO] section has anchors 81,82, 135,169, and 344,319, all of which are larger than 30x30, not smaller. And even in the 2nd [YOLO] section, only the very first anchor of 23,27 would be smaller than 30x30, so I'm very confused.

YOLO doesn't respect its own rules?

Another question, does changing the network size affect the "theoretical" 30x30 and 60x60 limits?

AlexeyAB commented 3 years ago

@arnaud-nt2i

YOLO doesn't respect its own rules?

There is detailed answer for this question: https://github.com/AlexeyAB/darknet/issues/7856#issuecomment-874147909

Actually you should use:


Another question, does changing the network size affect the "theoretical" 30x30 and 60x60 limits?

In general, yes. But you should also pay attention to the rewritten_box value during training; if it is higher than 5%, then try to move more anchors (actually move masks) from the [yolo] layer with low resolution to the [yolo] layer with high resolution.
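For example, moving anchor index 3 from the low-resolution head to the high-resolution head might look like this (a sketch; note that the filters= of the [convolutional] layer directly before each [yolo] layer must stay (classes+5)*number-of-masks, so it has to be updated as well; classes=2 here is only for illustration):

```ini
# low-resolution head gives up anchor index 3
[convolutional]
filters=14           # (2 classes + 5) * 2 masks
...
[yolo]
mask = 4,5

# high-resolution head takes anchor index 3
[convolutional]
filters=28           # (2 classes + 5) * 4 masks
...
[yolo]
mask = 0,1,2,3
```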

AlexeyAB commented 3 years ago

Another question, does changing the network size affect the "theoretical" 30x30 and 60x60 limits?

Try to keep this rule. If you change the network size, then recalculate the anchors for the new network size.

arnaud-nt2i commented 3 years ago

ok, thank you for those explanations. It's the first time I've read about the rewritten_box value in relation to anchors... I have read all the anchor issues since 2018 but never seen as clear an explanation. I will try that and change the mask numbers and filters if needed.

One more thing. In all my tries, with small and big datasets (up to 300,000 pics):

1) SGDR works (way) better than steps
2) batch_normalize=2 is better (a little bit) than 1 (for minibatch from 2 to 5)
3) I never had problems with dynamic_minibatch=2 and a 0.9 factor instead of 0.8

AlexeyAB commented 3 years ago

@arnaud-nt2i

So is this combination the best for your dataset: [convolutional] batch_normalize=2 + [net] dynamic_minibatch=1 + policy=sgdr? Is your dataset indoor/outdoor/..., urban/agronomic/biology...?

Did you try [net] letter_box=1 and/or [net] ema_alpha=0.9998 ?

And new cfg-file/pre-trained weights: yolov4-csp-x-swish.cfg, yolov4-p5.cfg, yolov4-p6.cfg https://github.com/AlexeyAB/darknet#pre-trained-models


I never had problems with dynamic_minibatch=2 and 0.9 factor instead of 0.8.

Does it solve an out-of-memory issue, or does it increase accuracy? int new_dim_b = (int)(dim_b * 0.9); instead of https://github.com/AlexeyAB/darknet/blob/d669680879f72e58a5bc4d8de98c2e3c0aab0b62/src/detector.c#L216

arnaud-nt2i commented 3 years ago

@AlexeyAB

So is this combination the best [convolutional] batch_normalize=2 + [net] dynamic_minibatch=1 policy=sgdr for your dataset?

Yes, for small agro/bio and big outdoor datasets, with swish, sgdr_cycle = the number of iterations in 1 epoch, and a cycle factor of 2. I haven't tried letter_box because I compute the mean aspect ratio of the pics and set the network size to the same ratio (e.g. 704x544). I don't know about ema_alpha=0.9998... what is that?

int new_dim_b = (int)(dim_b * 0.9) allows a higher minibatch and faster/more accurate learning... I never ran out of memory with this (and I am always maximizing VRAM usage) while playing with the resolution and the random coefficient on my 3090 and 1660 Ti. (I haven't tried the new optimized_memory either, because I am afraid of the memory usage peak when launching the network, as used to happen with optimized_memory=1.)

I tried YOLOv4 CSP / Scaled in December/January but it was not yet OK... Now I am desperately waiting for OpenCV DNN to support it... But I am more interested in AP50 (number of detected objects) than AP (bbox coordinates), and I need good accuracy for small objects, so the new YOLOv4 (CSP, Scaled) might be worse rather than better for me...

AlexeyAB commented 3 years ago

@arnaud-nt2i

Don't know about ema_alpha=0.9998 ... what is that ?

EMA is a custom version of SWA https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging/

Regardless of the procedure you use to train your neural network, you can likely achieve significantly better generalization at virtually no additional cost with a simple new technique now natively supported in PyTorch 1.6, Stochastic Weight Averaging (SWA) ... Averaged SGD is often used in conjunction with a decaying learning rate, and an exponential moving average (EMA), typically for convex optimization. In convex optimization, the focus has been on improved rates of convergence.


You can try to train this model: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-csp-x-swish.cfg with this pre-trained weights https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-csp-x-swish.conv.192 from https://github.com/AlexeyAB/darknet#pre-trained-models

arnaud-nt2i commented 3 years ago

ok, I will try ema_alpha=0.9998 in some of my next trainings... But as for the optimizer, RAdam + Lookahead = Ranger seems to give a higher gain (and, as importantly, less sensitivity to the initial lr): https://lessw.medium.com/new-deep-learning-optimizer-ranger-synergistic-combination-of-radam-lookahead-for-the-best-of-2dc83f79a48d But it is already on the todo list ^^

AlexeyAB commented 3 years ago

@arnaud-nt2i For most of my experiments and other papers:

arnaud-nt2i commented 3 years ago

ok, that seems fair, nothing replaces good old long trainings!

cpsu00 commented 3 years ago

Hi @arnaud-nt2i

  1. batch_normalize=2 better (little bit) than 1 (for minibatch from 2 to 5)

Shouldn't batch_normalize be either 0 or 1?

arnaud-nt2i commented 3 years ago

@cpsu00 batch_normalize=1 is the default param for YOLOv4; batch_normalize=0 means no normalization (not really good); batch_normalize=2 is cross batch norm, which allows the use of a lower minibatch_size (an upgrade over batch_normalize=1).

see batchnorm_layer.c

cpsu00 commented 3 years ago

Oh, I didn't notice that. Thanks!

zxz-cc commented 3 years ago

(quoting AlexeyAB's reply above)

Actually you should use:

* small anchors for [yolo] layer with high resolution

* big anchors for [yolo] layer with low resolution

The order of the [yolo] layers is different in different models.

Does that mean the masks won't change if I use a pre-trained model? If I change them to [3,4,5], [0,1,2], I can't use the pre-trained model, right?

stephanecharette commented 3 years ago

Of course, do not change the masks or anchors on pre-trained models! This only works if you are training your own custom network.

zxz-cc commented 3 years ago

Of course, do not change the masks or anchors on pre-trained models! This only works if you are training your own custom network.

ok thank you !

MrGolden1 commented 3 years ago

How should I change the .cfg file to increase the number of anchor clusters? For YOLOv4-tiny default value is 6, and I want to try a higher number to test mAP. Any guidance will be appreciated.

Fetulhak commented 2 years ago

@stephanecharette @AlexeyAB I have trained two YOLOv4 models, one using resolution 416x416 and the other 512x512. However, the model at 512x512 has a lower mAP than the one at 416x416. This is confusing; shouldn't it be the opposite? The input images were all the same size, 1008x1008. Any help will be appreciated.

anchors generated at 416: 10,10,  18,8,  8,18,  12,12,  14,14,  16,15,  18,18,  21,21,  25,25 (90.52% IoU)

Fetulhak commented 2 years ago

How should I change the .cfg file to increase the number of anchor clusters? For YOLOv4-tiny default value is 6, and I want to try a higher number to test mAP. Any guidance will be appreciated.

Hi @MrGolden1, if you want to increase the number of anchor boxes you should also increase the number of YOLO detection heads; see yolov4-tiny-3l.cfg.
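The 3l variant splits 9 anchors across three [yolo] heads, roughly like this (a sketch; recalculate the anchors with -num_of_clusters 9, and check the stock masks of the 3l cfg against the same 0-based caveat discussed above):

```ini
# darknet detector calc_anchors DarkPlate.data -num_of_clusters 9 -width 416 -height 416

[yolo]    # lowest-resolution head: the biggest anchors
mask = 6,7,8

[yolo]    # middle head
mask = 3,4,5

[yolo]    # highest-resolution head: the smallest anchors
mask = 0,1,2
```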

Fetulhak commented 2 years ago

@stephanecharette @AlexeyAB The other question I have: what if I forget the resolution I used for training but still have the weights file? Is there any way to recover the training-time resolution from the weights file?

stephanecharette commented 2 years ago

The image size is irrelevant and ignored by Darknet. The only size that matters is the width and height in the cfg. Your images could be 999999x999999 and Darknet will still resize the images to match the network dimensions.

Fetulhak commented 2 years ago

Thanks for your reply @stephanecharette. Okay, let's forget about image size. Whatever image size I provide, shouldn't my model trained with network resolution (width and height in the cfg) 512x512 give better results than 416x416? In my case the 512x512 cfg gives a much lower mAP than the 416x416 one. I am very confused.

darkxzk commented 2 years ago

I think you're right: recalculating the anchors can result in a decrease in the overall accuracy of the model. My experiments are consistent with yours, and the default anchors work best. Although I do not know why this is the case, my analysis suggests the anchors from k-means++ clustering do not cover all scales, most likely because the target boxes in the dataset are fixed at one scale.