AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

explanation of cfg file parameters #279

Open anandkoirala opened 6 years ago

anandkoirala commented 6 years ago

Hi @AlexeyAB
could you please document or explain the following parameters of the .cfg file:

  1. saturation, exposure and hue values
  2. steps and scales values
  3. anchors, bias_match
  4. jitter, rescore, thresh
  5. object_scale, noobject_scale, class_scale, coord_scale values
  6. absolute
AlexeyAB commented 6 years ago

Hi,

  1. saturation, exposure and hue values - ranges for random colour changes applied to images during training (data augmentation parameters), in terms of HSV: https://en.wikipedia.org/wiki/HSL_and_HSV The larger the value, the more invariant the neural network will be to changes in lighting and colour of the objects.
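
As a sketch, these sit in the [net] section of the cfg-file (the values below are only illustrative, similar to the yolov3.cfg defaults):

    # saturation and exposure are multiplicative ranges, hue is an additive shift
    saturation = 1.5
    exposure = 1.5
    hue = .1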

  2. steps and scales values - steps are checkpoints (iteration numbers) at which the scales are applied; scales are the coefficients by which learning_rate is multiplied at those checkpoints. Together they determine how the learning_rate changes as the number of training iterations grows.
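
For example (illustrative values, similar to the yolov3.cfg defaults):

    learning_rate=0.001
    steps=400000,450000
    scales=.1,.1
    # learning_rate stays 0.001 until iteration 400000,
    # becomes 0.001*0.1 = 0.0001 afterwards,
    # and 0.0001*0.1 = 0.00001 after iteration 450000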

  3. anchors, bias_match - anchors are frequent initial <width,height> of objects in terms of the output network resolution. bias_match is used only for training: if bias_match=1 the detected object will have the <width,height> of one of the anchors, while if bias_match=0 the anchor's <width,height> will be refined by the neural network: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L275-L283 If you train with height=416, width=416, random=0, then the maximum anchor values will be 13,13. But if you train with random=1, then the maximum input resolution can be 608x608 and the maximum anchor values can be 19,19.
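
In a [region] section this looks like the following (a sketch with values similar to those shipped in yolo-voc.cfg; each pair is an anchor <width,height> measured in output-grid cells, so at most 13 for a 416x416 network):

    [region]
    anchors = 1.3221, 1.73145, 3.19275, 4.00944, 5.05587, 8.09892, 9.47112, 4.84053, 11.2364, 10.0071
    bias_match=1
    num=5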

  4. jitter, rescore, thresh - jitter can be [0-1] and is used to crop images during training for data augmentation. The larger the jitter value, the more invariant the neural network will be to changes in the size and aspect ratio of the objects: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/data.c#L513-L528

rescore determines which loss (delta, cost, ...) function will be used - more about this: https://github.com/AlexeyAB/darknet/issues/185#issuecomment-334504558 https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L302-L305

thresh is the minimum IoU at which delta_region_class() should be used during training: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L235
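
In the [region] section these typically look like this (illustrative values, similar to the yolo-voc cfg files):

    jitter=.3
    rescore=1
    thresh=.6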


  5. object_scale, noobject_scale, class_scale, coord_scale values - all are used only during training, as multipliers on the corresponding parts of the loss (objectness for cells with an object, objectness for cells without an object, classification, and box coordinates); see the sketch below.

  6. absolute - isn't used
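
A sketch of typical [region] values for these scales (similar to yolo-voc.cfg, not prescriptive):

    object_scale=5
    noobject_scale=1
    class_scale=1
    coord_scale=1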

szm-R commented 6 years ago

Hi @AlexeyAB, I didn't get what object_scale does in the link you mentioned (#185) and how to set it. To be honest, I don't have much of a clue about the other ones either (noobject_scale, class_scale, coord_scale), but I have a feeling that this object_scale parameter is more important! Is it in any way related to other parameters such as the number of classes, etc.?

dfsaw commented 6 years ago

How do I change the number of iterations after which weights are saved?

IlyaOvodov commented 6 years ago

If you mean the period of 100 iterations at which a snapshot is written, it is hardcoded here: https://github.com/AlexeyAB/darknet/blob/master/src/detector.c#L202 If you mean the ultimate number of iterations (when training is stopped and blabla_final.weights is created), it is the "max_batches" parameter.
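
For the latter, the relevant line in the [net] section looks like this (the value is only an example):

    max_batches=4000
    # training stops and the *_final.weights file is written once this many iterations are reached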

dfsaw commented 6 years ago

@IlyaOvodov In the link mentioned above, which line should I change? Suppose I want to look at the weights after 10 iterations.

dfsaw commented 6 years ago

What does [route] layers=-9

[reorg] stride=2 and

[route] layers=-1,-4 mean?

Can anyone please help me out?

AlexeyAB commented 6 years ago

@dfsaw

  1. [route] layer - the same as a Concat layer in Caffe. layers=-1,-4 means that two layers, with relative indices -1 and -4, will be concatenated.

  2. [reorg] layer - just reshapes the feature map: it decreases the spatial size and increases the number of channels, without changing the elements. stride=2 means that width and height will be decreased by 2 times and the number of channels will be increased by 2x2 = 4 times, so the total number of elements stays the same: width_old*height_old*channels_old = width_new*height_new*channels_new (see the sketch below).


For example: (yolo_voc network structure diagram attached in the original comment)
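
A small sketch of what [reorg] stride=2 does to a feature map (the sizes are illustrative, similar to the reorg step in yolov2.cfg):

    [reorg]
    stride=2
    # e.g. a 26 x 26 x 64 feature map becomes 13 x 13 x 256:
    # 26*26*64 = 13*13*256 = 43264 elements, only rearranged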

dfsaw commented 6 years ago

If I have 7 classes, should I change classes only in yolo-obj.cfg? Are there any other files where I should make changes?

AlexeyAB commented 6 years ago

@dfsaw Change classes= in the 3 [yolo] layers and filters= in the 3 [convolutional] layers just before them.

Read carefully: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

  1. Create file obj.data in the directory build\darknet\x64\data\, containing (where classes = number of objects):
    classes= 2
    train  = data/train.txt
    valid  = data/test.txt
    names = data/obj.names
    backup = backup/
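
For 7 classes specifically, the cfg edits would be along these lines (a sketch; for YOLOv3, filters = (classes + 5) * 3 in each of the 3 [convolutional] layers just before a [yolo] layer, and obj.data would have classes = 7):

    [convolutional]
    # (7 + 5) * 3 = 36
    filters=36

    [yolo]
    classes=7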
dfsaw commented 6 years ago

@AlexeyAB I am able to predict images using the weights created after the 2nd iteration, but after that none of the weights predict anything. Can someone please help me out?

hemp110 commented 6 years ago

@AlexeyAB In yolo_layer.c there is if (best_iou > l.truth_thresh), but in the cfg the yolo layer has

ignore_thresh = .7
truth_thresh = 1
  1. so the if statement will never be reached?
  2. and could I say "if the best IoU of one object > ignore_thresh, then YOLO takes it as detected and its loss will be ignored"?
AlexeyAB commented 6 years ago

@hemp110 Currently it will never be reached. It is there just for experiments.

and could I say "if the best IoU of one object > ignore_thresh, then YOLO takes it as detected and its loss will be ignored"?

Yes, then objectness will not be decreased.

Eyshika commented 6 years ago

@AlexeyAB Just for confirmation: in yolov3.cfg, are the width and height the image width and height, or the bounding box width and height? To my knowledge it must be the image width and height, since the bounding box dimensions change with every image we use for training.

Also, here https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects you mentioned batch=64, but shouldn't it depend on the number of training images we are using? I read somewhere that batch is actually the number of training images, so I'm just confirming.

AlexeyAB commented 6 years ago

@Eyshika

@AlexeyAB Just for confirmation: in yolov3.cfg, are the width and height the image width and height, or the bounding box width and height? To my knowledge it must be the image width and height, since the bounding box dimensions change with every image we use for training.

This is neither the image width nor the bounding box width.

width= and height= in yolov3.cfg are the input size of the neural network. Any image will be automatically resized to this size (width x height) during training or detection, and only then will the resized image be passed to the neural network.
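
For example (illustrative values):

    [net]
    # network input resolution; every training/detection image is resized to 416 x 416 first
    width=416
    height=416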

Eyshika commented 6 years ago

@AlexeyAB So during testing, will the result be shown at the original image size, with the bounding boxes in the correct positions?

AlexeyAB commented 6 years ago

@Eyshika Yes. All these things are automatic and always correct.

MarquiseRosier commented 6 years ago

Hey @AlexeyAB, does the route layer copy the output of some prior layer, or does it simply reference the output weights? If so, does the momentum-based gradient optimization update a copy of the weights or the original?

AlexeyAB commented 6 years ago

@MarquiseRosier

The original weights will be updated. The delta will be the sum of the route delta and the current layer delta. The route layer updates the original weights: https://github.com/AlexeyAB/darknet/blob/6682f0b98984e2b92049e985b21ed81b76666565/src/route_layer.c#L123-L131

MarquiseRosier commented 6 years ago

@AlexeyAB You are amazing! Thank you :)

weiting0032 commented 6 years ago

Hey @AlexeyAB, I have some YOLO questions for you. I use yolov3-voc to train on car plate images. The training images are 4192*3264, and in the training cfg I set height and width to 416 416. After training, I take the training images for testing, and it can detect the label I have trained. However, when I take panorama images, whose size is 8192*4096, I find that it can't detect any car plate labels in the images. I want to ask you what the problem is. Sorry to bother you with this; thank you!

AlexeyAB commented 6 years ago

@WEITINGLIN32

It seems this rule is broken: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

General rule - your training dataset should include such a set of relative sizes of objects that you want to detect:

train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width

In your case you should change the network resolution after training.

What is the average size of objects

  • in Training dataset?
  • in Detection dataset?

Then calculate new width= in cfg-file: detection_network_width =

train_network_width * train_obj_width / train_image_width / (detection_obj_width / detection_image_width) =

416 * average_train_obj_width / 4192 / (average_detection_obj_width / 8192) = ???
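
For instance, with hypothetical object sizes just to make the arithmetic concrete: if the plates average about 400 px wide in the 4192-px training images and about 200 px wide in the 8192-px panoramas, then

    416 * 400 / 4192 / (200 / 8192) ~= 1626, which you would round to a multiple of 32, e.g. width=1632 in the cfg-file.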

weiting0032 commented 6 years ago

It seems this rule is broken: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection ... Then calculate new width= in cfg-file: detection_network_width = 416 * average_train_obj_width / 4192 / (average_detection_obj_width / 8192) = ???

Hello @AlexeyAB, so now I have to edit the cfg-file width and height? And I want to ask how I can calculate average_train_obj_width and average_detection_obj_width. Also, do I have to retrain my model or not? If not, what should I do to detect the panorama images? Thanks a lot!

AlexeyAB commented 6 years ago

@WEITINGLIN32

Now I have to edit the cfg-file width and height?

Yes.

And I want to ask how I can calculate average_train_obj_width and average_detection_obj_width.

The simplest way to get the average width of the objects is to calculate 1 anchor (which will not actually be used in the cfg-file): ./darknet detector calc_anchors data/obj.data -num_of_clusters 1 -width 416 -height 416

I.e. calculate 1 anchor

Sudhakar17 commented 5 years ago

@anandkoirala @AlexeyAB Please explain these parameters in the classifier cfg file: [softmax] groups=1 and temperature=3 - what do they do?

ashnaeldho commented 5 years ago

Hi @AlexeyAB Could you please explain how the CNN detects the bounding box coordinates, objectness score and probability of the object in YOLO?

oscarzasa commented 5 years ago

Hi @AlexeyAB,

I know you already explained how some of the YOLO layer parameters work, but there are some I'm still missing. Can you please explain the following: mask, anchors, num, ignore_thresh, truth_thresh, random?

I'll really appreciate the help! Cheers.

AlexeyAB commented 5 years ago

@oscarzasa Read wiki:

https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-%5Bnet%5D-section

https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-different-layers

oscarzasa commented 5 years ago

@AlexeyAB thanks a lot! Cheers!

Madankumar90 commented 4 years ago

Hey @AlexeyAB, does the route layer work for concatenating three layers, something like this: [route] layers=-1,-4,-3? Can this be done? If not, how do I concatenate three different layers?

Thanks, Madan

AlexeyAB commented 4 years ago

@Madankumar90 [route] is a concatenation layer: Concat for several input layers, or Identity for one input layer.

More: https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-different-layers
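
So, as a sketch, a three-way concatenation would simply be (using the layer indices from the question above):

    [route]
    layers = -1,-4,-3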

Madankumar90 commented 4 years ago

Yeah, I saw this late. Anyway, thank you for the quick reply.

stephanecharette commented 4 years ago

I understand the range for hue is 0.0 to 1.0. But what about saturation and exposure? I assume that setting both of these to 0.0 turns it off, but what is the upper range?

AlexeyAB commented 4 years ago

@stephanecharette The default values are:

hue=0
exposure=1
saturation=1

hue=0.3 - means hue from -0.3 to +0.3
exposure=1.5 - means exposure from 1/1.5 to 1*1.15
saturation=1.5 - means saturation from 1/1.5 to 1*1.15

how it will be calculated: https://github.com/AlexeyAB/darknet/blob/2116cba1ed123b38b432d7d8b9a2c235372fd553/src/data.c#L1017-L1019

how it will be applied: https://github.com/AlexeyAB/darknet/blob/2116cba1ed123b38b432d7d8b9a2c235372fd553/src/image_opencv.cpp#L1183-L1198

stephanecharette commented 4 years ago

Thanks, @AlexeyAB. I don't think I explained my question very well. I'm trying to figure out what is the maximum range someone can set for these three values in the .cfg file.

AlexeyAB commented 4 years ago

Hue: from 0 to 1.0
Saturation: from 0.003 to 256
Exposure: from 0.003 to 256

stephanecharette commented 4 years ago

Hue: from 0 to 1.0
Saturation: from 0.003 to 256
Exposure: from 0.003 to 256

I found 256 to be extreme and probably unusable for most images. If anyone else is reading this thread in the future, you can see the results of modifying hue, saturation, and exposure, with examples of what images look like at different values, here: https://www.ccoderun.ca/DarkMark/DataAugmentationColour.html

TheBarnakhil commented 4 years ago

Hey @AlexeyAB, I'm sorry, but can you explain the output parameters? Namely iou_loss, iou_norm, cls_norm, scale_x_y and mse from:

[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00

And the BF value in: 96 conv 128 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF

ZhixinLai commented 3 years ago

exposure=1.5 - means exposure from 1/1.5 to 1*1.15 saturation=1.5 - means saturation from 1/1.5 to 1*1.15

Shouldn't this be: exposure=1.5 - means exposure from 1/1.5 to 1*1.5 (not 1.15), and saturation=1.5 - means saturation from 1/1.5 to 1*1.5 (not 1.15)?