anandkoirala opened 6 years ago
Hi,
saturation, exposure and hue values - ranges for random color changes applied to images during training (data augmentation parameters), in terms of HSV: https://en.wikipedia.org/wiki/HSL_and_HSV The larger the value, the more invariant the neural network will be to changes in the lighting and color of the objects.
steps and scales values - steps is a list of checkpoints (iteration numbers) at which scales will be applied; scales is a list of coefficients by which learning_rate will be multiplied at those checkpoints. Together they determine how the learning rate changes as the number of training iterations grows.
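For example (illustrative values only):

learning_rate=0.001
steps=40000,45000
scales=.1,.1

means the learning rate starts at 0.001, drops to 0.0001 at iteration 40000, and drops again to 0.00001 at iteration 45000.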
anchors, bias_match
anchors are the most frequent initial <width,height> of objects, expressed in terms of the network's output resolution.
bias_match is used only for training: if bias_match=1 then the detected object will have the same <width,height> as one of the anchors, and if bias_match=0 then the anchor's <width,height> will be refined by the neural network: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L275-L283
If you train with height=416, width=416, random=0, then the maximum anchor values will be 13,13.
But if you train with random=1, then the maximum input resolution can be 608x608, and the maximum anchor values can be 19,19.
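For example, a 5-anchor [region] layer for a 416x416 network might contain something like this (these particular values are roughly the VOC anchors, shown only for illustration):

[region]
anchors = 1.32,1.73, 3.19,4.01, 5.06,8.10, 9.47,4.84, 11.24,10.01
bias_match=1
num=5

Each anchor pair is a <width,height> measured in output-grid cells, so with a 13x13 output grid (416/32 = 13) no anchor exceeds 13,13 when random=0.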
jitter, rescore, thresh
jitter can be in [0-1] and is used to crop images during training for data augmentation. The larger the jitter value, the more invariant the neural network will be to changes in the size and aspect ratio of the objects: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/data.c#L513-L528
rescore determines which loss (delta, cost, ...) function will be used - more about this: https://github.com/AlexeyAB/darknet/issues/185#issuecomment-334504558
https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L302-L305
thresh is the minimum IoU at which delta_region_class() is applied during training: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L235
object_scale, noobject_scale, class_scale, coord_scale values - all are used during training:
delta_region_class(): https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L108
delta_region_box(): https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/src/region_layer.c#L87
absolute - isn't used
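For reference, a yolov2-style [region] layer typically sets these training parameters roughly as follows (these are the common default values, shown only as an illustration):

# crop/translate images by up to 30% for augmentation
jitter=.3
# use IoU-based rescoring in the objectness delta
rescore=1
# IoU threshold used during training (see the region_layer.c links above)
thresh=.6
# weight of the objectness delta for cells that do contain an object
object_scale=5
# weight of the objectness delta for cells that contain no object
noobject_scale=1
# weight of the classification delta (delta_region_class)
class_scale=1
# weight of the box-coordinate delta (delta_region_box)
coord_scale=1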
Hi @AlexeyAB, I didn't get what object_scale does in the link you mentioned (#185) and how to set it. To be honest, I don't have much of a clue about the other ones either (noobject_scale, class_scale, coord_scale), but I have a feeling that this object_scale parameter is more important! Is it in any way related to other parameters such as the number of classes, etc.?
How do I change the number of iterations after which weights are saved?
If you mean the 100-iteration period at which a snapshot is written, it is hardcoded here: https://github.com/AlexeyAB/darknet/blob/master/src/detector.c#L202 If you mean the final number of iterations (when training is stopped and blabla_final.weights is created), that is the "max_batches" parameter.
@IlyaOvodov In the link mentioned above, which line should I change? Suppose I want to look at the weights after 10 iterations.
What do [route] layers=-9, [reorg] stride=2 and [route] layers=-1,-4 mean?
Can anyone please help me out?
@dfsaw
[route] layer - is the same as the Concat layer in Caffe.
layers=-1,-4 means that two layers, with relative indices -1 and -4, will be concatenated.
[reorg] layer - just reshapes the feature map: it decreases the spatial size and increases the number of channels, without changing the elements.
stride=2 means that width and height will be decreased by 2 times, and the number of channels will be increased by 2x2 = 4 times, so the total number of elements stays the same:
width_old*height_old*channels_old = width_new*height_new*channels_new
For example:
If we use [route] layers=-1, we simply take as input the result of the preceding layer (current_layer_number-1), without any processing.
If we use [route] layers=-2, we take as input the result of the layer with index (current_layer_number-2), without any processing.
If we use [route] layers=-1,-3, we take as input the results of the layers with indexes (current_layer_number-1) and (current_layer_number-3), and merge them into one layer.
If at layer 27 we have [route] layers=-1,-3, then it will take the two layers 26=(27-1) and 24=(27-3) and merge them in depth: 13x13x1024 + 13x13x2048 = 13x13x3072 is the output of layer 27.
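As a concrete illustration, the passthrough block the question refers to looks roughly like this in a 416x416 yolov2-style cfg (the sizes in the comments assume that configuration):

# take the 26 x 26 x 512 feature map from 9 layers back
[route]
layers=-9

# 26 x 26 x 512 -> 26 x 26 x 64
[convolutional]
batch_normalize=1
filters=64
size=1
stride=1
pad=1
activation=leaky

# 26 x 26 x 64 -> 13 x 13 x 256 (same number of elements)
[reorg]
stride=2

# concatenate 13 x 13 x 256 with the 13 x 13 x 1024 map from 4 layers back -> 13 x 13 x 1280
[route]
layers=-1,-4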
If I have 7 classes, should I change classes only in yolo-obj.cfg? Are there any other files where I should make changes?
@dfsaw Change classes= in the 3 [yolo]-layers and filters= in the 3 [convolutional]-layers immediately before each [yolo]-layer.
Read carefully: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
- Create file obj.data in the directory build\darknet\x64\data\, containing (where classes = number of objects):
classes = 2
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/
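For 7 classes, for example, each of those [convolutional]/[yolo] pairs would look roughly like this (all other fields omitted):

[convolutional]
# filters = (classes + 5) * 3 = (7 + 5) * 3
filters=36

[yolo]
classes=7

and in obj.data you would set classes = 7.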
@AlexeyAB I am able to predict images using the weights created after the 2nd iteration, but after that none of the weights predict anything. Can someone please help me out?
@AlexeyAB In yolo_layer.c there is
if (best_iou > l.truth_thresh)
but in cfg the yolo layer has
ignore_thresh = .7
truth_thresh = 1
@hemp110 Currently that branch will never be reached (since truth_thresh = 1). It is just for experiments.
And could I say: "if the best IoU of one object > ignore_thresh, then YOLO takes it as detected and its loss will be ignored"?
Yes, then objectness will not be decreased.
@AlexeyAB Just for confirmation: in yolov3.cfg, are the width and height the image width and height, or the bounding-box width and height? To my knowledge they must be the image width and height, since bounding-box dimensions change in every image we use for training.
Also, here https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects you mention batch=64, but shouldn't it depend on the number of training images we are using? I read somewhere that the batch is actually the number of training images, so just confirming.
@Eyshika
@AlexeyAB Just for confirmation In yolov3.cfg the width and height is image width and height or bounding boxes ? According to my knowledge it must be image width and height since bounding box dimensions changes in every image we use for training,
This is neither the image width nor the bounding-box width.
width= and height= in yolov3.cfg are the size of the neural network. Any image will be automatically resized to this size (width x height) during training or detection. Only after that will the resized image be passed to the neural network.
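As an illustrative snippet, in the [net] section:

[net]
width=416
height=416

means that every input image, whatever its original size, is resized to 416x416 before being passed through the network, and the resulting detections are then mapped back to the original image coordinates.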
@AlexeyAB So during testing, will the result be at the original image size, with the bounding boxes at the correct positions?
@Eyshika Yes. All these things are automatic and always correct.
Hey @AlexeyAB, does the route layer copy the output of some prior layer, or does it simply reference the output weights? If so, does the momentum-based gradient optimization update a copy of the weights or the original?
@MarquiseRosier
The original weights will be updated. The delta is the sum of the route delta and the current layer's delta.
The route layer updates the original weights: https://github.com/AlexeyAB/darknet/blob/6682f0b98984e2b92049e985b21ed81b76666565/src/route_layer.c#L123-L131
@AlexeyAB You are amazing! Thank you :)
Hey @AlexeyAB, I have some YOLO problems to ask you about. I use yolov3-voc to train on car-plate images. The training images are 4192x3264, and in the training cfg I set height and width to 416x416. After training, I take the training images for testing, and it can detect the label I have trained. However, when I take panorama images, which are 8192x4096, I find that it can't detect any car-plate labels in them. I want to ask what the problem is. Sorry to bother you with this; thank you!
@WEITINGLIN32
It seems this rule is broken: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection
General rule - your training dataset should include such a set of relative sizes of objects that you want to detect:
train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width
In your case you should change the network resolution after training.
What is the average size of objects
- in the Training dataset?
- in the Detection dataset?
Then calculate the new width= in the cfg-file:
detection_network_width
= train_network_width * train_obj_width / train_image_width / (detection_obj_width / detection_image_width)
= 416 * average_train_obj_width / 4192 / (average_detection_obj_width / 8192)
= ???
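A worked example with made-up object sizes (the real values must come from your own datasets): suppose the average plate width is about 400 px in the 4192-px-wide training images and about 300 px in the 8192-px-wide panoramas. Then:

detection_network_width = 416 * (400 / 4192) / (300 / 8192) ≈ 1084

which you would round to a multiple of 32, e.g. width=1088 in the cfg-file, and compute height= the same way from the average object heights.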
Hello @AlexeyAB, so now I have to edit the width and height in the cfg-file? And I want to ask how I can calculate the average_train_obj_width and average_detection_obj_width. Also, do I have to retrain my model or not? If not, what should I do to detect panorama images? Thanks a lot!
@WEITINGLIN32
Now I have to edit is cfg-file width and height?
Yes.
And I want to ask how can I calculate the average_train_obj_width and average_detection_obj_width?
The simplest way to get the average width and height of objects is to calculate 1 anchor, which will not actually be used in the cfg-file:
./darknet detector calc_anchors data/obj.data -num_of_clusters 1 -width 416 -height 416
I.e. calculate 1 anchor (average_train_obj_width and average_train_obj_height) for the Training dataset, and 1 anchor (average_detection_obj_width and average_detection_obj_height) for the Detection dataset.
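For example, assuming two hypothetical .data files whose train= entries point at the training image list and at the panorama (detection) image list respectively, you could run:

./darknet detector calc_anchors data/obj_train.data -num_of_clusters 1 -width 416 -height 416
./darknet detector calc_anchors data/obj_detect.data -num_of_clusters 1 -width 416 -height 416

and the single anchor printed by each run gives (approximately) the average object <width,height> for that dataset, scaled to the given network size.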
Please explain the parameters in the classifier cfg file: @anandkoirala @AlexeyAB
[softmax]
groups=1 ---- ?
temperature=3 ------?
Hi @AlexeyAB, could you please explain how the CNN detects the bounding-box coordinates, objectness score, and object probability in YOLO?
Hi @AlexeyAB,
I know you already explained how some of the YOLO layer parameters work, but there are some that I'm missing. Can you please explain the following: mask, anchors, num, ignore_thresh, truth_thresh, random?
I'll really appreciate the help! Cheers.
@AlexeyAB thanks a lot! Cheers!
Hey @AlexeyAB, does the route layer work for concatenating three layers, something like this: [route] layers=-1,-4,-3? Can this be done? If not, how do I concatenate three different layers?
Thanks, Madan
@Madankumar90 [route] is a concatenation layer: Concat for several input layers, or Identity for a single input layer.
More: https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-different-layers
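So a three-way concatenation like the one asked about above is valid; as a sketch:

# concatenates the outputs of the layers 1, 4 and 3 positions back along the channel dimension
# (their width and height must match)
[route]
layers=-1,-4,-3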
Yeah, I saw this late. Anyway, thank you for the quick reply.
I understand the range for hue is 0.0 to 1.0. But what about saturation and exposure? I assume that setting both of these to 0.0 turns it off, but what is the upper range?
@stephanecharette default values:
hue=0
exposure=1
saturation=1
hue=0.3 - means hue from -0.3 to +0.3
exposure=1.5 - means exposure from 1/1.5 to 1*1.5
saturation=1.5 - means saturation from 1/1.5 to 1*1.5
how it is calculated: https://github.com/AlexeyAB/darknet/blob/2116cba1ed123b38b432d7d8b9a2c235372fd553/src/data.c#L1017-L1019
how it is applied: https://github.com/AlexeyAB/darknet/blob/2116cba1ed123b38b432d7d8b9a2c235372fd553/src/image_opencv.cpp#L1183-L1198
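As an illustrative snippet (the values here are just examples):

# random hue shift in [-0.1, +0.1]
hue=.1
# random saturation factor in [1/1.5, 1.5]
saturation=1.5
# random exposure (brightness) factor in [1/1.5, 1.5]
exposure=1.5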
Thanks, @AlexeyAB. I don't think I explained my question very well. I'm trying to figure out what is the maximum range someone can set for these three values in the .cfg file.
Hue: from 0 to 1.0
Saturation: from 0.003 to 256
Exposure: from 0.003 to 256
I found 256 is extreme and probably unusable for most images. If anyone else reads this thread in the future, you can see examples of what images look like when applying different values of hue, saturation, and exposure here: https://www.ccoderun.ca/DarkMark/DataAugmentationColour.html
Hey @AlexeyAB, I'm sorry, but can you explain the output parameters? Namely iou_norm, cls_norm, scale_x_y, iou_loss and mse from:
[yolo] params: iou loss: mse (2), iou_norm: 0.75, cls_norm: 1.00, scale_x_y: 1.00
And the BF value from: 96 conv 128 1 x 1/ 1 26 x 26 x 256 -> 26 x 26 x 128 0.044 BF
Shouldn't it be: exposure=1.5 means exposure from 1/1.5 to 1*1.5 (not 1*1.15), and saturation=1.5 means saturation from 1/1.5 to 1*1.5 (not 1*1.15)?
Hi @AlexeyAB
Could you please kindly document or explain the parameters of the .cfg file?