Open May-forever opened 5 years ago
@hjm1990818
In the intermediate conv/shortcut/maxpool layers we don't require any particular range of output values, so we can use batch normalization, which moves values closer to 0, and the default non-linear activation (leaky), which favors positive values.
But the [yolo] layer requires that positive values get no advantage, so we can't use leaky activation; and some of its parameters require values far above or below 0, so we can't use batch normalization.
Activation
The [yolo] layer already applies different activations (logistic, exponential, ...) to different predictions: objectness, class probability, x, y, w, h. So we shouldn't use any activation in the previous [convolutional] layer.
For example, in the [yolo] layer there is x_obj = i_cell + conv_output; (in the code: b.x = (i + x[index + 0*stride]) / lw;):
https://github.com/AlexeyAB/darknet/blob/21a4ec9390b61c0baa7ef72e72e59fa143daba4c/src/yolo_layer.c#L87
so if the previous conv layer used leaky activation, conv_output = (input > 0) ? input : input/10;
then all negative values would be divided by 10, so shifting the x-coordinate of an object to the right would be 10x stronger than shifting it to the left, which would greatly interfere with predicting the object's coordinates.
The same holds for the y, w, h coordinates.
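A tiny numeric check makes this asymmetry concrete; leaky here is the activation quoted above (positives pass through, negatives are divided by 10):

```c
/* Darknet's default leaky activation, as quoted above. */
static float leaky(float x) { return x > 0 ? x : x / 10.0f; }

/* With x_obj = i_cell + conv_output: a raw output of +0.5 shifts the
 * box right by 0.5 cells, but -0.5 shifts it left by only 0.05 cells;
 * to shift left by 0.5 the conv layer would have to output -5.0. */
```

Symmetric raw conv outputs thus produce decoded shifts that are 10x stronger to the right than to the left.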
Batch normalization
We use batch normalization to stabilize training, speed up convergence, and regularize the model. It adds less than 2% mAP.
But batch normalization may also interfere with predicting the coordinates, objectness, and class probability in the [yolo] layer, because some of these parameters require values far from 0.
Values before and after batch normalization:
Hi Alexey AB (@AlexeyAB),
Thank you very much for your help and time.
Will darknet get any updates or new models soon?
My sincere thanks for your time and patience.
Hi Alexey AB (@AlexeyAB),
Thanks for your kindness.
In yolov3.cfg I found that the conv layer (i.e., layer #105) before the yolo layer (i.e., layer #106) doesn't set batch_normalize=1, and its activation is linear, as shown below.
I want to know why you didn't add batch_normalize=1 in layer #105, and why you didn't use activation=leaky in layer #105?
I am looking forward to hearing from you; thank you very much for your help and time.
------------------The yolov3.cfg-------------------------
[convolutional] #104
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional] #105
size=1
stride=1
pad=1
filters=255
activation=linear

[yolo] #106
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
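As a side note on the cfg above: filters=255 in layer #105 is determined by the [yolo] head, 3 masked anchors times (4 box coordinates + 1 objectness + 80 class scores). A quick sanity check, with an illustrative helper name:

```c
/* Filters needed in the conv layer before [yolo]:
 * anchors_per_scale * (x,y,w,h + objectness + class scores). */
static int yolo_filters(int anchors_per_scale, int classes)
{
    return anchors_per_scale * (4 + 1 + classes);
}
```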