Open May-forever opened 5 years ago
@hjm1990818
In the intermediate conv/shortcut/maxpool layers we don't require any particular range of output values, so we can use batch normalization, which moves values closer to 0, and the default non-linear activation (leaky), which favors positive values.
But the [yolo] layer requires that positive values get no advantage, so we can't use leaky activation; and some of its parameters require values far above or below 0, so we can't use batch normalization.
Activation
The [yolo] layer already applies different activations (logistic, exponential, ...) to different predictions: objectness, class probability, x, y, w, h. So we shouldn't use any activation in the previous [convolutional] layer.
For example, in the [yolo] layer there is x_obj = i_cell + conv_output; (in the code: b.x = (i + x[index + 0*stride]) / lw;):
https://github.com/AlexeyAB/darknet/blob/21a4ec9390b61c0baa7ef72e72e59fa143daba4c/src/yolo_layer.c#L87
so if the previous conv layer used leaky activation, conv_output = (input > 0) ? input : input/10;
then all negative values would be divided by 10, so shifting the x-coordinate of an object to the right would be 10x stronger than shifting it to the left, which would greatly interfere with predicting the object's coordinates.
The same holds for the y, w, h coordinates.
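A tiny numeric check makes this asymmetry concrete; leaky here is the activation quoted above (positives pass through, negatives are divided by 10):

```c
/* Darknet's default leaky activation, as quoted above. */
static float leaky(float x) { return x > 0 ? x : x / 10.0f; }

/* With x_obj = i_cell + conv_output: a raw output of +0.5 shifts the
 * box right by 0.5 cells, but -0.5 shifts it left by only 0.05 cells;
 * to shift left by 0.5 the conv layer would have to output -5.0. */
```

Symmetric raw conv outputs thus produce decoded shifts that are 10x stronger to the right than to the left.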
Batch normalization
We use batch normalization to stabilize training, speed up convergence, and regularize the model. It adds less than 2% mAP.
But batch normalization may also interfere with predicting the coordinates, objectness, and class probability in the [yolo] layer, because some of these parameters require values far from 0.
Values before and after batch normalization:
Hi Alexey AB (@AlexeyAB),
Thank you very much for your help and time.
Will darknet get any updates or new models soon?
My sincere thanks for your time and patience.
Hi Alexey AB (@AlexeyAB),
Thanks for your kindness.
In yolov3.cfg I found that the conv layer (i.e., layer #105) before the yolo layer (i.e., layer #106) doesn't set batch_normalize=1, and its activation is linear, as shown below.
I want to know why you didn't add batch_normalize=1 in layer #105, and why you didn't use activation=leaky in layer #105?
I am looking forward to hearing from you; thank you very much for your help and time.
------------------The yolov3.cfg-------------------------
[convolutional] #104
batch_normalize=1
size=3
stride=1
pad=1
filters=256
activation=leaky

[convolutional] #105
size=1
stride=1
pad=1
filters=255
activation=linear

[yolo] #106
mask = 0,1,2
anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
classes=80
num=9
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
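As a side note on the cfg above: filters=255 in layer #105 is determined by the [yolo] head, 3 masked anchors times (4 box coordinates + 1 objectness + 80 class scores). A quick sanity check, with an illustrative helper name:

```c
/* Filters needed in the conv layer before [yolo]:
 * anchors_per_scale * (x,y,w,h + objectness + class scores). */
static int yolo_filters(int anchors_per_scale, int classes)
{
    return anchors_per_scale * (4 + 1 + classes);
}
```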