AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Training loss stops decreasing once it reaches about 700 when training on a custom dataset. #7148

Open lurenlym opened 3 years ago

lurenlym commented 3 years ago

If something doesn’t work for you, then show 2 screenshots:

  1. screenshots of your issue

The loss does not look normal when training on a custom dataset with yolov4-csp and yolov4x-mish, but it is normal when using yolov4 and yolov4-tiny.

cfg-file:

[net]

# Testing

batch=1

subdivisions=1

# Training

batch=64 subdivisions=8 width=640 height=640 channels=3 momentum=0.949 decay=0.0005 angle=0 saturation = 1.5 exposure = 1.5 hue=.1

learning_rate=0.001 burn_in=1000 max_batches = 20500 policy=steps steps=16000,18000 scales=.1,.1

mosaic=1

letter_box=1

optimized_memory=1

[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=mish

# Downsample

[convolutional] batch_normalize=1 filters=80 size=3 stride=2 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=40 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

# Downsample

[convolutional] batch_normalize=1 filters=160 size=3 stride=2 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish

[route] layers = -1,-13

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

# Downsample

[convolutional] batch_normalize=1 filters=320 size=3 stride=2 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[route] layers = -1,-34

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

# Downsample

[convolutional] batch_normalize=1 filters=640 size=3 stride=2 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[route] layers = -1,-34

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

# Downsample

[convolutional] batch_normalize=1 filters=1280 size=3 stride=2 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=mish

[shortcut] from=-3 activation=linear

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[route] layers = -1,-19

[convolutional] batch_normalize=1 filters=1280 size=1 stride=1 pad=1 activation=mish

##########################

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

### SPP ###

[maxpool] stride=1 size=5

[route] layers=-2

[maxpool] stride=1 size=9

[route] layers=-4

[maxpool] stride=1 size=13

[route] layers=-1,-3,-5,-6

### End SPP ###

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish

[route] layers = -1, -15

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[upsample] stride=2

[route] layers = 94

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[route] layers = -1, -3

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish

[route] layers = -1, -8

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[upsample] stride=2

[route] layers = 57

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[route] layers = -1, -3

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=mish

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=mish

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=mish

[route] layers = -1, -8

[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish stopbackward=800

##########################

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish

[convolutional] size=1 stride=1 pad=1 filters=48 activation=logistic

[yolo] mask = 0,1,2 anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 classes=11 num=9 jitter=.1 scale_x_y = 2.0 objectness_smooth=0 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5 iou_thresh=0.2 iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=4.0 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=5

[route] layers = -4

[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=320 activation=mish

[route] layers = -1, -22

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish

[route] layers = -1,-8

[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish

[convolutional] size=1 stride=1 pad=1 filters=48 activation=logistic

[yolo] mask = 3,4,5 anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 classes=11 num=9 jitter=.1 scale_x_y = 2.0 objectness_smooth=1 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5 iou_thresh=0.2 iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=1.0 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=5

[route] layers = -4

[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=640 activation=mish

[route] layers = -1, -55

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[route] layers = -2

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish

[route] layers = -1,-8

[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish

[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1280 activation=mish

[convolutional] size=1 stride=1 pad=1 filters=48 activation=logistic

[yolo] mask = 6,7,8 anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401 classes=11 num=9 jitter=.1 scale_x_y = 2.0 objectness_smooth=1 ignore_thresh = .7 truth_thresh = 1

random=1

resize=1.5 iou_thresh=0.2 iou_normalizer=0.05 cls_normalizer=0.5 obj_normalizer=0.4 iou_loss=ciou nms_kind=diounms beta_nms=0.6 new_coords=1 max_delta=2

  2. screenshots with such information

[screenshot]

  3. training loss

[screenshot]

I found that the iou_loss is negative, which seems abnormal. What's wrong?

[screenshot]

On the same dataset, training works well when I use the yolov4 or yolov4-tiny network. What should I do?

niemiaszek commented 3 years ago

Do you have the current version of this repo? Try running make clean and rebuilding from the newest version.

lurenlym commented 3 years ago

Do you have the current version of this repo? Try running make clean and rebuilding from the newest version.

Yes, I ran make clean and rebuilt after a git pull, but it still does not work.

wdxybhb commented 3 years ago

I got a similar issue to yours: just generic training on the same dataset, yet totally different results between training yolov4 and yolov4-csp. With yolov4-csp, the iou_loss is negative, and the avg_loss fluctuates around 120-130 over the first 6000 batches of training.

Linchunhui commented 3 years ago

I got a similar issue to yours: just generic training on the same dataset, yet totally different results between training yolov4 and yolov4-csp. With yolov4-csp, the iou_loss is negative, and the avg_loss fluctuates around 120-130 over the first 6000 batches of training.

I have the same issue.

lurenlym commented 3 years ago

@Linchunhui @wdxybhb Has anyone solved the problem? Or what should I look into?

maa01 commented 3 years ago

I am also having the same issue. If anyone has figured this out, please update the issue. Thanks.

JinCho23 commented 3 years ago

@wdxybhb @Linchunhui @lurenlym @maa01 In the first [yolo] layer, you might be using "obj_normalizer=4.0", like the template model: https://github.com/AlexeyAB/darknet/blob/b8c9c9d457a47d27710082c6e16206fc50af21f3/cfg/yolov4-csp.cfg#L1046

Check the code below for how the losses are computed: https://github.com/AlexeyAB/darknet/blob/b8c9c9d457a47d27710082c6e16206fc50af21f3/src/yolo_layer.c#L893-L896 The iou_loss is computed as "loss - classification_loss", but only the classification_loss is multiplied by "obj_normalizer". That is what makes the iou_loss negative.
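To make that concrete, here is a minimal sketch of the arithmetic being described, with made-up numbers rather than the actual darknet code; the only point is that when obj_normalizer=4.0 scales just the classification/objectness part, the subtraction that produces the printed iou_loss can drop below zero.

```c
/* Minimal sketch with illustrative numbers (not the darknet source):
 * the log prints iou_loss = loss - classification_loss, but only
 * classification_loss is multiplied by obj_normalizer. */
#include <stdio.h>

int main(void) {
    float obj_normalizer = 4.0f;   /* value used in the first [yolo] layer of the cfg */
    float raw_iou_part   = 30.0f;  /* hypothetical unscaled IoU part of the loss      */
    float raw_class_part = 20.0f;  /* hypothetical unscaled class/objectness part     */

    float loss = raw_iou_part + raw_class_part;                  /* 50.0              */
    float classification_loss = obj_normalizer * raw_class_part; /* 80.0              */

    float printed_iou_loss = loss - classification_loss;         /* 50 - 80 = -30.0   */
    printf("printed iou_loss = %.1f\n", printed_iou_loss);
    return 0;
}
```

So the negative number comes from how the value is printed, not from the IoU term itself being negative.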

maa01 commented 3 years ago

@JinCho23 Any idea what the purpose of obj_normalizer is and whether we need to change or remove it? I could not find it in the documentation: https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-different-layers

JinCho23 commented 3 years ago

@JinCho23 Any idea what the purpose of obj_normalizer is and whether we need to change or remove it? I could not find it in the documentation: https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-different-layers

I think a large obj_normalizer (e.g., 4.0) is used at a large-sized feature map, such as the first yolo layer, to improve the classification accuracy of small objects: it penalizes the classification loss more heavily during training. As you can see in the code, the final cost actually used for training does not include the negative iou_loss. The cost is the sum of avg_iou_loss and classification_loss, where avg_iou_loss is computed in a separate function.
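As a hedged illustration of that last point (illustrative names and numbers, not the darknet variables), the quantity driving training is assembled from the separately computed avg_iou_loss plus the classification loss, so the negative printed iou_loss never enters it:

```c
/* Illustrative sketch only, not the darknet source. */
#include <stdio.h>

static float training_cost(float avg_iou_loss, float classification_loss) {
    /* both inputs are non-negative, so the cost is too;
       the negative printed iou_loss plays no role here */
    return avg_iou_loss + classification_loss;
}

int main(void) {
    /* hypothetical values continuing the earlier sketch */
    printf("cost = %.1f\n", training_cost(30.0f, 80.0f)); /* -> 110.0 */
    return 0;
}
```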

akashAD98 commented 3 years ago

Hey @JinCho23, I don't see any parameter like obj_normalizer in the .cfg yolo file. Is [batchnorm] the same as obj_normalizer? If not, which parameter do I need to change in the .cfg?

JinCho23 commented 3 years ago

Hey @JinCho23, I don't see any parameter like obj_normalizer in the .cfg yolo file. Is [batchnorm] the same as obj_normalizer? If not, which parameter do I need to change in the .cfg?

Hi @akashAD98, sorry for the late response. I've been kind of busy ;) obj_normalizer is not like batchnorm. You can set obj_normalizer in the [yolo] layer, where its default value is 1.0. I don't know why this parameter is missing from the wiki's layer description. https://github.com/AlexeyAB/darknet/blob/master/src/parser.c#L470 You can see how it works in the yolo_layer.c file: it gives more or less weight to the objectness score. https://github.com/AlexeyAB/darknet/blob/master/src/yolo_layer.c#L456
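As a rough sketch of what that weighting means (hypothetical function and names, not the code at the linked line), the normalizer simply scales how strongly the objectness error contributes to the layer's delta:

```c
/* Rough illustration only (hypothetical names, not the darknet source):
 * obj_normalizer scales the objectness error's contribution, so 4.0 pushes
 * objectness about four times harder than the default 1.0. */
#include <stdio.h>

static float objectness_delta(float predicted, float target, float obj_normalizer) {
    return obj_normalizer * (target - predicted);
}

int main(void) {
    printf("default (1.0): %.2f\n", objectness_delta(0.3f, 1.0f, 1.0f)); /* 0.70 */
    printf("weighted (4.0): %.2f\n", objectness_delta(0.3f, 1.0f, 4.0f)); /* 2.80 */
    return 0;
}
```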

akashAD98 commented 3 years ago

@JinCho23 Are there any parameters with which we can minimize our loss? There are different loss functions available, such as diou, ciou, iou, mse, and giou. In yolov4, AlexeyAB used ciou, and for nms_kind there are greedynms and diounms. I just want to understand the difference between these two nms_kind options and in what scenario we should use each.