lurenlym opened this issue 3 years ago
Do you have the current version of this repo? Try running make clean and building from the newest version.
Yes, I ran make clean and rebuilt after a git pull, but it still does not work.
I have a similar issue: generic training on the same dataset gives totally different results between yolov4 and yolov4-csp. With yolov4-csp, the iou_loss is negative, and the avg_loss fluctuates around 120-130 in the first 6000 batches of training.
I have the same issue.
@Linchunhui @wdxybhb Has anyone solved this problem? What should I look into?
I am also having the same issue; if anyone has figured this out, please update the issue. Thanks.
@wdxybhb @Linchunhui @lurenlym @maa01 In the first [yolo] layer, you might use "obj_normalizer=4.0" like the template model. https://github.com/AlexeyAB/darknet/blob/b8c9c9d457a47d27710082c6e16206fc50af21f3/cfg/yolov4-csp.cfg#L1046
Check the code below to see how the losses are computed: https://github.com/AlexeyAB/darknet/blob/b8c9c9d457a47d27710082c6e16206fc50af21f3/src/yolo_layer.c#L893-L896 The iou_loss is computed as "loss - classification_loss", but only the classification_loss is multiplied by "obj_normalizer". This is what makes the iou_loss negative.
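To see the arithmetic, here is a tiny Python sketch. All numbers are invented for illustration; the real terms come from darknet's yolo_layer.c:

```python
# Hypothetical numbers illustrating how the *printed* iou_loss goes negative
# when only the classification term is scaled by obj_normalizer.
obj_normalizer = 4.0

raw_iou_loss = 10.0    # unscaled localization part of the total loss
raw_class_loss = 5.0   # unscaled classification part

total_loss = raw_iou_loss + raw_class_loss           # what the layer sums
scaled_class_loss = obj_normalizer * raw_class_loss  # scaled for display

# darknet prints iou_loss = total_loss - scaled classification_loss
printed_iou_loss = total_loss - scaled_class_loss
print(printed_iou_loss)  # negative whenever the scaled term exceeds the total
```

So the printed value is a display artifact of the subtraction, not a sign that the localization loss itself went below zero.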
@JinCho23 Any idea what the purpose of obj_normalizer is and if we need to change/remove this? I could not find it in the documentation: https://github.com/AlexeyAB/darknet/wiki/CFG-Parameters-in-the-different-layers
I think the use of a big obj_normalizer (e.g., 4.0) at a large feature map such as the first [yolo] layer is meant to improve classification accuracy for small objects: it penalizes the classification loss more during training. As you can see in the code, the final cost used for actual training does not include the negative iou_loss. The cost is the sum of avg_iou_loss and classification_loss, where avg_iou_loss is computed by a separate function.
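A quick sketch of that distinction (hypothetical values; darknet computes these terms in C inside yolo_layer.c):

```python
# The cost that training actually minimizes sums avg_iou_loss and the
# classification term, so it stays positive even when the printed iou_loss
# is negative. Values here are invented for illustration.
obj_normalizer = 4.0

avg_iou_loss = 3.2                          # from the separate IoU-loss path
classification_loss = obj_normalizer * 5.0  # scaled classification term

cost = avg_iou_loss + classification_loss   # what gradient descent sees
assert cost > 0  # the negative printed iou_loss never enters this sum
```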
Hey @JinCho23, I'm not finding any parameter like obj_normalizer in the .cfg yolo file. Is [batchnorm] the same as obj_normalizer? If not, which parameter do I need to change in the .cfg?
Hi @akashAD98, sorry for the late response. I've been kind of busy ;) obj_normalizer is not like batchnorm. You can set obj_normalizer in the [yolo] layer, where the default value is 1.0. I don't know why this parameter is missing from the wiki's layer description. https://github.com/AlexeyAB/darknet/blob/master/src/parser.c#L470 You'll see how it works in the yolo_layer.c file: it gives more or less weight to the objectness score. https://github.com/AlexeyAB/darknet/blob/master/src/yolo_layer.c#L456
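If it helps, a rough Python sketch of the weighting (the function name and values here are illustrative, not darknet's actual API):

```python
# obj_normalizer scales the objectness delta (the gradient pushed back for the
# objectness score), so a bigger value means a proportionally bigger update.
def objectness_delta(target, prediction, obj_normalizer=1.0):
    return obj_normalizer * (target - prediction)

# Same prediction error, two settings: obj_normalizer=4.0 pushes 4x harder.
print(objectness_delta(1.0, 0.7, obj_normalizer=1.0))
print(objectness_delta(1.0, 0.7, obj_normalizer=4.0))
```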
@JinCho23 Are there any parameters with which we can minimize our loss? There are different loss functions available: diou, ciou, iou, mse, giou. In yolov4, AlexeyAB used ciou together with nms_kind=greedynms and nms_kind=diounms. I just want to understand the difference between these two nms_kind options and in which scenarios we should use each one.
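Here is a minimal Python sketch of the suppression test each nms_kind applies (simplified from the DIoU-NMS idea; this is not darknet's actual C code). greedynms suppresses a box when its plain IoU with a higher-scoring box exceeds the threshold; diounms subtracts a normalized center-distance penalty first, so overlapping boxes whose centers are far apart are more likely to both survive:

```python
def iou(a, b):
    # boxes are (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def diou_penalty(a, b):
    # squared center distance over squared diagonal of the enclosing box
    cxa, cya = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    cxb, cyb = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    ex1, ey1 = min(a[0], b[0]), min(a[1], b[1])
    ex2, ey2 = max(a[2], b[2]), max(a[3], b[3])
    diag2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + 1e-9
    return ((cxa - cxb) ** 2 + (cya - cyb) ** 2) / diag2

def suppress(a, b, thresh, kind="greedynms"):
    score = iou(a, b)
    if kind == "diounms":
        score -= diou_penalty(a, b)  # DIoU: penalize distant centers
    return score > thresh

# Two overlapping boxes with clearly separated centers: greedynms suppresses
# the second box, diounms keeps it.
a, b = (0, 0, 10, 10), (6, 0, 16, 10)
print(suppress(a, b, 0.2, "greedynms"))  # True
print(suppress(a, b, 0.2, "diounms"))    # False
```

Roughly: diounms tends to help in crowded scenes where two genuinely distinct objects overlap, while greedynms is the simpler, cheaper default.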
If something doesn’t work for you, then show 2 screenshots:
cfg-file:

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=8
width=640
height=640
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=1000
max_batches = 20500
policy=steps
steps=16000,18000
scales=.1,.1
mosaic=1
letter_box=1
optimized_memory=1
[convolutional] batch_normalize=1 filters=32 size=3 stride=1 pad=1 activation=mish
# Downsample
[convolutional] batch_normalize=1 filters=80 size=3 stride=2 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=40 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
# Downsample
[convolutional] batch_normalize=1 filters=160 size=3 stride=2 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=80 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=80 size=1 stride=1 pad=1 activation=mish
[route] layers = -1,-13
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
# Downsample
[convolutional] batch_normalize=1 filters=320 size=3 stride=2 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[route] layers = -1,-34
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
# Downsample
[convolutional] batch_normalize=1 filters=640 size=3 stride=2 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[route] layers = -1,-34
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
# Downsample
[convolutional] batch_normalize=1 filters=1280 size=3 stride=2 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=640 size=3 stride=1 pad=1 activation=mish
[shortcut] from=-3 activation=linear
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[route] layers = -1,-19
[convolutional] batch_normalize=1 filters=1280 size=1 stride=1 pad=1 activation=mish
##########################
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
# SPP
[maxpool] stride=1 size=5
[route] layers=-2
[maxpool] stride=1 size=9
[route] layers=-4
[maxpool] stride=1 size=13
[route] layers=-1,-3,-5,-6
# End SPP
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish
[route] layers = -1, -15
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[upsample] stride=2
[route] layers = 94
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[route] layers = -1, -3
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish
[route] layers = -1, -8
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[upsample] stride=2
[route] layers = 57
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[route] layers = -1, -3
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=mish
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=mish
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=160 activation=mish
[route] layers = -1, -8
[convolutional] batch_normalize=1 filters=160 size=1 stride=1 pad=1 activation=mish stopbackward=800
##########################
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish
[convolutional] size=1 stride=1 pad=1 filters=48 activation=logistic
[yolo]
mask = 0,1,2
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=11
num=9
jitter=.1
scale_x_y = 2.0
objectness_smooth=0
ignore_thresh = .7
truth_thresh = 1
random=1
resize=1.5
iou_thresh=0.2
iou_normalizer=0.05
cls_normalizer=0.5
obj_normalizer=4.0
iou_loss=ciou
nms_kind=diounms
beta_nms=0.6
new_coords=1
max_delta=5
[route] layers = -4
[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=320 activation=mish
[route] layers = -1, -22
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=320 activation=mish
[route] layers = -1,-8
[convolutional] batch_normalize=1 filters=320 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish
[convolutional] size=1 stride=1 pad=1 filters=48 activation=logistic
[yolo]
mask = 3,4,5
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=11
num=9
jitter=.1
scale_x_y = 2.0
objectness_smooth=1
ignore_thresh = .7
truth_thresh = 1
random=1
resize=1.5
iou_thresh=0.2
iou_normalizer=0.05
cls_normalizer=0.5
obj_normalizer=1.0
iou_loss=ciou
nms_kind=diounms
beta_nms=0.6
new_coords=1
max_delta=5
[route] layers = -4
[convolutional] batch_normalize=1 size=3 stride=2 pad=1 filters=640 activation=mish
[route] layers = -1, -55
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[route] layers = -2
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=640 activation=mish
[route] layers = -1,-8
[convolutional] batch_normalize=1 filters=640 size=1 stride=1 pad=1 activation=mish
[convolutional] batch_normalize=1 size=3 stride=1 pad=1 filters=1280 activation=mish
[convolutional] size=1 stride=1 pad=1 filters=48 activation=logistic
[yolo]
mask = 6,7,8
anchors = 12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401
classes=11
num=9
jitter=.1
scale_x_y = 2.0
objectness_smooth=1
ignore_thresh = .7
truth_thresh = 1
random=1
resize=1.5
iou_thresh=0.2
iou_normalizer=0.05
cls_normalizer=0.5
obj_normalizer=0.4
iou_loss=ciou
nms_kind=diounms
beta_nms=0.6
new_coords=1
max_delta=2
I found that the iou_loss is negative, which seems abnormal. What's wrong?
With the same dataset, results are good when I use the yolov4 or yolov4-tiny network. What should I do?