Why does the loss not decrease or converge during training, and why is it so large? Is anyone else seeing this?

Tianxiaomo / pytorch-YOLOv4

PyTorch, ONNX and TensorRT implementation of YOLOv4

Apache License 2.0


Why does the loss not decrease or converge during training, and why is it so large? Is anyone else seeing this? #109

Open jingenyan opened 4 years ago

jingenyan commented 4 years ago

Message: 'Train step_240: loss : 35656.95703125,loss xy : 118.43473815917969,loss wh : 51.72434616088867,loss obj : 33237.66015625，loss cls : 2249.13525390625,loss l2 : 13105.2314453125,lr : 2.0735999999999997e-07'

DongChen06 commented 4 years ago

same problem, any updates?

Softwaring commented 4 years ago

Same problem. Have you solved it? @jingenyan

Nicole1130 commented 4 years ago

I'm hitting the same problem and hope to get an answer. @Tianxiaomo

architect-road commented 4 years ago

Same problem @Tianxiaomo

admin221 commented 4 years ago

same problem

wuyong139 commented 4 years ago

same problem

ZZHHogan commented 4 years ago

same problem, any updates?

jingtianyilong commented 4 years ago

Same problem. I also noticed that the final predictions have extremely large box widths, while the center x, y and the height are fairly accurate. Yet during training loss_xy is generally large and loss_wh is low, which is the opposite of what inference shows. With the yolov4.pth checkpoint there is no problem at all. I've checked the visualization, the dataset/dataloader, and the forward pass, and they all seem fine, so I assume the loss term might be the cause. I was confused by the code in yolo_loss. I hope someone can help; it would be greatly appreciated. @Tianxiaomo

jingtianyilong commented 4 years ago

https://github.com/AlexeyAB/darknet/blob/9db0ed96621bcc8bd4aba27b0e9662c6dc33f011/src/yolo_layer.c#L344-L675 This is the original loss term. I will definitely check this code and see if there's anything we can do. It seems that total_loss = iou_loss + classification_loss. I also see label smoothing and some "normalizer" terms. This project uses MSELoss, but based on my own use of the original darknet project, the default region loss should be CIoU, not GIoU or MSE. So I think the loss term in this project might have a problem. Also, label smoothing, DropBlock, and multi-scale training are not yet available.
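For readers comparing the two formulations, here is a minimal scalar sketch of a CIoU-style box loss. It follows the published CIoU formula and is not taken from this project or from darknet. The function name `ciou_loss` and the `(cx, cy, w, h)` box layout are assumptions made for illustration. A training implementation would vectorize this over tensors.

```python
import math


def ciou_loss(pred, target, eps=1e-7):
    """Return 1 - CIoU for two boxes given as (cx, cy, w, h).

    Hypothetical helper for illustration only; not this project's loss code.
    """
    px, py, pw, ph = pred
    tx, ty, tw, th = target

    # Convert center/size to corner coordinates.
    p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
    t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2

    # Plain IoU.
    inter_w = max(0.0, min(p_x2, t_x2) - max(p_x1, t_x1))
    inter_h = max(0.0, min(p_y2, t_y2) - max(p_y1, t_y1))
    inter = inter_w * inter_h
    union = pw * ph + tw * th - inter + eps
    iou = inter / union

    # Squared center distance over squared diagonal of the enclosing box.
    c_w = max(p_x2, t_x2) - min(p_x1, t_x1)
    c_h = max(p_y2, t_y2) - min(p_y1, t_y1)
    c2 = c_w ** 2 + c_h ** 2 + eps
    rho2 = (px - tx) ** 2 + (py - ty) ** 2

    # Aspect-ratio consistency term and its trade-off weight.
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)


if __name__ == "__main__":
    gt = (0.0, 0.0, 1.0, 1.0)
    print("identical box:", ciou_loss(gt, gt))
    print("same center, 10x width:", ciou_loss((0.0, 0.0, 10.0, 1.0), gt))
    print("disjoint box:", ciou_loss((5.0, 5.0, 1.0, 1.0), gt))
```

Unlike separate MSE terms on xy and wh, this loss stays high for a box whose center and height are right but whose width is far too large, which is the failure pattern described in the previous comment.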

Fan-SR commented 3 years ago

Same problem. Has anyone solved it? Train step_320: loss : 64961.03125,loss xy : 19.951534271240234,loss wh : 22.7925968170166,loss obj : 64887.9765625，loss cls : 30.31162452697754,loss l2 : 23576.07421875,lr : 1.6e-11

0Error0Warning commented 3 years ago

I ran into this problem too. Have you solved it?

Mxcarlet commented 3 years ago

It feels like the problem is with obj_mask.

xlwan1132 commented 3 years ago

The learning_rate is too small.
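One possible reading of the logged values is that a very small `lr` this early in training is what a warmup (burn-in) schedule produces. The sketch below assumes a fourth-power burn-in over 1000 steps and a base rate of 6.25e-5. These numbers and the name `burnin_lr` are illustrative assumptions, not settings confirmed anywhere in this thread.

```python
def burnin_lr(step, base_lr=6.25e-5, burn_in=1000, power=4):
    """Hypothetical polynomial warmup: ramp lr from 0 to base_lr over burn_in steps."""
    if step < burn_in:
        return base_lr * (step / burn_in) ** power
    return base_lr


if __name__ == "__main__":
    # Print the learning rate at a few steps to see how slowly it ramps up.
    for step in (240, 500, 1000):
        print(f"step {step}: lr = {burnin_lr(step):.4e}")
```

Under these assumed settings, step 240 gives about 2.07e-07, in line with the `lr` in the first log of this thread. So a tiny rate at this point may only mean that warmup has not finished, and it is worth checking the loss again after the burn-in steps before concluding that the learning rate is the cause.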