Tianxiaomo / pytorch-YOLOv4

PyTorch, ONNX and TensorRT implementation of YOLOv4
Apache License 2.0

high object loss during training #137

Open BELZHANG opened 4 years ago

BELZHANG commented 4 years ago

I trained a single-class object detection model, and the obj loss stopped decreasing at around 2000+ after 30 epochs:

2020-07-02 02:14:47,659 train.py[line:388] DEBUG: Train step_54720: loss : 2443.021484375,loss xy : 10.428301811218262,loss wh : 0.4252524971961975,loss obj : 2431.867431640625,loss cls : 0.3006296753883362,loss l2 : 149.89588928222656,lr : 0.0001
2020-07-02 02:16:22,901 train.py[line:388] DEBUG: Train step_55040: loss : 2481.62939453125,loss xy : 19.018123626708984,loss wh : 3.39251708984375,loss obj : 2458.537841796875,loss cls : 0.6808265447616577,loss l2 : 160.90220642089844,lr : 0.0001
2020-07-02 02:17:58,271 train.py[line:388] DEBUG: Train step_55360: loss : 2519.954345703125,loss xy : 15.710888862609863,loss wh : 8.115646362304688,loss obj : 2495.658447265625,loss cls : 0.4693739712238312,loss l2 : 175.01007080078125,lr : 0.0001
2020-07-02 02:19:33,424 train.py[line:388] DEBUG: Train step_55680: loss : 2428.70263671875,loss xy : 9.945980072021484,loss wh : 0.7127013206481934,loss obj : 2417.80322265625,loss cls : 0.24069327116012573,loss l2 : 149.42962646484375,lr : 0.0001
2020-07-02 02:21:08,341 train.py[line:388] DEBUG: Train step_56000: loss : 2504.226806640625,loss xy : 20.75212860107422,loss wh : 5.360098361968994,loss obj : 2477.334716796875,loss cls : 0.7798290252685547,loss l2 : 169.95872497558594,lr : 0.0001
2020-07-02 02:22:43,993 train.py[line:388] DEBUG: Train step_56320: loss : 2441.465087890625,loss xy : 24.657085418701172,loss wh : 2.177201509475708,loss obj : 2413.60498046875,loss cls : 1.0258153676986694,loss l2 : 155.5187225341797,lr : 0.0001
2020-07-02 02:23:31,693 train.py[line:399] INFO: Checkpoint 32 saved !
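
To see how the total breaks down, the log lines can be parsed and the share of each term computed. The following is a minimal sketch, not part of the repository: the helper name `summarize` and the regular expression are hypothetical, and the three sample lines are copied from the log above with the timestamp prefix trimmed. It reports what fraction of `loss` comes from `loss obj`, and how far `loss` is from the sum of the `xy`, `wh`, `obj` and `cls` terms (which indicates whether `loss l2` is part of the total or only logged alongside it).

```python
import re

# Three log lines from the report above (timestamp/logger prefix trimmed).
LOG_LINES = [
    "Train step_54720: loss : 2443.021484375,loss xy : 10.428301811218262,"
    "loss wh : 0.4252524971961975,loss obj : 2431.867431640625,"
    "loss cls : 0.3006296753883362,loss l2 : 149.89588928222656,lr : 0.0001",
    "Train step_55040: loss : 2481.62939453125,loss xy : 19.018123626708984,"
    "loss wh : 3.39251708984375,loss obj : 2458.537841796875,"
    "loss cls : 0.6808265447616577,loss l2 : 160.90220642089844,lr : 0.0001",
    "Train step_55360: loss : 2519.954345703125,loss xy : 15.710888862609863,"
    "loss wh : 8.115646362304688,loss obj : 2495.658447265625,"
    "loss cls : 0.4693739712238312,loss l2 : 175.01007080078125,lr : 0.0001",
]

# Matches "loss : 1.0", "loss xy : 1.0", "lr : 0.0001", and so on.
PAIR = re.compile(r"(loss(?: \w+)?|lr) : ([0-9.eE+-]+)")
STEP = re.compile(r"step_(\d+)")


def summarize(line):
    """Hypothetical helper: parse one log line and summarize its loss terms."""
    step = int(STEP.search(line).group(1))
    values = {name: float(value) for name, value in PAIR.findall(line)}
    parts = sum(values[k] for k in ("loss xy", "loss wh", "loss obj", "loss cls"))
    return {
        "step": step,
        # Fraction of the total loss contributed by the objectness term.
        "obj_share": values["loss obj"] / values["loss"],
        # Difference between the logged total and xy + wh + obj + cls.
        "residual": values["loss"] - parts,
    }


summaries = [summarize(line) for line in LOG_LINES]
for s in summaries:
    print(f"step {s['step']}: obj share = {s['obj_share']:.4f}, "
          f"residual = {s['residual']:.6f}")
```

A residual near zero means the logged total is just the sum of the four detection terms, so an `obj_share` close to 1 says that almost all of the reported loss is objectness loss.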

What could be the possible reasons?

Thanks

DongChen06 commented 4 years ago

I also face this problem, any solutions?

Tianxiaomo commented 4 years ago

How many pictures are there in your dataset?

DongChen06 commented 4 years ago

@Tianxiaomo I use 144 training images and 16 testing images. I set the parameters as follows:

Cfg.batch = 16
Cfg.subdivisions = 8
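
For a sense of scale, the arithmetic behind these settings can be sketched as follows. This assumes the Darknet-style convention in which `batch` images contribute to one weight update and are processed in chunks of `batch / subdivisions` images; whether this repository's `train.py` follows that convention exactly is an assumption here, and the function name `schedule` is hypothetical.

```python
import math


def schedule(num_images, batch, subdivisions):
    """Hypothetical helper: per-epoch counts under a Darknet-style convention."""
    # Images per forward/backward pass (assumed: batch split into subdivisions).
    mini_batch = batch // subdivisions
    # Forward passes needed to see every training image once.
    forward_passes = math.ceil(num_images / mini_batch)
    # Weight updates per epoch (assumed: one update per full batch).
    updates = math.ceil(num_images / batch)
    return mini_batch, forward_passes, updates


mini_batch, forward_passes, updates = schedule(144, batch=16, subdivisions=8)
print(mini_batch, forward_passes, updates)
```

Under these assumptions, 144 training images give only a handful of weight updates per epoch, so a run needs many epochs before the number of updates becomes large.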

Tianxiaomo commented 4 years ago

Train the model for enough iterations to get good results; you can check the validation results for each epoch during training.

shuangzixing commented 4 years ago

I also get a high loss during my training, could you help me?
Training size: 100000
Dataset classes: 1

[image]

Train the model for enough iterations to get good results; you can check the validation results for each epoch during training.

Pigdrum commented 2 years ago

I have the same problem. Have you solved it?