Bad training - Githubissues

LionelLeee commented 4 years ago

I followed the author's steps to train my own dataset (3 classes) with YOLOv4, but the mAP is 0 after 10000 iterations and the loss has not changed. Why? @AlexeyAB

AlexeyAB commented 4 years ago

Wrong dataset? Run training with -show_imgs flag, do you see correct bboxes?

LionelLeee commented 4 years ago

The dataset is composed of the people, car, bus, truck, bicycle, and motorcycle classes from the COCO dataset. When I run training with the -show_imgs flag, I see correct bboxes. @AlexeyAB

AlexeyAB commented 4 years ago

What command do you use for training?
Set valid=train.txt in the obj.data file and run ./darknet detector map ... What mAP do you get?
Show a screenshot of the result.
Attach your cfg file in a zip.

Run training with -show_imgs flag, I can get correct bboxes.

Show screenshot

LionelLeee commented 4 years ago

This is my screenshot. My cfg file is attached: cpmmodel.zip

AlexeyAB commented 4 years ago

Show several examples of txt-label files.
Show content of files bad.list and bad_label.list

Run training with -show_imgs flag, I can get correct bboxes.

Show screenshot
Show training log with several avg loss and other lines
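
The label files requested above can also be checked mechanically. The following is a minimal sketch, assuming the darknet label format of one object per line (`<class_id> <x_center> <y_center> <width> <height>`, with box values relative to the image size). The function name `check_label_line` and its messages are hypothetical and are not part of darknet.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one label line (empty list if OK)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
    except ValueError:
        return [f"class id is not an integer: {parts[0]!r}"]
    try:
        x, y, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["box values are not numbers"]

    problems = []
    # Class ids are 0-based and must be smaller than classes= in the cfg.
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    # Centers must lie inside the image; sizes must be positive fractions.
    for name, value in (("x_center", x), ("y_center", y)):
        if not 0.0 <= value <= 1.0:
            problems.append(f"{name}={value} is not in [0, 1]")
    for name, value in (("width", w), ("height", h)):
        if not 0.0 < value <= 1.0:
            problems.append(f"{name}={value} is not in (0, 1]")
    return problems


if __name__ == "__main__":
    # A line that still carries a class id from a larger label set is flagged.
    print(check_label_line("7 0.5 0.5 0.2 0.3", num_classes=6))
```

Running this over every line of every txt-label file listed in train.txt would show whether any class id or box value falls outside the range the cfg expects.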

LionelLeee commented 4 years ago

Txt-label files:
The files bad.list and bad_label.list were not generated.
The following is a screenshot of training with the -show_imgs flag.
The following is the training log with several avg loss lines. After running for a period of time, the class value in the log drops to 0, and the IOU is also 0. @AlexeyAB
Cheetah Browser screenshot 20200518084926

AlexeyAB commented 4 years ago

Set valid=train.txt in obj.data file and run ./darknet detector map ... what mAP do you get?

LionelLeee commented 4 years ago

Is it similar to the picture above? It's running, but it's slow.

uzairrizwan commented 4 years ago

@AlexeyAB I am having the same issue. I have verified my dataset, .cfg file, .data file, .names file, and all other required files several times, but with no success. The mAP is always 0, and the model doesn't give any predictions/detections.

aicukltd commented 4 years ago

I was having a very similar problem until I used the CFG file from the YOLO_MARK repository.

https://github.com/AlexeyAB/Yolo_mark/blob/master/x64/Release/yolo-obj.cfg

I followed the instructions in the YOLO_MARK repository (replacing classes with my count and filters with (classes + 5)*5), and I am getting an mAP of 88% and an avg loss of ~0.6.
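
As a worked example of the filters formula mentioned above: the multiplier is the number of boxes predicted per detection layer. The darknet README gives filters=(classes + 5)x3 for the [convolutional] layer before each [yolo] layer in YOLOv3/YOLOv4 cfgs, while a multiplier of 5 corresponds to a cfg with 5 anchors in its detection layer, which appears to be the case for the linked yolo-obj.cfg. The helper name below is hypothetical, and the 6-class count is taken from the dataset described in this thread.

```python
def filters_before_detection_layer(classes, boxes_per_layer):
    """Each predicted box carries 4 coordinates, 1 objectness score,
    and one score per class, hence (classes + 5) values per box."""
    return (classes + 5) * boxes_per_layer


if __name__ == "__main__":
    classes = 6  # people, car, bus, truck, bicycle, motorcycle

    # cfg with 5 anchors per detection layer: (classes + 5) * 5
    print(filters_before_detection_layer(classes, 5))  # 55

    # YOLOv3/YOLOv4 cfg with 3 masks per [yolo] layer: (classes + 5) * 3
    print(filters_before_detection_layer(classes, 3))  # 33
```

The two formulas are not interchangeable, so the value should match the cfg family being edited.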

LionelLeee commented 4 years ago

I ran darknet.exe detector map for one day, but it was extremely slow: the count increased by about 4 every 5 minutes. @AlexeyAB

LionelLeee commented 4 years ago

Is it normal that the above picture appears after a period of operation? @AlexeyAB

aicukltd commented 4 years ago

@LionelLeee try my instructions and let me know if you get any training results?

My output looked the same as yours until I used the other .cfg.

AlexeyAB commented 4 years ago

@LionelLeee

Is it similar to the picture above? It's running, but it's slow.

You should use /backup/cpmodel_last.weights instead of yolov4.conv.137 weights for mAP calculation. https://github.com/AlexeyAB/darknet#when-should-i-stop-training

LionelLeee commented 4 years ago

@AlexeyAB Thanks, do you have any solutions to my problem?

AlexeyAB commented 4 years ago

Yes, do everything according to the manual and do not make mistakes. So what is the mAP value?

aicukltd commented 4 years ago

On this note then @AlexeyAB can you explain why I got the exact same output as @LionelLeee using the yolo-v4.cfg from the latest commit (unchanged other than classes and filters)? Then when I switched to the CFG from the YOLO MARK repo everything is working perfectly?

LionelLeee commented 4 years ago

This is the mAP value. @AlexeyAB

AlexeyAB commented 4 years ago

@aicukltd @LionelLeee Try to train from the beginning using 1 GPU with the -map flag. Do you get an mAP higher than 0?

LionelLeee commented 4 years ago

I used 1 GPU to get the mAP above. @AlexeyAB

AlexeyAB commented 4 years ago

And show the mAP that you get by using this command: ./darknet detector map F:/MSCOCO/coco_f.data yolov4-custom.cfg backup/yolov4-custom_last.weights -iou_thresh 0.01

LionelLeee commented 4 years ago

AlexeyAB commented 4 years ago

I just trained default MSCOCO-2014 dataset for 10 minutes: https://github.com/AlexeyAB/darknet/blob/master/scripts/get_coco_dataset.sh

By using this command darknet.exe detector train F:/MSCOCO/coco_f.data yolov4-custom.cfg yolov4.conv.137 -map

With width=416 height=416 batch=2 subdivisions=1 max_batches=2000 just for 2000 iterations for 10 minutes with final loss=~24.0

cfg: yolov4-custom.cfg.txt
pre-trained weights: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.conv.137

With such content in the F:/MSCOCO/coco_f.data file

classes= 80
train  = F:/MSCOCO/trainvalno5k_f.txt
valid = F:/MSCOCO/5k_f.txt
#valid = F:/MSCOCO/val2017_f.list
#valid = F:/MSCOCO/testdev2017_f.txt
names = data/coco.names
backup = backup
eval=coco

And get non-zero mAP@0.01 = 0.17% by using this command: darknet.exe detector map F:/MSCOCO/coco_f.data yolov4-custom.cfg backup/yolov4-custom_last.weights -iou_thresh 0.01

I don't know how you get mAP@0.01 = 0 after 10 000 iterations with batch=64.
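
The .data file shown above is a list of `key = value` lines with `#` comments. A small sketch, assuming only that layout, can read it into a dictionary so the train, valid, and names entries can be inspected before training. The function `parse_data_file` is hypothetical and is not darknet's own parser.

```python
def parse_data_file(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignore it
        options[key.strip()] = value.strip()
    return options


# Content copied from the coco_f.data file quoted in this thread.
SAMPLE = """classes= 80
train  = F:/MSCOCO/trainvalno5k_f.txt
valid = F:/MSCOCO/5k_f.txt
#valid = F:/MSCOCO/val2017_f.list
#valid = F:/MSCOCO/testdev2017_f.txt
names = data/coco.names
backup = backup
eval=coco
"""

if __name__ == "__main__":
    for key, value in parse_data_file(SAMPLE).items():
        print(f"{key}: {value}")
```

With the parsed values in hand, it is straightforward to confirm that the classes count equals the number of lines in the names file and that the train and valid lists point at existing files.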

LionelLeee commented 4 years ago

One very strange thing: when I switch to the yolov3-tiny cfg file with the same dataset, training works properly and I get a correct mAP. When I use the original cfg with a different dataset, it also works normally and I get a normal mAP. But with the original cfg and the original dataset together, I cannot get a normal mAP. The output also becomes like the following after running for a period of time, which does not happen in the other two training runs.
chart_cpmtest chart_kitti
I am very confused. Can you explain why this happens? @AlexeyAB

AlexeyAB commented 4 years ago

What is the original cfg and the original dataset?

LionelLeee commented 4 years ago

It is the cpmmodel.cfg that I sent you above, which is a modification of yolov4-custom. The dataset consists of 6 categories selected from the COCO dataset (people, car, bus, truck, bicycle, motorcycle).
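
When a subset of classes is taken from a larger label set, the class ids in the txt-label files need to be renumbered to 0..classes-1 for the new cfg. The sketch below illustrates that step and is not a diagnosis of this issue. It assumes the source ids follow the 0-based order of darknet's data/coco.names (person=0, bicycle=1, car=2, motorbike=3, bus=5, truck=7) and that the new order is the one listed above. The mapping and the function name are hypothetical.

```python
# Source id (order assumed from data/coco.names) -> new id in the 6-class set.
COCO_TO_SUBSET = {
    0: 0,  # person
    2: 1,  # car
    5: 2,  # bus
    7: 3,  # truck
    1: 4,  # bicycle
    3: 5,  # motorbike
}


def remap_label_line(line, mapping=COCO_TO_SUBSET):
    """Return the label line with its class id renumbered,
    or None if the line is empty or its class is not in the subset."""
    parts = line.split()
    if not parts:
        return None
    new_id = mapping.get(int(parts[0]))
    if new_id is None:
        return None
    return " ".join([str(new_id)] + parts[1:])


if __name__ == "__main__":
    print(remap_label_line("7 0.5 0.5 0.2 0.3"))  # truck keeps its box, gets id 3
    print(remap_label_line("4 0.1 0.1 0.1 0.1"))  # class not in the subset
```

Lines that map to None would be dropped from the output label file, so every remaining id stays below the classes value in the cfg.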

AlexeyAB / darknet

Bad training #5645