WongKinYiu / ScaledYOLOv4

Scaled-YOLOv4: Scaling Cross Stage Partial Network
GNU General Public License v3.0

Resuming training yields 0 mAP 0 precision 0 recall #296

Open yusiyoh opened 3 years ago

yusiyoh commented 3 years ago

I trained yolov4-csp for 50 epochs. Now I want to continue training from the last weights of that first run. The commands I tried:

python3 train.py --img 640 --batch 8 --epochs 59 --data 'data/dtld.yaml' --cfg ./models/yolov4-csp.yaml --weights 'runs/exp16_yolov4-csp-results-correctedlabels/weights/last_yolov4-csp-results-correctedlabels.pt' --name yolov4-csp-results-correctedlabels --cache --resume

python3 train.py --resume 'runs/exp16_yolov4-csp-results-correctedlabels/weights/last_yolov4-csp-results-correctedlabels.pt'

Both commands work, and training continues from epoch 50. However, I get 0 AP, 0 P and 0 R. Here are a few lines from results.txt:

47/49 7.43G 0.0558 0.02048 0.01051 0.08679 15 640 0.5523 0.5892 0.5674 0.3449 0.05718 0.03532 0.01002
48/49 7.43G 0.05574 0.02044 0.01051 0.08669 32 640 0.5544 0.5902 0.5686 0.3461 0.05712 0.0353 0.009984
49/49 7.43G 0.05559 0.02033 0.01041 0.08633 26 640 0.5558 0.5912 0.5695 0.3468 0.05706 0.03531 0.009947
50/58 5.58G 0.06746 0.0223 0.01299 0.1028 9 640 0 0 0 0 0.08809 0.0479 0.006979

Only 50/58 is shown here, but I trained up to 54/58 and the metrics were 0 for all of those epochs. What is the problem? Can you help me?
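To pin down exactly which epochs lost their metrics, the rows of results.txt can be scanned automatically. The following is only a minimal sketch: it assumes whitespace-separated rows in which columns 9 to 12 hold precision, recall and the two mAP values (inferred from the rows quoted above, not confirmed against the repository code), and `zero_metric_epochs` is a hypothetical helper name.

```python
from typing import List

# Assumed 0-based positions of P, R and the two mAP values in a results.txt row.
# This layout is inferred from the quoted rows, not taken from the repository.
METRIC_COLUMNS = (8, 9, 10, 11)


def zero_metric_epochs(lines: List[str]) -> List[str]:
    """Return the epoch labels (first column) of rows whose metrics are all zero."""
    flagged = []
    for line in lines:
        fields = line.split()
        if len(fields) <= max(METRIC_COLUMNS):
            continue  # skip blank or truncated rows
        if all(float(fields[i]) == 0.0 for i in METRIC_COLUMNS):
            flagged.append(fields[0])
    return flagged


# Two rows copied from the results.txt excerpt in this issue.
rows = [
    "49/49 7.43G 0.05559 0.02033 0.01041 0.08633 26 640 0.5558 0.5912 0.5695 0.3468 0.05706 0.03531 0.009947",
    "50/58 5.58G 0.06746 0.0223 0.01299 0.1028 9 640 0 0 0 0 0.08809 0.0479 0.006979",
]
print(zero_metric_epochs(rows))
```

On the two sample rows this flags only the first epoch after resuming, which matches the behaviour described above.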

yusiyohpolimi commented 3 years ago

What is the difference between last.pt and last_strip.pt?

WongKinYiu commented 3 years ago

The _strip version removes the optimizer and resets some info. https://github.com/WongKinYiu/ScaledYOLOv4/blob/yolov4-large/utils/general.py#L836
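To illustrate why a stripped file is not suitable for resuming, here is a minimal sketch of the idea using a plain dict in place of a real checkpoint. The key names `optimizer` and `epoch`, the reset value of -1, and both helper functions are assumptions made for illustration; the linked general.py is the authoritative version.

```python
import copy
from typing import Any, Dict


def strip_checkpoint(ckpt: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the checkpoint without training state (illustrative only)."""
    stripped = copy.copy(ckpt)     # shallow copy, the original dict is left untouched
    stripped["optimizer"] = None   # assumed key: drop the optimizer state
    stripped["epoch"] = -1         # assumed key and value: reset the epoch counter
    return stripped


def can_resume(ckpt: Dict[str, Any]) -> bool:
    """A checkpoint can continue training only if it still carries training state."""
    return ckpt.get("optimizer") is not None and ckpt.get("epoch", -1) >= 0


# Toy stand-in for a full checkpoint; real files hold tensors, not strings.
full = {"model": "weights", "optimizer": {"lr": 0.01}, "epoch": 49}
print(can_resume(full), can_resume(strip_checkpoint(full)))
```

Under these assumptions, the full checkpoint keeps what is needed to continue training, while the stripped copy only keeps the model weights and is better suited to inference or testing.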

yusiyoh commented 3 years ago

So to resume I have to use last.pt, not last_strip.pt. But when I do so, I get 0 AP, 0 P and 0 R, as you can see.

yusiyohpolimi commented 3 years ago

[image]

This is what I get when I run the following command:

python3 train.py --img 640 --batch 16 --epochs 100 --data 'data/dtld.yaml' --cfg ./models/yolov4-csp.yaml --weights './runs/exp20_yolov4-csp-results-correctedlabels/weights/best.pt' --name yolov4-csp-results-correctedlabels --cache

As you can see, there is a problem with mAP, precision and recall after resuming training.

WongKinYiu commented 3 years ago

Did you modify test.py, or change your validation data or labels? Please check your validation files.

yusiyohpolimi commented 3 years ago

I did not touch the validation data or labels. With the same configuration, there is no issue when I start training from the beginning.

WongKinYiu commented 3 years ago

What is the performance of exp20...best.pt and resume...last.pt when you test them with test.py?

yusiyohpolimi commented 3 years ago

For test.py, should I use --task validation? I am currently using --task test. I will share the results.

yusiyohpolimi commented 3 years ago

resume last.pt: [image]
exp20 best: [image]