WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

NaN values for box and total losses during training #142

Closed BasharSu closed 2 years ago

BasharSu commented 2 years ago

I tried training on a custom dataset with yolov7-w6 (after adjusting the number of classes), but I keep getting NaN values for the box and total losses. The validation P, R, and mAP values rise with each epoch as expected, so I think it's only a display issue.

Input command:

```
python train_aux.py --device 0 --batch-size 8 --data data/custom.yaml --img 1280 1280 --cfg yolov7-w6.yaml --weights '' --name w6-test --hyp data/hyp.scratch.p6.yaml --epochs 150
```

Output example:

```
box   obj       cls   total   labels   img_size
nan   0.007422  0     nan     22       1280
```

ertugrul-dmr commented 2 years ago

I can second this: I first hit the issue with a custom dataset. To confirm, I downloaded the COCO dataset, trained with default settings, and got the same error.

WongKinYiu commented 2 years ago

It is a bug in CIoU loss calculation, will fix it.
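As a pointer for where such NaNs typically come from: in the standard CIoU formulation, the trade-off term alpha = v / (1 - IoU + v) evaluates to 0/0 when a predicted box exactly matches its target (IoU = 1, v = 0), and the enclosing-box diagonal term can divide by zero for degenerate boxes. Below is a minimal, self-contained sketch of CIoU with `eps` guards on those divisions; it is an illustrative reimplementation for this discussion, not the repo's actual loss code.

```python
import math

def bbox_ciou(box1, box2, eps=1e-7):
    """CIoU between two boxes in (x1, y1, x2, y2) format.

    The eps terms guard the divisions that can otherwise produce NaN,
    e.g. the 0/0 in the alpha term when the two boxes coincide.
    """
    # Intersection area (clamped at 0 for non-overlapping boxes)
    iw = max(0.0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    ih = max(0.0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    inter = iw * ih

    # Union area and IoU
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    iou = inter / (union + eps)

    # Squared diagonal of the smallest enclosing box (+ eps: can be
    # zero for degenerate boxes)
    cw = max(box1[2], box2[2]) - min(box1[0], box2[0])
    ch = max(box1[3], box2[3]) - min(box1[1], box2[1])
    c2 = cw * cw + ch * ch + eps

    # Squared distance between box centers
    rho2 = ((box1[0] + box1[2]) - (box2[0] + box2[2])) ** 2 / 4 + \
           ((box1[1] + box1[3]) - (box2[1] + box2[3])) ** 2 / 4

    # Aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (
        math.atan((box2[2] - box2[0]) / (box2[3] - box2[1] + eps))
        - math.atan((box1[2] - box1[0]) / (box1[3] - box1[1] + eps))
    ) ** 2
    # Without eps this is 0/0 (NaN) for identical boxes: iou = 1, v = 0
    alpha = v / (1 - iou + v + eps)

    return iou - (rho2 / c2 + alpha * v)
```

For example, `bbox_ciou(b, b)` for any valid box `b` returns a value near 1.0 instead of NaN, which is exactly the case (a perfect prediction) that trips the unguarded formula.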

wincle commented 2 years ago

I ran into the same bug; hope it gets fixed soon.

CVer2022 commented 2 years ago

> It is a bug in CIoU loss calculation, will fix it.

Hi, has this problem been fixed?

xyz043066 commented 2 years ago

I have run into the same bug (screenshot attached).

Hope it will be fixed soon. Many thanks!

wincle commented 2 years ago

> It is a bug in CIoU loss calculation, will fix it.

Could you give me a hint about how to fix it? Maybe I could help, since I've been working on this for a few days.

Bruce-Si commented 2 years ago

Any progress?

123wxr commented 2 years ago

Any progress?

JJLimmm commented 1 year ago

Facing the same issue, but during pose model training.