AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

21180: -nan, -nan avg loss, 0.003000 rate #5764

Open anan91 opened 4 years ago

anan91 commented 4 years ago

Hi AlexeyAB. Training with yolov4 looks normal at first, but after training for a while every value in the output becomes 0, like this: v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: -0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000000, .5R: 0.000000, .75R: 0.000000, count: 15, class_loss = 15.000001, iou_loss = 0.000003, total_loss = 15.000004. However, with the same dataset, training with yolov3 works fine. Can you figure out why?
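The line in the issue title has the form `<iteration>: <loss>, <avg loss> avg loss, <rate> rate`. To find the point where training first diverges, a saved training log can be scanned for the first such line whose loss is NaN. The following is only a minimal sketch: it assumes the log lines follow the format shown in the title, and the function names `parse_loss_line` and `first_nan_iteration` are hypothetical helpers, not part of Darknet.

```python
import math
import re

# Assumed line format, taken from the issue title:
#   "21180: -nan, -nan avg loss, 0.003000 rate"
LOSS_LINE = re.compile(r"^\s*(\d+):\s*(\S+),\s*(\S+)\s+avg loss,\s*(\S+)\s+rate")


def parse_loss_line(line):
    """Return (iteration, loss, avg_loss, rate), or None if the line does not match."""
    match = LOSS_LINE.match(line)
    if match is None:
        return None
    try:
        # Python's float() accepts "nan" and "-nan", so NaN losses parse cleanly.
        return (
            int(match.group(1)),
            float(match.group(2)),
            float(match.group(3)),
            float(match.group(4)),
        )
    except ValueError:
        return None


def first_nan_iteration(lines):
    """Return the first iteration whose loss or average loss is NaN, else None."""
    for line in lines:
        parsed = parse_loss_line(line)
        if parsed is None:
            continue  # skip per-layer "Region ..." lines and other output
        iteration, loss, avg_loss, _rate = parsed
        if math.isnan(loss) or math.isnan(avg_loss):
            return iteration
    return None
```

Calling `first_nan_iteration(open("train.log"))` on a captured log (the file name here is an assumption) gives the iteration at which the loss first became NaN, which helps when comparing single-GPU and multi-GPU runs.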

helen12138 commented 4 years ago

I have the same error. Did you fix it?

CaptainWuDaoKou commented 4 years ago

You may check the txts in your labels folder.
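Checking the label files by hand is tedious for a large dataset, so a small script can help. The sketch below assumes the usual Darknet label layout, one object per line as `<class_id> <x_center> <y_center> <width> <height>` with the four box values normalized to the range 0 to 1. The function names and the `num_classes` parameter are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def check_label_line(line, num_classes):
    """Return a list of problems found in one label line (empty list if none)."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
    except ValueError:
        return [f"class id is not an integer: {parts[0]!r}"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    try:
        values = [float(p) for p in parts[1:]]
    except ValueError:
        return problems + ["box value is not a number"]
    names = ("x_center", "y_center", "width", "height")
    for name, value in zip(names, values):
        # The comparison is False for NaN, so NaN values are reported too.
        if not (0.0 <= value <= 1.0):
            problems.append(f"{name}={value} outside [0, 1]")
    if values[2] == 0 or values[3] == 0:
        problems.append("box has zero width or height")
    return problems


def check_label_dir(label_dir, num_classes):
    """Map (file name, line number) to the problems found in that line."""
    report = {}
    for path in sorted(Path(label_dir).glob("*.txt")):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if not line.strip():
                continue  # ignore blank lines
            problems = check_label_line(line, num_classes)
            if problems:
                report[(path.name, lineno)] = problems
    return report
```

An empty report from `check_label_dir` only rules out malformed labels under these assumptions. It does not rule out other causes of NaN loss.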

helen12138 commented 4 years ago

@CaptainWuDaoKou My txt files look normal. One of my label *.txt files is below:

3 0.45564516129032256 0.4640820980615735 0.31129032258064515 0.6043329532497149

And train.txt is:

/home/jn/darknet/build/darknet/x64/data/obj/331.png
/home/jn/darknet/build/darknet/x64/data/obj/EC_669.png

CaptainWuDaoKou commented 4 years ago

In that case, your problem is different from mine

anan91 commented 4 years ago

This problem is caused by multi-GPU training; I use three GPUs. When I use one GPU to train yolov4, the error does not occur. So far I don't know why. @AlexeyAB