AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Segmentation Fault:11 during the training using tiny-yolo #3583

Open mukundmurrali opened 5 years ago

mukundmurrali commented 5 years ago

I tried to train a custom object detector using yolo-tiny and got Segmentation fault: 11 during training.

My dataset is pretty small, so I used a batch size of 2. I have two classes.

These are the logs before the segmentation fault occurred.

59: 265.406281, 291.885742 avg loss, 0.000000 rate, 2.416528 seconds, 118 images
Loaded: 0.000035 seconds
v3 (mse loss, Normalizer: (iou: 0.750000, cls: 1.000000) Region 16 Avg (IOU: 0.240935, GIOU: 0.240935), Class: 0.505708, Obj: 0.346130, No Obj: 0.463757, .5R: 0.000000, .75R: 0.000000, count: 2
v3 (mse loss, Normalizer: (iou: 0.750000, cls: 1.000000) Region 23 Avg (IOU: nan, GIOU: nan), Class: nan, Obj: nan, No Obj: 0.535592, .5R: nan, .75R: nan, count: 0

60: 267.282959, 289.425476 avg loss, 0.000000 rate, 2.404413 seconds, 120 images
Resizing 416 x 416
Loaded: 0.014597 seconds
v3 (mse loss, Normalizer: (iou: 0.750000, cls: 1.000000) Region 16 Avg (IOU: 0.220705, GIOU: 0.220705), Class: 0.497918, Obj: 0.540548, No Obj: 0.463309, .5R: 0.000000, .75R: 0.000000, count: 1
v3 (mse loss, Normalizer: (iou: 0.750000, cls: 1.000000) Region 23 Avg (IOU: 0.163935, GIOU: -0.286815), Class: 0.268903, Obj: 0.583067, No Obj: 0.533932, .5R: 0.000000, .75R: 0.000000, count: 1

61: 373.263947, 297.809326 avg loss, 0.000000 rate, 3.411958 seconds, 122 images
Loaded: 0.000038 seconds
v3 (mse loss, Normalizer: (iou: 0.750000, cls: 1.000000) Region 16 Avg (IOU: nan, GIOU: nan), Class: nan, Obj: nan, No Obj: nan, .5R: nan, .75R: nan, count: 0
v3 (mse loss, Normalizer: (iou: 0.750000, cls: 1.000000) Region 23 Avg (IOU: nan, GIOU: nan), Class: nan, Obj: nan, No Obj: nan, .5R: nan, .75R: nan, count: 0
Segmentation fault: 11
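
One detail in these logs is easy to miss by eye: after iteration 59, Region 23 has count: 0 but still reports a finite No Obj value, while in the last two Region entries before the crash No Obj is nan as well. A minimal Python sketch for pulling that out of a saved log, assuming the log format shown above. The function names and the regular expression are illustrative and not part of Darknet, and the sketch only flags the pattern, it does not explain the crash.

```python
import math
import re

# Matches the per-layer entries in the log format shown above, e.g.
# "... Region 16 Avg (IOU: ..., GIOU: ...), ..., No Obj: 0.463757, ..., count: 2"
# Captures: layer index, the "No Obj" value, and the target count.
REGION_RE = re.compile(r"Region (\d+) Avg .*?No Obj: ([^,\s]+),.*?count: (\d+)")


def parse_region_lines(log_text):
    """Return (layer, no_obj, count) for every Region entry in log_text."""
    return [
        (int(m.group(1)), float(m.group(2)), int(m.group(3)))  # float("nan") parses fine
        for m in REGION_RE.finditer(log_text)
    ]


def layers_with_nan_no_obj(log_text):
    """Return the layers whose 'No Obj' value is NaN (hypothetical helper)."""
    return [
        layer
        for layer, no_obj, _count in parse_region_lines(log_text)
        if math.isnan(no_obj)
    ]
```

Run over the text above, this would return an empty list for the entries after iterations 59 and 60, and layers 16 and 23 for the entries printed just before the segmentation fault.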

mukundmurrali commented 5 years ago

@AlexeyAB I always get this during training. Please have a look at it.

AlexeyAB commented 5 years ago

It seems that you are training on the CPU. That will take months, and CPU training isn't well tested. Try training on a GPU.

Also, you should use batch=64 subdivisions=64.
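
For context on what these two settings mean, here is a small Python sketch of the arithmetic. It assumes Darknet's usual cfg semantics, in which batch is the number of images per training iteration and subdivisions splits that batch into smaller chunks that are processed one at a time. The helper names are hypothetical, and rejecting a non-divisible pair is this sketch's own choice, not Darknet behavior.

```python
def mini_batch_size(batch, subdivisions):
    """Images processed at once, assuming batch is split into `subdivisions` chunks."""
    if subdivisions <= 0 or batch % subdivisions != 0:
        # Sketch-only guard; not taken from Darknet.
        raise ValueError("batch should be a positive multiple of subdivisions")
    return batch // subdivisions


def images_seen(iteration, batch):
    """Total images consumed after `iteration` iterations, one batch per iteration."""
    return iteration * batch


# Suggested settings: 64 images per iteration, processed one image at a time.
print(mini_batch_size(64, 64))
# The logs in this thread report 118 images at iteration 59, which matches batch=2.
print(images_seen(59, 2))
```

Under these assumptions, batch=64 subdivisions=64 keeps memory use at one image per pass while each weight update still averages over 64 images, instead of the 2 used in the failing run.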