Error while training tiny yolo using COCO

Flock1 commented 5 years ago

Hi,

I'm trying to train Tiny YOLOv3 on the COCO dataset. I have set everything up as required, but when I run the command ./darknet detector train cfg/obj.data yolov3-tiny-obj.cfg yolov3-tiny.conv.15 to start the training, I get the following error:

yolov3-tiny-obj
layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16  0.150 BFLOPs
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32  0.399 BFLOPs
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128  0.399 BFLOPs
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256  0.399 BFLOPs
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
   13 conv    256  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 256  0.089 BFLOPs
   14 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   15 conv     33  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x  33  0.006 BFLOPs
   16 detection
   17 route  13
   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   19 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   20 route  19 8
   21 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   22 conv     33  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x  33  0.011 BFLOPs
   23 detection
Loading weights from yolov3-tiny.conv.15...Done!
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
352
Couldn't open file: /media/user/Datasets/YOLO/train2017/000000301712.txt
Segmentation fault (core dumped)

I really don't know where it is getting this txt file from. I've checked the train.txt and test.txt files and there's no mention of the above file. Moreover, every time I run the command, the file is different.

AlexeyAB commented 5 years ago

@Flock1 Hi,

You should have a txt label file for each image, with the same name but a .txt extension, containing the coordinates of the objects in this format: <object-class> <x_center> <y_center> <width> <height>

As described here: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
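For reference, here is a minimal sketch of what such a conversion does for a single box. It assumes the input box is in COCO's [x_min, y_min, width, height] pixel format and that the image size is known; the function names are made up for illustration and are not part of Darknet or of any particular converter.

```python
def coco_bbox_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] in pixels to
    (x_center, y_center, width, height), each relative to the image size."""
    x_min, y_min, w, h = bbox
    x_center = (x_min + w / 2) / img_w
    y_center = (y_min + h / 2) / img_h
    return x_center, y_center, w / img_w, h / img_h


def yolo_label_line(class_id, bbox, img_w, img_h):
    """Format one object as a line of a YOLO label file."""
    xc, yc, w, h = coco_bbox_to_yolo(bbox, img_w, img_h)
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


# A 200 x 100 pixel box with top-left corner (100, 50) in a 640 x 480 image.
print(yolo_label_line(0, [100, 50, 200, 100], 640, 480))
# prints: 0 0.312500 0.208333 0.312500 0.208333
```

Each image gets one such line per object, written to a .txt file with the same base name as the image.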

Flock1 commented 5 years ago

@AlexeyAB, I did. I used the following link to convert the COCO dataset annotations to YOLO annotations:

https://bitbucket.org/yymoto/coco-to-yolo/src/master/

AlexeyAB commented 5 years ago

So just copy these txt files to the same directory as the images.

And check:

your .data file (is the path to train.txt correct?)
and your train.txt file (does it list 000000301712.jpg?)
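A quick way to run this check over the whole list is a small script like the one below. It is only a sketch: it assumes train.txt holds one image path per line and that each label sits next to its image with a .txt extension, as described above. The function name find_missing_labels is hypothetical.

```python
from pathlib import Path


def find_missing_labels(train_list):
    """Return the image paths listed in train_list that have no .txt label
    file with the same base name in the same directory."""
    missing = []
    for line in Path(train_list).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label = Path(line).with_suffix(".txt")
        if not label.is_file():
            missing.append(line)
    return missing


# Example usage (the path is a placeholder):
# for image in find_missing_labels("data/train.txt"):
#     print("no label for", image)
```

If this reports most of the images, the label files are probably in a different directory from the images.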

Flock1 commented 5 years ago

@AlexeyAB, I'll check that. I've copied train.txt and test.txt into the data folder, that I'm sure of. Also, the image exists. As I mentioned in the first post, every time I run the command, I get a different file along with a different number for resizing.

The only thing different is that I don't have the txt files in the same folder as images. I'll do that.

AlexeyAB commented 5 years ago

> The only thing different is that I don't have the txt files in the same folder as images. I'll do that.

Yes, do it.

> I get a new file along with a new number for resizing.

Do you use the latest version of Darknet?

Flock1 commented 5 years ago

@AlexeyAB, I don't think I have the latest version. Is there a simpler way of updating it, or will I have to clone darknet again?

Flock1 commented 5 years ago

@AlexeyAB, I also want to know where the weights are being saved. For the first 1000 iterations, it saved the weights as yolo-obj_xxxx.weights, but after that, every 1000 iterations it prints: Saving weights to backup//yolov3-tiny-obj.backup

Moreover, the loss is not changing much either: 4357: 4.114630, 4.357525 avg, 0.001000 rate, 0.483670 seconds, 139424 images. Can you suggest what's happening?
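To follow the average loss over many iterations rather than eyeballing single lines, the progress lines can be pulled out of a saved training log. The sketch below assumes the line format exactly as quoted above; other Darknet versions may print it differently, and the field names are my own labels.

```python
import re

# Matches a line such as:
# 4357: 4.114630, 4.357525 avg, 0.001000 rate, 0.483670 seconds, 139424 images
PROGRESS_RE = re.compile(
    r"^\s*(\d+):\s*([\d.]+),\s*([\d.]+) avg,\s*([\d.]+) rate,"
    r"\s*([\d.]+) seconds,\s*(\d+) images"
)


def parse_progress(line):
    """Return the fields of one progress line as a dict, or None if the
    line is not a progress line."""
    m = PROGRESS_RE.match(line)
    if m is None:
        return None
    iteration, loss, avg_loss, rate, seconds, images = m.groups()
    return {
        "iteration": int(iteration),
        "loss": float(loss),
        "avg_loss": float(avg_loss),
        "rate": float(rate),
        "seconds": float(seconds),
        "images": int(images),
    }


row = parse_progress(
    "4357: 4.114630, 4.357525 avg, 0.001000 rate, 0.483670 seconds, 139424 images"
)
print(row["iteration"], row["avg_loss"])
# prints: 4357 4.357525
```

Plotting avg_loss against iteration makes it easier to see whether the loss has really flattened or is still drifting down slowly.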

Flock1 commented 5 years ago

@AlexeyAB, I was working with pjreddie's darknet. I have now switched to your darknet repo, and even there the loss isn't going below 4.5.

What do you suggest?

AlexeyAB commented 5 years ago

@Flock1

Just check that mAP increases and continue training: https://github.com/AlexeyAB/darknet#when-should-i-stop-training

Flock1 commented 5 years ago

@AlexeyAB, this is what I got when I ran ./darknet detector map build/darknet/x64/data/obj.data yolov3-tiny-obj.cfg build/darknet/x64/backup/yolov3-tiny-obj_5000.weights

detections_count = 297704, unique_truth_count = 14323  
class_id = 0, name = person,     ap = 26.03 % 
class_id = 1, name = bicycle,    ap = 10.16 % 
class_id = 2, name = car,    ap = 11.36 % 
class_id = 3, name = motorcycle,     ap = 16.33 % 
class_id = 4, name = bus,    ap = 26.10 % 
class_id = 5, name = truck,      ap = 13.13 % 
 for thresh = 0.25, precision = 0.44, recall = 0.17, F1-score = 0.25 
 for thresh = 0.25, TP = 2429, FP = 3075, FN = 11894, average IoU = 29.61 % 

 IoU threshold = 50 % 
 mean average precision (mAP@0.50) = 0.171831, or 17.18 % 
Total Detection Time: 69.000000 Seconds

Is this okay?
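As a sanity check on how to read these numbers, the precision, recall and F1-score lines follow directly from the TP, FP and FN counts using the standard definitions, and the mAP is roughly the mean of the per-class AP values. The sketch below only recomputes them from the figures in the output above; it is not Darknet's own code.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall and F1-score from detection counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts reported for thresh = 0.25.
p, r, f1 = detection_metrics(tp=2429, fp=3075, fn=11894)
print(f"precision = {p:.2f}, recall = {r:.2f}, F1-score = {f1:.2f}")
# prints: precision = 0.44, recall = 0.17, F1-score = 0.25

# Per-class AP values (in percent) copied from the output above.
per_class_ap = [26.03, 10.16, 11.36, 16.33, 26.10, 13.13]
mean_ap = sum(per_class_ap) / len(per_class_ap)
```

Here mean_ap comes out at about 17.2 %, in line with the reported 17.18 % (the small gap comes from the per-class values being rounded). The low recall means most ground-truth objects are still being missed at this threshold.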

AlexeyAB / darknet

Error while training tiny yolo using COCO #2582