chuanenlin / drone-net

https://towardsdatascience.com/tutorial-build-an-object-detection-system-using-yolo-9a930513643a

Drones not detected after training of Tiny YOLOv3 #6

Open damnko opened 5 years ago

damnko commented 5 years ago

Hi, I've followed your tutorial in order to train Tiny YOLOv3 on drone images. After ~1900 iterations I'm still not getting any detections, even on training images, which is strange. I've tested with your weights and everything works, so there must be some problem in my training procedure.

Darknet was compiled with GPU support and training was done on Google Colab. Here is the output of the training run with the command ./darknet detector train drone.data cfg/yolov3-tiny-drone.cfg darknet53.conv.74:

layer     filters    size              input                output
    0 conv     16  3 x 3 / 1   416 x 416 x   3   ->   416 x 416 x  16  0.150 BFLOPs
    1 max          2 x 2 / 2   416 x 416 x  16   ->   208 x 208 x  16
    2 conv     32  3 x 3 / 1   208 x 208 x  16   ->   208 x 208 x  32  0.399 BFLOPs
    3 max          2 x 2 / 2   208 x 208 x  32   ->   104 x 104 x  32
    4 conv     64  3 x 3 / 1   104 x 104 x  32   ->   104 x 104 x  64  0.399 BFLOPs
    5 max          2 x 2 / 2   104 x 104 x  64   ->    52 x  52 x  64
    6 conv    128  3 x 3 / 1    52 x  52 x  64   ->    52 x  52 x 128  0.399 BFLOPs
    7 max          2 x 2 / 2    52 x  52 x 128   ->    26 x  26 x 128
    8 conv    256  3 x 3 / 1    26 x  26 x 128   ->    26 x  26 x 256  0.399 BFLOPs
    9 max          2 x 2 / 2    26 x  26 x 256   ->    13 x  13 x 256
   10 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   11 max          2 x 2 / 1    13 x  13 x 512   ->    13 x  13 x 512
   12 conv   1024  3 x 3 / 1    13 x  13 x 512   ->    13 x  13 x1024  1.595 BFLOPs
   13 conv    256  1 x 1 / 1    13 x  13 x1024   ->    13 x  13 x 256  0.089 BFLOPs
   14 conv    512  3 x 3 / 1    13 x  13 x 256   ->    13 x  13 x 512  0.399 BFLOPs
   15 conv     18  1 x 1 / 1    13 x  13 x 512   ->    13 x  13 x  18  0.003 BFLOPs
   16 yolo
   17 route  13
   18 conv    128  1 x 1 / 1    13 x  13 x 256   ->    13 x  13 x 128  0.011 BFLOPs
   19 upsample            2x    13 x  13 x 128   ->    26 x  26 x 128
   20 route  19 8
   21 conv    256  3 x 3 / 1    26 x  26 x 384   ->    26 x  26 x 256  1.196 BFLOPs
   22 conv     18  1 x 1 / 1    26 x  26 x 256   ->    26 x  26 x  18  0.006 BFLOPs
   23 yolo
Loading weights from darknet53.conv.74...Done!

yolov3-tiny-drone
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
Resizing
416
Loaded: 0.000073 seconds
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499833, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498678, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499832, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498684, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499832, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498682, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499833, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498673, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499832, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498681, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499833, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498680, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499832, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498676, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499833, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498681, .5R: -nan, .75R: -nan,  count: 0
1: 315.492737, 315.492737 avg, 0.000000 rate, 1.219311 seconds, 24 images
Loaded: 0.000067 seconds
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499833, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498674, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499833, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498679, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499832, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498677, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499833, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498676, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499833, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498684, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499832, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498684, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499833, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498679, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.499832, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.498675, .5R: -nan, .75R: -nan,  count: 0
2: 315.492767, 315.492737 avg, 0.000000 rate, 0.481109 seconds, 48 images

................

Loaded: 0.000072 seconds
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.235015, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.036456, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.235015, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.036457, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.235015, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.036457, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.235015, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.036458, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.235015, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.036456, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.235015, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.036454, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.235015, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.036455, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.235015, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.036453, .5R: -nan, .75R: -nan,  count: 0
379: 26.248299, 33.163815 avg, 0.000021 rate, 0.740130 seconds, 9096 images
Loaded: 0.000068 seconds
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.232247, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.034958, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.232247, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.034960, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.232247, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.034961, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.232246, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.034962, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.232248, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.034957, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.232247, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.034961, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.232247, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.034960, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.232248, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.034960, .5R: -nan, .75R: -nan,  count: 0
380: 25.502501, 32.397682 avg, 0.000021 rate, 0.755673 seconds, 9120 images
Resizing
544

..................

Loaded: 0.000052 seconds
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000022, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000003, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000022, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000003, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000022, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000003, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000022, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000003, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000022, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000003, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000022, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000003, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000022, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000003, .5R: -nan, .75R: -nan,  count: 0
Region 16 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000022, .5R: -nan, .75R: -nan,  count: 0
Region 23 Avg IOU: -nan, Class: -nan, Obj: -nan, No Obj: 0.000003, .5R: -nan, .75R: -nan,  count: 0
1930: 0.000000, 0.002069 avg, 0.001000 rate, 0.583884 seconds, 46320 images

Do you notice anything strange that might be related to the problem I'm facing? Thank you so much for your help and for sharing the tutorial.

Edit: Here is a link to the avg-loss plot from another training run that shows the same problem: https://imgur.com/8IG3yO4

chuanenlin commented 5 years ago

@damnko I see that you always have count = 0. Perhaps check whether all your training images are correctly labeled and configured (e.g. labels in the same directory as the images)? Also, feel free to start training from weights/yolo-drone.weights instead of darknet53.conv.74.
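A quick way to rule out the labeling/configuration issue suggested above is to scan the training directory for images whose YOLO label file is missing or empty (Darknet expects a `.txt` next to each image with the same basename). This is a minimal sketch; the directory path and image extension are assumptions you would adapt to your own setup:

```python
import glob
import os

def check_labels(image_dir, ext=".jpg"):
    """Return images in image_dir whose YOLO .txt label is missing or empty."""
    problems = []
    for img in glob.glob(os.path.join(image_dir, "*" + ext)):
        label = os.path.splitext(img)[0] + ".txt"
        # A missing or zero-byte label file means Darknet loads no boxes,
        # which shows up as count: 0 in the training log.
        if not os.path.exists(label) or os.path.getsize(label) == 0:
            problems.append(img)
    return problems

# Example (hypothetical path): check_labels("data/drone/train")
```

If this returns a non-empty list, those images are being trained on with no ground-truth boxes at all.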

damnko commented 5 years ago

Hi @chuanenlin, thanks for your prompt reply. Yes, the images and labels are both in the same folder, and I downloaded everything from your repo. I didn't do the labeling myself, so I assume the labels are correct.

Does that count indicate the number of times an object was detected in a specific region of the image? What should I expect that number to look like, so I can check the full log? I will try starting from your weights, but training should also work starting from darknet53.conv.74, correct?

chuanenlin commented 5 years ago

@damnko Yes, so count = 0 indicates either some kind of error (e.g. in labeling) or that more training iterations are needed. With a correct configuration and a decent number of training iterations, you should see many lines with non-NaN values for IOU, Class, etc. and count > 0.
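To check a full training log for the behavior described above, you can count how many Region lines actually loaded at least one ground-truth box. A small sketch (the log-line format is taken from the output pasted in this issue):

```python
import re

def count_hits(log_text):
    """Count Region lines in a Darknet training log with count > 0,
    i.e. batches where at least one ground-truth box was matched."""
    hits = 0
    for line in log_text.splitlines():
        m = re.search(r"count:\s*(\d+)", line)
        if m and int(m.group(1)) > 0:
            hits += 1
    return hits

# Usage (hypothetical path):
# with open("train.log") as f:
#     print(count_hits(f.read()))
```

If this returns 0 over the whole log, no labels were ever loaded, which points at a labeling or path problem rather than insufficient training.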

If you start from darknet53.conv.74, you are effectively training from "scratch" (note that I trained for a couple of days on an NVIDIA Tesla GPU), while starting from weights/yolo-drone.weights means starting from where I left off.

damnko commented 5 years ago

Oh, a couple of days... But you were training the standard (non-tiny) version, right? Also, what does 1930: 0.000000 indicate? Shouldn't that be the loss you refer to in the blog post?

chuanenlin commented 5 years ago

@damnko That's correct. FYI, feel free to open an issue here as well and see whether anyone has faced a similar problem.

damnko commented 5 years ago

Ok thanks, I will investigate a bit more and then open an issue on the darknet repo. Thanks for now; I will keep you posted and come back to close this issue.

damnko commented 5 years ago

Hi @chuanenlin, sorry to ask you again. I've tried following another tutorial with a different dataset and it worked; now I'm trying the same settings with your images/labels and I'm having the same problem.

A label from the other tutorial (which works) has the following form: 0 0.469603 0.48314199999999996 0.797766 0.795552, while your labels have this format: 0 167.5 116.5 175.0 157.0.

So the first format seems to use relative (normalized) positions and dimensions, while yours is absolute. Is one of them wrong, or should both work? I wonder if the problem lies there. Thanks again for your help.
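For reference, the normalized YOLO format stores class id, box center x/y, and box width/height, each divided by the image dimensions. Assuming the absolute labels above are also in center/width/height order (an assumption; if they were corner coordinates, the math would differ), converting them is a one-liner per field. The image size below is a made-up example:

```python
def to_yolo(cls, xc, yc, w, h, img_w, img_h):
    """Convert an absolute-pixel box (center x, center y, width, height)
    to the normalized YOLO label format: all values scaled to [0, 1]."""
    return (cls, xc / img_w, yc / img_h, w / img_w, h / img_h)

# Hypothetical usage with an assumed 640x480 image:
# to_yolo(0, 167.5, 116.5, 175.0, 157.0, 640, 480)
```

Running such a conversion over the absolute-format labels would make them directly comparable to the working tutorial's labels.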

chuanenlin commented 5 years ago

@damnko May I ask which YOLO repository you are using? Some variants, such as AlexeyAB's fork, require different label formats, like the one you mentioned. The labels in this repo should work with the original (pjreddie's) version.

damnko commented 5 years ago

This is the version I compiled: https://github.com/pjreddie/darknet

chuanenlin commented 5 years ago

That's odd - I trained the weights with the labels in this repo, so the formatting should be fine. 🤔

damnko commented 5 years ago

I don't know; even when using labelImg with the YOLO format, the labels look like this: 0 0.534304 0.380769 0.640333 0.584615. Feel free to close the issue since, for now, I think my problem is solved using this labeling, even though I haven't tried converting your labels to this format. Thanks for your feedback and thanks again for sharing your work :pray:

trigaten commented 5 years ago

@damnko What OS are you on? I am having a similar problem with the normal (non-tiny) YOLO. Also, what was the other tutorial?

damnko commented 5 years ago

Hi @trigaten, I was using Linux Mint. It was some time ago and I can't remember the other tutorial I was looking at, but I guess it was this one: https://www.learnopencv.com/training-yolov3-deep-learning-based-custom-object-detector/