AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

About Resizing Images, and width-height parameters #496

Open BaharTaskesen opened 6 years ago

BaharTaskesen commented 6 years ago

Hi @AlexeyAB,

First of all, thank you so much for sharing this project and for your patience in answering all the questions! I am a senior student, and I am using tiny-yolo on a Raspberry Pi 2. In my case the model should detect 3 object classes.

It currently runs at 45 s/frame, which is too slow for our robot (network size 416x416). The images we gather from the Pi camera are 480x640. I am thinking of reducing the image size to 120x160 and removing some convolutional layers to make the model run faster. Should I change the width and height inputs in the .cfg file, and if so, how should I set these parameters? Does the model resize images that are not 416x416? Also, I provide the labels of the training set in .txt files; if resizing is performed on the training images, does it also adjust the .txt files (the xmin ymin xmax ymax coordinates)?

I tried to change the width and height parameters, but I got CUDA errors afterwards. Thank you!

AlexeyAB commented 6 years ago

@BaharTaskesen Hi,

  1. Any input images will be automatically resized to the neural network size specified in the cfg-file (by default 416x416). Also, labels in the txt-files should be relative to the image size, so they should have values from 0.0 to 1.0, as described in point 5: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects (a conversion sketch follows this list). To label your images use this utility: https://github.com/AlexeyAB/Yolo_mark

  2. To speed up Yolo you should decrease the network resolution, e.g. width=288 height=288 (or any value that is a multiple of 32): https://github.com/AlexeyAB/darknet/blob/101de2b07aa2feefa74f7e73876fd5cc8fc696cf/cfg/tiny-yolo-voc.cfg#L4-L5

  3. Also, to speed it up you should base your cfg-file on tiny-yolo-voc.cfg instead of yolo-voc.2.0.cfg: https://github.com/AlexeyAB/darknet/blob/101de2b07aa2feefa74f7e73876fd5cc8fc696cf/cfg/tiny-yolo-voc.cfg

  4. If you use Linux on the Raspberry Pi 2, then to speed it up set OPENMP=1 in the Makefile and then compile with make -j8: https://github.com/AlexeyAB/darknet/blob/101de2b07aa2feefa74f7e73876fd5cc8fc696cf/Makefile#L5

  5. If you want to remove some layers to speed it up, the most computationally expensive layers are the two penultimate convolutional layers, which have the maximum number of filters.
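
Regarding the relative label format in point 1, here is a minimal sketch of converting an absolute pixel box (xmin, ymin, xmax, ymax) into Darknet's txt-file format; the function name and the example numbers are illustrative assumptions, not part of the repo:

```
# Sketch: convert an absolute pixel box to the Darknet label format
# "<class> <x_center> <y_center> <width> <height>" with all box values
# relative to the image size (0.0 .. 1.0).
def to_yolo_label(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / float(img_w)
    h = (ymax - ymin) / float(img_h)
    return f"{class_id} {x_center:.6f} {y_center:.6f} {w:.6f} {h:.6f}"

# A 100x80 px box at (200, 150) in a 640x480 image, class 0:
print(to_yolo_label(0, 200, 150, 300, 230, 640, 480))
# -> 0 0.390625 0.395833 0.156250 0.166667
```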

I tried to change the width and height parameters, but I got CUDA errors afterwards.

Set width and height in the cfg-file only to values that are multiples of 32. Did you try to compile Darknet on a PC with a CUDA-capable GPU?
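
A tiny helper for that constraint (purely illustrative, not part of Darknet): snap a desired resolution to the nearest multiple of 32.

```
# Sketch: snap a desired network dimension to the nearest multiple of 32,
# since the cfg width/height must be multiples of 32.
def snap_to_32(value):
    return max(32, int(round(value / 32.0)) * 32)

print(snap_to_32(160), snap_to_32(120))  # -> 160 128
```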

Do you use Windows or Linux on the Raspberry Pi 2?

BaharTaskesen commented 6 years ago

Thank you @AlexeyAB! I was using labelImg to label the images, but I will try Yolo_mark. I am also using tiny-yolo currently. I will try all the suggestions as soon as possible and let you know. Yes, I am compiling Darknet on a PC (Ubuntu) with CUDA 8.0 and a GTX 1070 (I chose CUDA 8.0 since I couldn't use Keras with CUDA 9.1), and I am using Linux (Ubuntu) on the Raspberry Pi 2.

BaharTaskesen commented 6 years ago

When I decrease the width and height parameters to, say, 160x160, the current average loss becomes infinity (or keeps increasing). Why is that? Here is my .cfg file:

```
[net]
batch=64
subdivisions=8
width=192
height=192
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
max_batches = 120000
policy=steps
steps=-1,100,80000,100000
scales=.1,10,.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
size=3
stride=1
pad=1
filters=32
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=40
activation=linear

[region]
anchors = 0.738768,0.874946, 2.42204,2.65704, 4.30971,7.04493, 10.246,4.59428, 12.6868,11.8741
bias_match=1
classes=3
coords=4
num=5
softmax=1
jitter=.2
rescore=1
object_scale=5
noobject_scale=1
class_scale=1
coord_scale=1
absolute=1
thresh = .6
random=1
```

AlexeyAB commented 6 years ago

Try to change this line: https://github.com/AlexeyAB/darknet/blob/8f1f5cbf8321b6b313d8f455d596290e7b8bb3f7/src/data.c#L329 to this:

```
if ((w < 0.01 || h < 0.01)) continue;
```

This probably happens because your objects are too small, so after resizing to the network input they end up less than 1 pixel in size.
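
As a rough sanity check for this (not part of Darknet; the label path and the 192x192 network size are assumed for illustration), you can scale each relative box back to the network input and flag boxes that fall under one pixel:

```
# Sketch: flag label boxes that would shrink below ~1 pixel at the given
# network resolution. Darknet labels are "<class> <x> <y> <w> <h>" with
# w and h relative to the image, so the on-network size is w*net_w x h*net_h.
def tiny_boxes(label_path, net_w=192, net_h=192):
    tiny = []
    with open(label_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 5:          # skip blank/malformed lines
                continue
            w, h = float(parts[3]), float(parts[4])
            if w * net_w < 1.0 or h * net_h < 1.0:
                tiny.append(line.strip())
    return tiny

print(tiny_boxes("data/obj/img_0001.txt"))  # hypothetical label file
```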

TheExDeus commented 6 years ago

The original branch didn't resize to fill; instead it kept the aspect ratio by padding. This means a lot of the network input is unused (if all of the inputs have the same aspect ratio). So can width and height be set to a different (divisible-by-32) size that isn't 1:1? I have tried that and it seems to work, but I got much worse detection results (note that I didn't train on non-1:1 input, though). As you have dug deeper, is there any reason why it cannot be something like 416x256? And does your branch do something differently, like scaling to fit the input size without padding?
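
For context, a minimal sketch of the geometry behind the two strategies being discussed (stretch-to-fill vs. letterbox padding); the 640x480 frame and 416x416 network size are just example numbers:

```
# Sketch: compare stretch-to-fill vs. letterbox (keep aspect ratio + pad)
# when mapping an image onto the network input.
def letterbox_geometry(img_w, img_h, net_w=416, net_h=416):
    scale = min(net_w / img_w, net_h / img_h)
    new_w, new_h = round(img_w * scale), round(img_h * scale)
    pad_x, pad_y = net_w - new_w, net_h - new_h  # total padding, split over both sides
    return new_w, new_h, pad_x, pad_y

# A 640x480 camera frame into a 416x416 network:
# stretch-to-fill: simply resized to 416x416 (aspect ratio distorted, no padding)
# letterbox:
print(letterbox_geometry(640, 480))  # -> (416, 312, 0, 104): 104 padded rows unused
```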

TheMikeyR commented 6 years ago

@TheExDeus Regarding scaling https://github.com/AlexeyAB/darknet/issues/232#issuecomment-336955485

TheExDeus commented 6 years ago

Thanks! That also explains why I get quite different results between this and the original branch with the same cfg and weights. It probably doesn't work well if I train with one scaling and then infer with another.

But can it also work with a non-1:1 network input size? It does of course increase performance, which I need on an embedded platform, but I don't want to do it if I lose too much precision or if the network itself isn't stable because it doesn't support that. I haven't seen non-1:1 networks even though most inputs usually are not 1:1; I'm just trying to figure out why.

AlexeyAB commented 6 years ago

@TheExDeus How much mAP (mean average precision) do you get now? Just try to train using width=416 height=256 in your cfg-file and check the mAP. Then compare it with the current mAP.

darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_9000.weights


In my own cases I use a square network size that is smaller than the image size. Also, the training and detection images have the same size, so in this case there is no need to keep aspect ratios.
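
For reference, the non-square setting suggested above is just two lines in the [net] section of the cfg (shown as a sketch; both values must stay multiples of 32):

```
[net]
width=416
height=256
```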

ANTISINK commented 6 years ago

@AlexeyAB The size of the images we gather is 200x200. Besides, the images are grayscale. Should I change the width and height inputs in the yolov3-voc.cfg file? Thanks a lot.

AlexeyAB commented 6 years ago

@ANTISINK Yes. If you don't have images larger than 200x200, then you should use a network size of 224x224 or 192x192.
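
For example, the top of the cfg-file could then look like this (a sketch; channels=1 for grayscale input is an assumption about this fork's grayscale support, keep channels=3 if you feed three-channel images):

```
[net]
# must be multiples of 32
width=192
height=192
# assumption: single-channel input for grayscale images
channels=1
```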