AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Yolo Nano #4037

Open sctrueew opened 4 years ago

sctrueew commented 4 years ago

Hi everyone, YOLO Nano's model size is ~4 MB and it reports 69.1% mAP on the VOC 2007 dataset.

https://arxiv.org/pdf/1910.01271

AlexeyAB commented 4 years ago

It seems to use depth-wise convolution, which is very slow on GPUs.

I think https://github.com/AlexeyAB/darknet/blob/master/cfg/enet-coco.cfg (EfficientNetB0-Yolo, 45.5% mAP@0.5 on MS COCO, 3.7 BFLOPs) could achieve higher mAP on Pascal VOC if it were trained on it. https://github.com/AlexeyAB/darknet#pre-trained-models
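For context on the depth-wise remark: depth-wise separable convolution needs far fewer multiply-accumulates than a standard convolution, which is why small models like YOLO Nano use it; the GPU slowdown comes from its low arithmetic intensity, not its FLOP count. A rough sketch of the FLOP comparison, using illustrative layer shapes that are not taken from any particular model:

```python
def conv_macs(h, w, c_in, c_out, k):
    # multiply-accumulates for a standard k x k convolution, stride 1, "same" padding
    return h * w * c_in * c_out * k * k

def depthwise_separable_macs(h, w, c_in, c_out, k):
    # depth-wise k x k convolution followed by a 1 x 1 point-wise convolution
    return h * w * c_in * k * k + h * w * c_in * c_out

std = conv_macs(52, 52, 128, 256, 3)
dws = depthwise_separable_macs(52, 52, 128, 256, 3)
print(std / dws)  # the separable version needs several times fewer MACs
```

Despite the smaller MAC count, the depth-wise stage does little arithmetic per byte of memory traffic, so GPU kernels for it are often memory-bound and slower in wall-clock time than a dense convolution of similar accuracy.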

WongKinYiu commented 4 years ago

Both Pelee and tiny-DSOD work better than YOLO Nano.

And I have trained YOLOv3-tiny on VOC 2007; it gets ~66% mAP.

lunasdejavu commented 4 years ago

Can anyone provide a pre-trained model of YOLO Nano? It is not hard to write a yolo-nano.cfg, but getting a good pre-trained model is not easy for everyone.

sctrueew commented 4 years ago

Hi @WongKinYiu ,

Can I use tiny-DSOD in this repo?

WongKinYiu commented 4 years ago

@zpmmehrdad Hello,

You can implement the backbone of tiny-DSOD with a YOLO head using this repo.

ghost commented 4 years ago

@WongKinYiu could you share the Pelee and tiny-DSOD cfg files?

WongKinYiu commented 4 years ago

@gray2bgr Hello,

Both Pelee and tiny-DSOD are based on SSD; I did not implement them in the Darknet framework.

For Caffe framework implementations, see Pelee and tiny-DSOD.

colinlin1982 commented 4 years ago

> Both of Pelee and tiny-DSOD work better than YOLO Nano.
>
> And I have trained YOLOv3-tiny for VOC 2007, it can get ~66% mAP.

Hi @WongKinYiu, could you please share your training tricks and weight files?

WongKinYiu commented 4 years ago

@colinlin1982 Hello,

I only changed the classes setting of https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny.cfg and trained for 50k batches.

[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=8
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=2000
max_batches = 50500
policy=steps
steps=40000,45000
scales=.1,.1

[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=2

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[maxpool]
size=2
stride=1

[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky

###########

[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky

[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=75
activation=linear

[yolo]
mask = 3,4,5
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=20
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1

[route]
layers = -4

[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky

[upsample]
stride=2

[route]
layers = -1, 8

[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky

[convolutional]
size=1
stride=1
pad=1
filters=75
activation=linear

[yolo]
mask = 1,2,3
anchors = 10,14,  23,27,  37,58,  81,82,  135,169,  344,319
classes=20
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
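As a sanity check on the cfg above: in Darknet, the convolutional layer feeding each [yolo] head must have filters = (classes + 5) * number-of-masks, which is why "only changing the classes setting" also means updating those filters lines. A minimal sketch of the rule:

```python
def yolo_head_filters(num_classes, num_masks):
    # each mask (anchor) predicts 4 box coordinates + 1 objectness score,
    # plus one confidence per class
    return (num_classes + 5) * num_masks

# VOC has 20 classes and each [yolo] head above uses 3 masks
print(yolo_head_filters(20, 3))  # 75, matching filters=75 in the cfg
```

The same rule gives filters=255 for the 80-class COCO variants of the cfg.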
colinlin1982 commented 4 years ago

Hi, @WongKinYiu I trained yolov3-tiny on Pascal VOC (2007 trainval + 2012 trainval) on a 1066 and got a best mAP@0.5 of 55.09% on the Pascal VOC 2007 test set.
My cfg file: https://github.com/colinlin1982/SlimYOLOv3/blob/master/cfg/yolov3-tiny.cfg
chart.png: https://github.com/colinlin1982/SlimYOLOv3/blob/master/cfg/chart_yolov3-tiny.png

Major differences, mine vs. yours:

- subdivisions=4 vs subdivisions=8
- blur=1 vs no blur
- burn_in=1000 vs burn_in=2000
- max_batches=129600 vs max_batches=50500
- steps=100000,112000 vs steps=40000,45000
- anchors = 34,63, 87,110, 93,227, 245,164, 177,306, 344,336 (calculated by darknet detector calc_anchors ... -num_of_clusters=6) vs anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
- last [yolo] mask=0,1,2 vs mask=1,2,3

So, do the mask and anchors matter so much?

WongKinYiu commented 4 years ago

@colinlin1982 Hello,

What is your pre-trained model? Yes, the anchors and mask should match the grid.
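To illustrate what "anchors and mask should match the grid" means: every [yolo] head shares the same anchor list, and each head's mask selects which anchors it predicts; the coarse 13x13 head (stride 32) should own the large anchors and the finer 26x26 head (stride 16) the smaller ones. A hypothetical sketch using the anchors from the cfg above:

```python
# the shared anchor list from the yolov3-tiny cfg (widths and heights in pixels)
anchors = [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)]

def head_anchors(mask):
    # a [yolo] head only predicts the anchors its mask indexes
    return [anchors[i] for i in mask]

print(head_anchors([3, 4, 5]))  # 13x13 head: [(81, 82), (135, 169), (344, 319)]
print(head_anchors([1, 2, 3]))  # 26x26 head: [(23, 27), (37, 58), (81, 82)]
```

Note that mask=1,2,3 reuses anchor index 3 on the second head and leaves index 0 unused, so the cfg effectively works with 5 distinct anchors; mask=0,1,2 would instead assign the three smallest anchors to the fine grid with no overlap.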

You can change

subdivisions=4
blur=1
max_batches=129600
steps=100000,112000

in my cfg.

I think it can achieve better results than my model.
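For reference on how those max_batches/steps settings behave: Darknet's steps policy ramps the learning rate up over burn_in batches (with a power-4 warm-up by default) and then multiplies it by each scale once the corresponding step is passed. A minimal sketch, assuming scales=.1,.1 as in both cfgs above:

```python
def step_policy_lr(batch, base_lr=0.001, burn_in=1000,
                   steps=(100000, 112000), scales=(0.1, 0.1)):
    # warm-up: darknet scales lr by (batch / burn_in) ** power, power=4 by default
    if batch < burn_in:
        return base_lr * (batch / burn_in) ** 4
    lr = base_lr
    for step, scale in zip(steps, scales):
        if batch >= step:
            lr *= scale
    return lr

print(step_policy_lr(500))     # still warming up, well below base_lr
print(step_policy_lr(50000))   # base rate, 0.001
print(step_policy_lr(113000))  # past both steps: 0.001 * 0.1 * 0.1
```

Stretching max_batches and steps, as suggested, keeps the model at the full base rate for much longer before the two 10x decays.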

colinlin1982 commented 4 years ago

Hi, @WongKinYiu I did not have a pre-trained model. I have trained your cfg file and got a best mAP@0.5 of 55.19% after 55500 batches.
chart.png: https://github.com/colinlin1982/SlimYOLOv3/blob/master/cfg/chart_yolov3-tiny_5_anchors.png

So the conclusion is that yolov3-tiny gains about +5% mAP from an ImageNet-pretrained backbone?

Talking about anchors, your cfg file actually uses 5 distinct anchors: 23,27, 37,58, 81,82, 135,169, 344,319, which are very different from those I calculated with: darknet.exe detector calc_anchors voc.data -num_of_clusters 5 -width 416 -height 416

num_of_clusters = 5, width = 416, height = 416
read labels from 16551 images
loaded image: 16551 box: 40058
all loaded.

calculating k-means++ ...
iterations = 30
avg IoU = 61.59 %

Saving anchors to the file: anchors.txt
anchors = 38, 64, 89,147, 145,285, 258,169, 330,341

If your anchors match better, does that mean the anchor calculation is wrong?
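For background on what calc_anchors is doing: it clusters the training-set box sizes with k-means, using 1 - IoU (boxes treated as centered rectangles) as the distance, so the result depends on the dataset's box distribution and on initialization. A simplified sketch of that idea, not Darknet's exact implementation:

```python
import random

def iou_wh(a, b):
    # IoU of two boxes given as (w, h), both centered at the origin
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)

def kmeans_anchors(boxes, k=5, iters=30, seed=0):
    random.seed(seed)
    centroids = random.sample(boxes, k)
    for _ in range(iters):
        # assign each box to the centroid it overlaps most (max IoU = min 1-IoU)
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        # move each centroid to the mean (w, h) of its cluster
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(b[0] for b in cl) / len(cl),
                                sum(b[1] for b in cl) / len(cl))
    return sorted(centroids)

# toy data: random box sizes standing in for the 40058 VOC boxes
boxes = [(random.uniform(10, 400), random.uniform(10, 400)) for _ in range(200)]
print(kmeans_anchors(boxes, k=5))
```

Because the objective only maximizes average IoU against the dataset's boxes, hand-me-down anchors that happen to match the detector's grid strides can still train better than freshly clustered ones; a higher avg IoU does not by itself guarantee higher mAP.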