sctrueew opened 4 years ago
It seems it uses depth-wise convolution, which is very slow on GPUs.
I think https://github.com/AlexeyAB/darknet/blob/master/cfg/enet-coco.cfg (EfficientNetB0-Yolo, 45.5% mAP@0.5 on MS COCO, 3.7 BFlops) could achieve higher mAP on Pascal VOC if it were trained for Pascal VOC. https://github.com/AlexeyAB/darknet#pre-trained-models
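The depth-wise convolution point can be made concrete: depth-wise separable convolutions cut multiply-adds substantially on paper, yet the layers are memory-bound and tend to run slowly on GPUs despite the lower FLOP count. A rough comparison, assuming a hypothetical 416x416 feature map going from 32 to 64 channels with a 3x3 kernel (the layer sizes here are illustrative, not taken from any of the cfgs in this thread):

```python
# Rough multiply-add comparison: standard 3x3 conv vs depth-wise
# separable (3x3 depth-wise + 1x1 point-wise), same input/output shape.
h = w = 416            # spatial size (assumed)
cin, cout, k = 32, 64, 3  # channels in/out, kernel size (assumed)

standard = h * w * cout * cin * k * k                  # full 3x3 conv
depthwise_sep = h * w * cin * k * k + h * w * cout * cin  # dw + pointwise

print(standard / depthwise_sep)  # roughly 8x fewer multiply-adds for dw-separable
```

So the depth-wise variant does far less arithmetic, which is exactly why its slowness on GPU comes down to memory access patterns and poor hardware utilization rather than FLOPs.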
Both Pelee and tiny-DSOD work better than YOLO Nano.
I have also trained YOLOv3-tiny on VOC 2007; it gets ~66% mAP.
Can anyone provide a pretrained model of YOLO Nano? It is not hard to write a yolo-nano.cfg, but getting a good pre-trained model is not easy for everyone.
Hi @WongKinYiu ,
Can I use tiny-DSOD in this repo?
@zpmmehrdad Hello,
You can implement the backbone of tiny-DSOD with a YOLO head using this repo.
@WongKinYiu could you share the Pelee and tiny-DSOD cfg files?
Both Pelee and tiny-DSOD work better than YOLO Nano.
I have also trained YOLOv3-tiny on VOC 2007; it gets ~66% mAP.
Hi @WongKinYiu, could you please share your training tricks and weight files?
@colinlin1982 Hello,
I only changed the classes setting of https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny.cfg and trained for 50k iterations (max_batches=50500).
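For scale, max_batches in darknet counts mini-batch iterations, not epochs; assuming the ~16551 images of VOC 2007+2012 trainval (an assumption, the comment does not state the training set size), the arithmetic works out to roughly:

```python
# Convert darknet iterations to epochs, assuming batch=64 (from the cfg
# below) and a 16551-image training set (VOC 07+12 trainval, assumed).
batch = 64
images = 16551
max_batches = 50500

epochs = max_batches * batch / images
print(round(epochs))  # ~195 epochs over the training set
```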
[net]
# Testing
# batch=1
# subdivisions=1
# Training
batch=64
subdivisions=8
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1
learning_rate=0.001
burn_in=2000
max_batches = 50500
policy=steps
steps=40000,45000
scales=.1,.1
[convolutional]
batch_normalize=1
filters=16
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=32
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=64
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=128
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=2
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[maxpool]
size=2
stride=1
[convolutional]
batch_normalize=1
filters=1024
size=3
stride=1
pad=1
activation=leaky
###########
[convolutional]
batch_normalize=1
filters=256
size=1
stride=1
pad=1
activation=leaky
[convolutional]
batch_normalize=1
filters=512
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=75
activation=linear
[yolo]
mask = 3,4,5
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=20
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
[route]
layers = -4
[convolutional]
batch_normalize=1
filters=128
size=1
stride=1
pad=1
activation=leaky
[upsample]
stride=2
[route]
layers = -1, 8
[convolutional]
batch_normalize=1
filters=256
size=3
stride=1
pad=1
activation=leaky
[convolutional]
size=1
stride=1
pad=1
filters=75
activation=linear
[yolo]
mask = 1,2,3
anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
classes=20
num=6
jitter=.3
ignore_thresh = .7
truth_thresh = 1
random=1
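A note on the filters=75 lines before each [yolo] layer in the cfg above: darknet expects filters = num_masks x (classes + 5), since each anchor in the mask predicts 4 box coordinates, 1 objectness score, and one score per class. A quick sanity check:

```python
# darknet YOLO head rule: the conv layer feeding a [yolo] layer needs
# num_masks * (classes + 5) filters (4 box coords + 1 objectness + classes).
def yolo_filters(classes: int, num_masks: int) -> int:
    return num_masks * (classes + 5)

assert yolo_filters(20, 3) == 75   # Pascal VOC, 3 masks per head (this cfg)
assert yolo_filters(80, 3) == 255  # MS COCO, as in the stock yolov3 cfgs
```

Forgetting to update these filters lines after changing classes is one of the most common causes of training failures when adapting a cfg.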
Hi @WongKinYiu, I trained yolov3-tiny on Pascal VOC (2007 trainval + 2012 trainval) on a 1066 and got a best mAP@0.5 = 55.09% on the Pascal VOC 2007 test set. My cfg file: https://github.com/colinlin1982/SlimYOLOv3/blob/master/cfg/yolov3-tiny.cfg chart.png: https://github.com/colinlin1982/SlimYOLOv3/blob/master/cfg/chart_yolov3-tiny.png
Major differences, mine vs yours:
- subdivisions=4 vs subdivisions=8
- blur=1 vs no blur
- burn_in=1000 vs burn_in=2000
- max_batches=129600 vs max_batches=50500
- steps=100000,112000 vs steps=40000,45000
- anchors = 34,63, 87,110, 93,227, 245,164, 177,306, 344,336 (calculated by darknet detector calc_anchors ... -num_of_clusters=6) vs anchors = 10,14, 23,27, 37,58, 81,82, 135,169, 344,319
- last yolo mask=0,1,2 vs mask=1,2,3
So, do the mask and anchors matter that much?
@colinlin1982 Hello,
What is your pre-trained model? Yes, the anchors and mask should match the grid.
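On "anchors and mask should match the grid": darknet anchors are listed smallest to largest, and the coarse 13x13 head (stride 32) should take the larger anchors while the finer 26x26 head (stride 16) takes the smaller ones. A small illustration with the six anchors from the cfg earlier in this thread:

```python
# The six yolov3-tiny anchors from the cfg, smallest to largest
# (w, h in pixels at 416x416 input).
anchors = [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)]

def anchors_for_mask(mask):
    """Anchors a [yolo] layer is responsible for, per its mask= line."""
    return [anchors[i] for i in mask]

coarse = anchors_for_mask([3, 4, 5])  # 13x13 grid, stride 32: big boxes
fine = anchors_for_mask([1, 2, 3])    # 26x26 grid, stride 16: smaller boxes
print(coarse)  # [(81, 82), (135, 169), (344, 319)]
print(fine)    # [(23, 27), (37, 58), (81, 82)]
```

If the mask assigns large anchors to the fine grid (or vice versa), targets get matched at the wrong scale, which is why swapping mask=0,1,2 for mask=1,2,3 is not a free choice once the anchor list changes.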
You can change
subdivisions=4
blur=1
max_batches=129600
steps=100000,112000
in my cfg. I think it can achieve better results than my model.
Hi @WongKinYiu, I did not have a pre-trained model. I have trained your cfg file and got a best mAP@0.5 = 55.19% after 55500 batches. chart.png: https://github.com/colinlin1982/SlimYOLOv3/blob/master/cfg/chart_yolov3-tiny_5_anchors.png
So the conclusion is that yolov3-tiny gains +5% mAP from an ImageNet-pretrained backbone?
Talking about anchors, your cfg file actually uses only 5 anchors: 23,27, 37,58, 81,82, 135,169, 344,319 (the masks 3,4,5 and 1,2,3 never select the first anchor, 10,14). These are very different from the ones I calculated with: darknet.exe detector calc_anchors voc.data -num_of_clusters 5 -width 416 -height 416
num_of_clusters = 5, width = 416, height = 416
read labels from 16551 images
loaded image: 16551 box: 40058
all loaded.
calculating k-means++ ...
iterations = 30
avg IoU = 61.59 %
Saving anchors to the file: anchors.txt
anchors = 38, 64, 89,147, 145,285, 258,169, 330,341
If your anchors match better, does that mean calc_anchors is wrong?
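For reference, darknet's calc_anchors is essentially k-means on the (w, h) pairs of the training boxes, using 1 - IoU of width/height rectangles as the distance. A minimal sketch of that idea (plain k-means with random init rather than darknet's k-means++ seeding, so exact values will differ from the tool's output):

```python
import random

def iou_wh(a, b):
    """IoU of two boxes given only (w, h), i.e. aligned at a corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union

def kmeans_anchors(boxes, k, iters=30, seed=0):
    """Cluster (w, h) boxes into k anchors using 1 - IoU as distance."""
    random.seed(seed)
    centers = random.sample(boxes, k)
    for _ in range(iters):
        # Assign each box to the center it overlaps most (max IoU).
        clusters = [[] for _ in range(k)]
        for b in boxes:
            best = max(range(k), key=lambda i: iou_wh(b, centers[i]))
            clusters[best].append(b)
        # Recompute each center as the mean (w, h) of its cluster.
        centers = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c))
            if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return sorted(centers, key=lambda wh: wh[0] * wh[1])  # smallest first
```

Since the objective is IoU-based and the init is random, different runs (and different label distributions) can legitimately produce different anchor sets; diverging from the defaults does not by itself mean the tool is wrong.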
Hi everyone, the YOLO Nano model size is ~4 MB with a mAP of 69.1% on the VOC 2007 dataset.
https://arxiv.org/pdf/1910.01271