Pelee achieves 76.4% mAP (mean average precision) on PASCAL VOC2007 and 22.4 mAP on the MS COCO dataset at a speed of 23.6 FPS on iPhone 8 and 125 FPS on NVIDIA TX2.
The result on COCO outperforms YOLOv2 in terms of higher precision, 13.6 times lower computational cost, and 11.3 times smaller model size.
We propose a variant of the DenseNet (Huang et al., 2016a) architecture, called PeleeNet, for mobile devices.
Two-Way Dense Layer
Inspired by GoogLeNet.
Uses a 2-way dense layer to get different scales of receptive fields (sketched below).
One way: a single 3x3 kernel.
The other way: two stacked 3x3 convolutions, to learn visual patterns for large objects.
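A minimal PyTorch-style sketch of the idea, not the authors' code: both branches start from a 1x1 bottleneck, one branch applies a single 3x3 convolution and the other stacks two 3x3 convolutions, and the outputs are concatenated with the input (channel sizes are illustrative).

```python
import torch
import torch.nn as nn

class TwoWayDenseLayer(nn.Module):
    """Two-way dense layer sketch: branch 1 uses one 3x3 conv, branch 2 stacks
    two 3x3 convs for a larger receptive field (large objects). Outputs are
    concatenated with the input, DenseNet-style."""

    def __init__(self, in_channels, growth_rate=32, bottleneck_width=2):
        super().__init__()

        def conv_bn_relu(cin, cout, k, padding=0):
            # post-activation ordering: Conv -> BN -> ReLU
            return nn.Sequential(
                nn.Conv2d(cin, cout, k, padding=padding, bias=False),
                nn.BatchNorm2d(cout),
                nn.ReLU(inplace=True),
            )

        inter = growth_rate * bottleneck_width // 2   # bottleneck width per branch
        half_growth = growth_rate // 2                # each branch adds half the growth rate
        self.branch1 = nn.Sequential(
            conv_bn_relu(in_channels, inter, 1),
            conv_bn_relu(inter, half_growth, 3, padding=1),
        )
        self.branch2 = nn.Sequential(
            conv_bn_relu(in_channels, inter, 1),
            conv_bn_relu(inter, half_growth, 3, padding=1),
            conv_bn_relu(half_growth, half_growth, 3, padding=1),
        )

    def forward(self, x):
        # dense connectivity: keep the input and append both branch outputs
        return torch.cat([x, self.branch1(x), self.branch2(x)], dim=1)
```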
Stem Block
Inception-v4 & DSOD 에서 영감을 얻음.
첫번째 layer에서, 즉, dense layer전에
"we design a cost efficient stem block before the first dense layer."
Dynamic Number of Channels in Bottleneck Layer
In DenseNet, the bottleneck channels in the first few dense layers are larger than the number of input channels, which increases computational cost.
So the number of bottleneck channels is set according to the input shape so that it never exceeds the number of input channels.
This saves up to 28.5% of the computational cost compared to DenseNet.
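A one-line paraphrase of the rule in code (my own illustration, not the authors' implementation): the bottleneck width is capped by the number of input channels instead of being fixed at 4x the growth rate.

```python
def bottleneck_channels(in_channels, growth_rate, bottleneck_width=4):
    """Cap the 1x1 bottleneck width so it never exceeds the input channels.

    DenseNet fixes the bottleneck at 4 * growth_rate, which is wasteful in the
    first few dense layers where the input is still narrow.
    """
    return min(bottleneck_width * growth_rate, in_channels)

# With growth_rate=32 the fixed DenseNet bottleneck would be 128 channels,
# but with only 64 input channels the dynamic version keeps it at 64.
print(bottleneck_channels(64, 32))   # -> 64
print(bottleneck_channels(256, 32))  # -> 128
```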
Transition Layer without Compression
In our experiments, the compression factor used by DenseNet was found to actually hurt feature expression.
So the number of output channels in the transition layer is kept the same as the number of input channels.
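A minimal sketch of a transition layer with the compression factor fixed to 1, i.e., the 1x1 conv keeps as many output channels as it receives (names are illustrative).

```python
import torch.nn as nn

class TransitionLayer(nn.Module):
    """Transition layer without compression: output channels == input channels,
    unlike DenseNet's default compression factor of 0.5."""

    def __init__(self, in_channels):
        super().__init__()
        self.trans = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=2, stride=2),
        )

    def forward(self, x):
        return self.trans(x)
```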
Composite Function (i.e., the ordered sequence of operations, Conv / BN / ReLU, applied in each layer)
To improve speed, post-activation (Convolution - Batch Normalization (Ioffe & Szegedy, 2015) - ReLU) is used as the composite function instead of the pre-activation used in DenseNet.
For post-activation, all batch normalization layers can be merged with the convolution layer at the inference stage, which can greatly accelerate the speed. To compensate for the negative impact on accuracy caused by this change, we use a shallow and wide network structure. We also add a 1x1 convolution layer after the last dense block to get stronger representational ability.
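The speed argument can be made concrete with the standard Conv+BN folding math (a generic sketch, not code from the paper): with the Conv-BN-ReLU ordering, each BatchNorm can be folded into the weights and bias of the convolution right before it, so BN costs nothing at inference.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold BatchNorm parameters into the preceding convolution.

    y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
      = conv'(x)  with  W' = W * gamma / sqrt(var + eps)
                        b' = (b - mean) * gamma / sqrt(var + eps) + beta
    """
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      conv.stride, conv.padding, conv.dilation, conv.groups,
                      bias=True)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
    bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
    fused.bias.copy_((bias - bn.running_mean) * scale + bn.bias)
    return fused
```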
We optimize the network architecture of the Single Shot MultiBox Detector (SSD) (Liu et al., 2016) for speed acceleration and then combine it with PeleeNet.
Combination with SSD
Feature Map Selection
A selected set of 5 scales of feature maps is used (19 x 19, 10 x 10, 5 x 5, 3 x 3, and 1 x 1).
The 38 x 38 feature map is not used, in order to reduce computational cost.
Residual Prediction Block
The feature map at each scale is fed into a residual prediction block (ResBlock), and the output of the ResBlock is used for the actual prediction.
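A sketch of the ResBlock, assuming a 1x1 / 3x3 / 1x1 residual branch with a 1x1 projection on the skip path; the 128/256 channel widths follow my reading of the paper's figure and should be treated as illustrative.

```python
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual prediction block applied to each selected feature map before
    the classification / localization heads."""

    def __init__(self, in_channels, mid_channels=128, out_channels=256):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, 1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, 1),
        )
        # 1x1 projection so the skip connection matches out_channels
        self.shortcut = nn.Conv2d(in_channels, out_channels, 1)

    def forward(self, x):
        return self.body(x) + self.shortcut(x)
```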
Small Convolutional Kernel for Prediction
1x1 kernels are applied wherever possible to predict categories and box offsets; in experiments, accuracy is almost the same as with 3x3 kernels, while computational cost is reduced by 21.5%.
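A small follow-on sketch of the prediction heads with 1x1 kernels on top of the ResBlock output; num_anchors and num_classes are placeholders.

```python
import torch.nn as nn

def make_heads(in_channels=256, num_anchors=6, num_classes=21):
    """1x1 convolution heads for category scores and box offsets
    (instead of SSD's 3x3 prediction kernels)."""
    cls_head = nn.Conv2d(in_channels, num_anchors * num_classes, kernel_size=1)
    loc_head = nn.Conv2d(in_channels, num_anchors * 4, kernel_size=1)
    return cls_head, loc_head
```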
We provide a benchmark test for different efficient classification models and different one-stage object detection methods on the NVIDIA TX2 embedded platform and iPhone 8.
https://arxiv.org/abs/1804.06882