Robert-JunWang / Pelee

Pelee: A Real-Time Object Detection System on Mobile Devices
Apache License 2.0
885 stars 255 forks

depthwise convolution to speed up #5

Open kaishijeng opened 6 years ago

kaishijeng commented 6 years ago

Does PeleeNet use depthwise convolution, like MobileNet? If not, can it be modified to use it to speed up?

Thanks

Robert-JunWang commented 6 years ago

No. PeleeNet is built with conventional convolution. Depthwise convolution can reduce the number of multiply-adds, but the real speed depends on the device and the framework you use. For example, both MobileNet+SSD and Pelee are much faster than TinyYOLO on CPU and on iPhone 6s, but TinyYOLO is slightly faster than Pelee on a GTX 1080 Ti GPU and on iPhone 8, even though TinyYOLO's multiply-add count is about 3 times larger than Pelee's.

Robert-JunWang commented 6 years ago

Most of the work in this paper was completed 8 months ago. At that time, frameworks other than TensorFlow had poor support for depthwise separable convolution. However, the situation has changed a lot now: the performance of grouped convolution has improved greatly in cuDNN v7, and Apple's Core ML also supports grouped convolution very well.

It is a good time to try a depthwise separable convolution version now. I am more interested in improving the accuracy by increasing the number of channels with depthwise separable convolution. Both MobileNet and PeleeNet can perform image classification on an iPhone 6s, a phone released three years ago, in less than 50ms. That speed is good enough for many device-side applications.
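
For reference, a depthwise separable block expressed with pycaffe's NetSpec layers would look roughly like the sketch below. This is only an illustration, not code from this repo, and the BatchNorm/Scale layers that would normally follow each convolution are omitted for brevity:

from caffe import layers as L

def depthwise_separable_conv(bottom, in_ch, out_ch, stride=1):
    # 3x3 depthwise convolution: group == num_output == number of input
    # channels, so each filter sees exactly one input channel.
    dw = L.Convolution(bottom, num_output=in_ch, kernel_size=3, pad=1,
                       stride=stride, group=in_ch, bias_term=False)
    dw = L.ReLU(dw, in_place=True)
    # 1x1 pointwise convolution mixes information across channels.
    pw = L.Convolution(dw, num_output=out_ch, kernel_size=1, bias_term=False)
    return L.ReLU(pw, in_place=True)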

kaishijeng commented 6 years ago

Pelee detection on my RK3399 (2-core ARM A72 at 1.8GHz) takes 400ms per frame, so the performance is not sufficient for my application (<100ms). Thanks,

Robert-JunWang commented 6 years ago

What framework do you use? If the framework doesn't support automatically merging the BN layers into the Conv layers, you need to do it yourself; it can save over 50% of the inference time (the folding math is sketched below). If you train the model from scratch, you can also try a wider and shallower network: for example, for the last two dense blocks, use half the number of dense layers with a doubled growth rate.
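
The folding math is straightforward: a BatchNorm+Scale pair computes y = gamma * (x - mean) / sqrt(var + eps) + beta, which can be absorbed into the preceding convolution's weights and bias. Here is a minimal numpy sketch, assuming Caffe's convention that the BatchNorm layer stores the running mean, running variance, and a moving-average scale factor in its three blobs, and that the Scale layer holds gamma and beta (the helper name fold_bn_into_conv is illustrative, not from this repo):

import numpy as np

def fold_bn_into_conv(W, b, mean, var, scale_factor, gamma, beta, eps=0.001):
    # W: conv weights, shape (out_channels, in_channels, kh, kw)
    # b: conv bias, shape (out_channels,) -- zeros if the conv has no bias term
    # mean, var, scale_factor: the three blobs of Caffe's BatchNorm layer
    # gamma, beta: the two blobs of the following Scale layer
    # Caffe stores unnormalized statistics; divide by the accumulated factor.
    mean = mean / scale_factor
    var = var / scale_factor
    std = np.sqrt(var + eps)
    # y = gamma * (W*x + b - mean) / std + beta
    #   = (gamma / std) * W * x + (gamma * (b - mean) / std + beta)
    W_folded = W * (gamma / std).reshape(-1, 1, 1, 1)
    b_folded = gamma * (b - mean) / std + beta
    return W_folded, b_folded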


kaishijeng commented 6 years ago

I use the Caffe framework and your pretrained model. How do I check whether the BN and Conv layers are merged or not?

Thanks,

foralliance commented 6 years ago

@Robert-JunWang Hi,

According to the models you provided, the BatchNorm/Scale layers in train/test.prototxt still exist as separate layers, such as:

layer {
  name: "stem1/bn"
  type: "BatchNorm"
  bottom: "stem1"
  top: "stem1"
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  param {
    lr_mult: 0
    decay_mult: 0
  }
  batch_norm_param {
    moving_average_fraction: 0.999
    eps: 0.001
  }
}
layer {
  name: "stem1/scale"
  type: "Scale"
  bottom: "stem1"
  top: "stem1"
  scale_param {
    filler {
      value: 1
    }
    bias_term: true
    bias_filler {
      value: 0
    }
  }
}

Does this mean that the Conv and BN layers are not merged? If so, why is the inference time still so fast?

Regarding the "automatic merging of BN layers with Conv layers" you mentioned, does it mean that we need to modify the underlying C++ code?

Robert-JunWang commented 6 years ago

The models I provided are not merged. You can merge them by hand or with other tools; a rough sketch of such a merge script is below. I can add the merged model and the script I used next week.
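
A rough pycaffe sketch of what such a merge script might look like, reusing the fold_bn_into_conv helper from earlier in this thread. The file names and deploy_merged.prototxt (a copy of the deploy file with the BatchNorm/Scale layers removed and bias_term: true added to the affected convolutions) are assumptions for the example:

import numpy as np
import caffe

# Original network with BatchNorm/Scale, and the merged definition.
net = caffe.Net('deploy.prototxt', 'pelee.caffemodel', caffe.TEST)
merged = caffe.Net('deploy_merged.prototxt', caffe.TEST)

for name, params in merged.params.items():
    src = net.params[name]
    W = src[0].data
    b = src[1].data if len(src) > 1 else np.zeros(W.shape[0], dtype=W.dtype)
    bn, sc = name + '/bn', name + '/scale'  # naming taken from the prototxt above
    if bn in net.params:
        mean, var, factor = (blob.data for blob in net.params[bn])
        gamma, beta = (blob.data for blob in net.params[sc])
        W, b = fold_bn_into_conv(W, b, mean, var, factor[0], gamma, beta)
    params[0].data[...] = W
    if len(params) > 1:
        params[1].data[...] = b

merged.save('pelee_merged.caffemodel')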

foralliance commented 6 years ago

Looking forward to your update.

ujsyehao commented 6 years ago

@Robert-JunWang Hi, I have two problems:

siyiding1216 commented 6 years ago

Are you comparing the latency of your model with MobileNet-SSD without depthwise convolution? MobileNet-SSD can run 4 times faster on GPU and 10 times faster on CPU with a depthwise conv implementation. That would mean your model is several times slower than MobileNet-SSD with DW conv...

Robert-JunWang commented 6 years ago

Do you mean MobileNetV1+SSDLite is 10 times faster than MobileNetV1+SSD on CPU? Would you mind offering more detailed information on how you evaluated the speed and obtained that result? In my understanding, the computational cost of the SSD algorithm is mostly consumed by the backbone network, so the real speed difference between SSDLite and the original SSD should not be that big.

xonobo commented 6 years ago

As far as I understand, the given merged models do not contain any batch normalization, just convolution and ReLU, right?

lqs19881030 commented 6 years ago

@Robert-JunWang What is the meaning of "using half of the number of the dense layers with a doubled growth rate"? Can you show it, and does the mAP drop much? Thank you. My guess is something like the sketch below:
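
A minimal illustration of "half the layers, doubled growth rate" (the numbers here are assumed for the example, not taken from the paper):

# One dense block, "deep and narrow" vs. "wide and shallow".
deep_narrow  = {'num_dense_layers': 8, 'growth_rate': 32}  # 8 layers x 32 channels
wide_shallow = {'num_dense_layers': 4, 'growth_rate': 64}  # 4 layers x 64 channels

# Both blocks add the same total number of channels (8 * 32 == 4 * 64 == 256),
# so the block's output width is preserved, but the shallow variant has fewer
# sequential layers, which tends to run faster on devices where per-layer
# overhead dominates.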