changtimwu / changtimwu.github.com

Tim's testing/practice notes

MobileNet and Depthwise Separable Convolution #70

Open changtimwu opened 7 years ago

changtimwu commented 7 years ago

This has been mentioned quite a lot recently.

solid analysis:

math:

changtimwu commented 7 years ago

Zhihu's discussion is valuable: https://www.zhihu.com/question/58941804

changtimwu commented 7 years ago

This answer describes some history of Separable Convolution (a.k.a. group convolution): https://stackoverflow.com/a/40927483/443016

Some info about its advantages: https://stackoverflow.com/a/37092986/443016

changtimwu commented 7 years ago

ResNeXt, a simplified ResNet, also uses this technique a lot.

changtimwu commented 7 years ago

Very good summary: https://arxiv.org/pdf/1701.04489.pdf

changtimwu commented 7 years ago

Another separable-convolution-heavy network: QuickNet. The author is quite confident in its accuracy and speed; it needs a comparison with MobileNet. Studied! You can ignore QuickNet -- there is no comparison with MobileNet.

changtimwu commented 7 years ago

https://keras.io/layers/convolutional/#separableconv2d
https://www.tensorflow.org/api_docs/python/tf/nn/depthwise_conv2d
https://www.tensorflow.org/api_docs/python/tf/nn/separable_conv2d
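The semantics behind these APIs can be sketched in plain NumPy. This is a simplified illustration, not the actual TF implementation: 'VALID' padding, no batch dimension, depth_multiplier=1.

```python
import numpy as np

def depthwise_conv2d(x, dw_filter, stride=1):
    """Depthwise convolution: each input channel is convolved with its
    own k x k filter; channels are NOT mixed.
    x: (H, W, M), dw_filter: (k, k, M)."""
    H, W, M = x.shape
    k = dw_filter.shape[0]
    oh = (H - k) // stride + 1
    ow = (W - k) // stride + 1
    out = np.zeros((oh, ow, M))
    for i in range(oh):
        for j in range(ow):
            patch = x[i*stride:i*stride+k, j*stride:j*stride+k, :]  # (k, k, M)
            out[i, j, :] = np.sum(patch * dw_filter, axis=(0, 1))
    return out

def pointwise_conv2d(x, pw_filter):
    """Pointwise (1x1) convolution: mixes channels at every position.
    x: (H, W, M), pw_filter: (M, N)."""
    return np.tensordot(x, pw_filter, axes=([2], [0]))

def separable_conv2d(x, dw_filter, pw_filter, stride=1):
    """Depthwise followed by pointwise -- the factorization MobileNet uses."""
    return pointwise_conv2d(depthwise_conv2d(x, dw_filter, stride), pw_filter)

x = np.random.rand(8, 8, 3)
dw = np.random.rand(3, 3, 3)
pw = np.random.rand(3, 16)
y = separable_conv2d(x, dw, pw)
print(y.shape)  # (6, 6, 16)
```

Note that this equals a standard convolution whose 4-D kernel factorizes as K[a,b,m,n] = dw[a,b,m] * pw[m,n]; a general kernel cannot be factorized this way, which is exactly where the savings come from.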

changtimwu commented 7 years ago

Better MobileNet summary: https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.md

TF-Slim's README is a good read.
https://github.com/tensorflow/models/blob/master/slim/README.md

Google's official implementation. It's based on slim. https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.py

It's a bit different from the TensorFlow APIs we're used to. MobileNet can be defined like this: an array of layer objects!! As simple as Keras.

_CONV_DEFS = [
    Conv(kernel=[3, 3], stride=2, depth=32),
    DepthSepConv(kernel=[3, 3], stride=1, depth=64),
    DepthSepConv(kernel=[3, 3], stride=2, depth=128),
    DepthSepConv(kernel=[3, 3], stride=1, depth=128),
    DepthSepConv(kernel=[3, 3], stride=2, depth=256),
    DepthSepConv(kernel=[3, 3], stride=1, depth=256),
    DepthSepConv(kernel=[3, 3], stride=2, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=2, depth=1024),
    DepthSepConv(kernel=[3, 3], stride=1, depth=1024)
]

Need time to study the two APIs. What's the difference from the tf.nn.* series?

 net = slim.conv2d(net, depth(conv_def.depth), conv_def.kernel,
                            stride=conv_def.stride,
                            normalizer_fn=slim.batch_norm,
                            scope=end_point)

net = slim.separable_conv2d(net, None, conv_def.kernel,
                             depth_multiplier=1,
                             stride=layer_stride,
                             rate=layer_rate,
                             normalizer_fn=slim.batch_norm,
                             scope=end_point)

docs are here. I called it contrib.slim to distinguish it from model.slim https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim

changtimwu commented 7 years ago

MobileNet on iPhone: http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/ His other work, Forge, is also worth reading.

The author wrote another article about TensorFlow on iOS: http://machinethink.net/blog/tensorflow-on-ios/

changtimwu commented 7 years ago

Interesting! You can test inference speed without real training or actual data: https://github.com/harvitronix/keras-mobilenet/blob/master/examples/speed.py
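The trick works because inference cost is independent of the weight values, so freshly initialized weights and random inputs are enough. A framework-agnostic sketch of the same idea; the `dummy_model` stand-in (a single pointwise convolution) is hypothetical.

```python
import time
import numpy as np

def benchmark(model_fn, input_shape, runs=10):
    """Average seconds per inference call on random input."""
    x = np.random.rand(*input_shape).astype(np.float32)
    model_fn(x)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        model_fn(x)
    return (time.perf_counter() - start) / runs

# hypothetical stand-in model: one 1x1 (pointwise) convolution, 3 -> 16 channels
w = np.random.rand(3, 16).astype(np.float32)
dummy_model = lambda x: np.tensordot(x, w, axes=([2], [0]))

print(f'{benchmark(dummy_model, (112, 112, 3)):.6f} s/run')
```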

In case the BatchNormalization+Activation combination gets tedious, see how inception_v3 wraps them in a helper: https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py#L40

changtimwu commented 7 years ago

Why does it implement its own depthwise conv instead of using Keras's builtin SeparableConv2D?

rcmalli said Separable Convolution is already implemented in both Keras and TF, but there is no BN support after depthwise layers (still investigating).

We should compare with DeepDog. Excellent MobileNet tweak!!

https://medium.com/@timanglade/how-hbos-silicon-valley-built-not-hotdog-with-mobile-tensorflow-keras-react-native-ef03260747f3

changtimwu commented 7 years ago

Computation cost: take the first depthwise separable convolution in MobileNet as an example (layers 2 and 3).

(figure: mobilenet_body_architecture)

Computation cost of a depthwise separable convolution = cost of the depthwise convolution + cost of the pointwise convolution.

The pointwise convolution is a 1x1 standard convolution, so its cost is Dk * Dk * M * N * Df * Df = 1 * 1 * M * N * Df * Df = 1 * 1 * 32 * 64 * 112 * 112.
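The arithmetic for this first depthwise separable layer, worked out with the MobileNet paper's cost formulas (variable names are mine):

```python
# First depthwise separable layer: M=32 input channels, N=64 output
# channels, Df=112 feature map size, Dk=3 depthwise kernel size.
Dk, M, N, Df = 3, 32, 64, 112

depthwise = Dk * Dk * M * Df * Df      # per-channel spatial filtering
pointwise = 1 * 1 * M * N * Df * Df    # 1x1 standard convolution
standard  = Dk * Dk * M * N * Df * Df  # unfactorized baseline

print(depthwise + pointwise)  # -> 29302784 multiply-adds
print(standard)               # -> 231211008 multiply-adds
# The ratio reduces algebraically to 1/N + 1/Dk^2.
print((depthwise + pointwise) / standard)
```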

changtimwu commented 7 years ago

Dk=3, M=512, N=512, Df=14

(figure: hyper_cost, comparing multiply-adds and parameters)
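Plugging these hyperparameters into the cost and parameter-count formulas as a quick sanity check (the `sep_*`/`std_*` names are mine):

```python
Dk, M, N, Df = 3, 512, 512, 14

# multiply-adds
sep_cost = Dk * Dk * M * Df * Df + M * N * Df * Df  # depthwise + pointwise
std_cost = Dk * Dk * M * N * Df * Df                # standard convolution

# parameters (ignoring biases)
sep_params = Dk * Dk * M + M * N
std_params = Dk * Dk * M * N

print(sep_cost, std_cost)      # -> 52283392 462422016
print(sep_params, std_params)  # -> 266752 2359296
```

Roughly a 9x saving in both multiply-adds and parameters, matching the 1/N + 1/Dk^2 ratio.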

changtimwu commented 7 years ago

Understand the computation details of standard convolution and work out the numbers carefully;

only then are you able to evaluate or design a network architecture. It's no surprise SqueezeNet got rejected: its computation cost doesn't actually come down.

changtimwu commented 7 years ago

It's worth reading MXNet's implementation: https://github.com/dmlc/mxnet/issues/5891

changtimwu commented 7 years ago

Tiny YOLO on Android: 2 fps on a Pixel. I think that's too slow. https://github.com/madeye/yolo-android

marshallixp commented 7 years ago

mark

ysh329 commented 7 years ago

mark

changtimwu commented 6 years ago

CondenseNet -- the authors claim that CondenseNet is much more efficient than MobileNet. According to its result table, it achieves equivalent performance at only half the cost of MobileNet in terms of number of parameters and operations.

(figure: condensenet_result)

roshan-gopalakrishnan commented 6 years ago

Can we get the weights in between the layers of depthwise convolution and pointwise convolution in a SeparableConv2D operation ? How can we get those weights in Keras?

changtimwu commented 6 years ago

A research team at Google found a way to quantize MobileNet to 8-bit integers (1/4 the parameter space) while losing only 2% accuracy (compared to FP32).
https://arxiv.org/abs/1712.05877
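A minimal sketch of the affine quantization scheme that paper builds on, where a real value r is represented as r = S * (q - Z) with an 8-bit integer q, a float scale S, and an integer zero-point Z. This toy version only round-trips a weight array and checks the reconstruction error; it is not the paper's full quantized-inference pipeline.

```python
import numpy as np

def quantize_params(r_min, r_max, num_bits=8):
    """Choose scale and zero-point so [r_min, r_max] maps onto [0, 2^bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (r_max - r_min) / (qmax - qmin)
    zero_point = int(round(qmin - r_min / scale))
    return scale, int(np.clip(zero_point, qmin, qmax))

def quantize(r, scale, zero_point):
    """Real -> uint8: q = clip(round(r / S) + Z)."""
    q = np.round(r / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """uint8 -> real: r = S * (q - Z)."""
    return scale * (q.astype(np.float32) - zero_point)

np.random.seed(0)
w = np.random.uniform(-1.0, 1.0, size=1000).astype(np.float32)
s, z = quantize_params(float(w.min()), float(w.max()))
w_hat = dequantize(quantize(w, s, z), s, z)
# round-trip error is bounded by about one quantization step
print(float(np.abs(w - w_hat).max()))
```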

changtimwu commented 6 years ago

MobileNet V2

pretrained models

edmondja commented 6 years ago

Does anybody know where the activation function is applied in Keras's builtin SeparableConv2D? After the depthwise or after the pointwise convolution? Or both, maybe?

@changtimwu Why not use BN between them?

jayshonzs commented 5 years ago

Hi,

remove batch normalization between depthwise and pointwise

Is there any reason to do this? Is it useless to put a normalization between the two convolutions?

thank you