changtimwu / changtimwu.github.com

Tim's testing/practice notes

MobileNet and Depthwise Separable Convolution #70

Open changtimwu opened 7 years ago

changtimwu commented 7 years ago

This has been mentioned quite a lot recently.

solid analysis:

math:

changtimwu commented 7 years ago

Zhihu's discussion is valuable: https://www.zhihu.com/question/58941804

changtimwu commented 7 years ago

This answer describes some history of Separable Convolution (a.k.a. group convolution): https://stackoverflow.com/a/40927483/443016

Some info about its advantages: https://stackoverflow.com/a/37092986/443016

changtimwu commented 7 years ago

ResNeXt, a simplified ResNet, also uses this technique a lot.

changtimwu commented 7 years ago

Very good summary: https://arxiv.org/pdf/1701.04489.pdf

changtimwu commented 7 years ago

Another separable-convolution-heavy network: QuickNet. The author is quite confident in its accuracy and speed; it needs a comparison with MobileNet. Studied! You can ignore QuickNet -- there is no comparison with MobileNet.

changtimwu commented 7 years ago

https://keras.io/layers/convolutional/#separableconv2d
https://www.tensorflow.org/api_docs/python/tf/nn/depthwise_conv2d
https://www.tensorflow.org/api_docs/python/tf/nn/separable_conv2d
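The semantics behind these APIs can be sketched in plain NumPy. This is a simplified illustration, not the actual TF implementation: 'VALID' padding, no batch dimension, depth_multiplier=1.

```python
import numpy as np

def depthwise_conv2d(x, dw_filter, stride=1):
    """Depthwise convolution: each input channel is convolved with its
    own k x k filter; channels are NOT mixed.
    x: (H, W, M), dw_filter: (k, k, M)."""
    H, W, M = x.shape
    k = dw_filter.shape[0]
    oh = (H - k) // stride + 1
    ow = (W - k) // stride + 1
    out = np.zeros((oh, ow, M))
    for i in range(oh):
        for j in range(ow):
            patch = x[i*stride:i*stride+k, j*stride:j*stride+k, :]  # (k, k, M)
            out[i, j, :] = np.sum(patch * dw_filter, axis=(0, 1))
    return out

def pointwise_conv2d(x, pw_filter):
    """Pointwise (1x1) convolution: mixes channels at every position.
    x: (H, W, M), pw_filter: (M, N)."""
    return np.tensordot(x, pw_filter, axes=([2], [0]))

def separable_conv2d(x, dw_filter, pw_filter, stride=1):
    """Depthwise followed by pointwise -- the factorization MobileNet uses."""
    return pointwise_conv2d(depthwise_conv2d(x, dw_filter, stride), pw_filter)

x = np.random.rand(8, 8, 3)
dw = np.random.rand(3, 3, 3)
pw = np.random.rand(3, 16)
y = separable_conv2d(x, dw, pw)
print(y.shape)  # (6, 6, 16)
```

Note that this equals a standard convolution whose 4-D kernel factorizes as K[a,b,m,n] = dw[a,b,m] * pw[m,n]; a general kernel cannot be factorized this way, which is exactly where the savings come from.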

changtimwu commented 7 years ago

Better MobileNet summary: https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.md

TF-Slim's README is a good read.
https://github.com/tensorflow/models/blob/master/slim/README.md

Google's official implementation. It's based on slim. https://github.com/tensorflow/models/blob/master/slim/nets/mobilenet_v1.py

It's a bit different from the TensorFlow APIs we're used to. MobileNet can be defined like this: an array of layer objects!! As simple as Keras.

_CONV_DEFS = [
    Conv(kernel=[3, 3], stride=2, depth=32),
    DepthSepConv(kernel=[3, 3], stride=1, depth=64),
    DepthSepConv(kernel=[3, 3], stride=2, depth=128),
    DepthSepConv(kernel=[3, 3], stride=1, depth=128),
    DepthSepConv(kernel=[3, 3], stride=2, depth=256),
    DepthSepConv(kernel=[3, 3], stride=1, depth=256),
    DepthSepConv(kernel=[3, 3], stride=2, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=1, depth=512),
    DepthSepConv(kernel=[3, 3], stride=2, depth=1024),
    DepthSepConv(kernel=[3, 3], stride=1, depth=1024)
]

Need time to study the two APIs. What's the difference from the tf.nn.* series?

 net = slim.conv2d(net, depth(conv_def.depth), conv_def.kernel,
                            stride=conv_def.stride,
                            normalizer_fn=slim.batch_norm,
                            scope=end_point)

net = slim.separable_conv2d(net, None, conv_def.kernel,
                             depth_multiplier=1,
                             stride=layer_stride,
                             rate=layer_rate,
                             normalizer_fn=slim.batch_norm,
                             scope=end_point)

docs are here. I called it contrib.slim to distinguish it from model.slim https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim

changtimwu commented 7 years ago

MobileNet on iPhone: http://machinethink.net/blog/googles-mobile-net-architecture-on-iphone/ His other work, Forge, is also worth reading.

The author wrote another article about TensorFlow on iOS: http://machinethink.net/blog/tensorflow-on-ios/

changtimwu commented 7 years ago

Interesting! You can test inference speed without real training or actual data: https://github.com/harvitronix/keras-mobilenet/blob/master/examples/speed.py
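The trick works because inference cost is independent of the weight values, so freshly initialized weights and random inputs are enough. A framework-agnostic sketch of the same idea; the `dummy_model` stand-in (a single pointwise convolution) is hypothetical.

```python
import time
import numpy as np

def benchmark(model_fn, input_shape, runs=10):
    """Average seconds per inference call on random input."""
    x = np.random.rand(*input_shape).astype(np.float32)
    model_fn(x)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        model_fn(x)
    return (time.perf_counter() - start) / runs

# hypothetical stand-in model: one 1x1 (pointwise) convolution, 3 -> 16 channels
w = np.random.rand(3, 16).astype(np.float32)
dummy_model = lambda x: np.tensordot(x, w, axes=([2], [0]))

print(f'{benchmark(dummy_model, (112, 112, 3)):.6f} s/run')
```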

In case the BatchNormalization+Activation combination gets tedious, see how inception_v3 wraps them in a helper: https://github.com/fchollet/keras/blob/master/keras/applications/inception_v3.py#L40

changtimwu commented 7 years ago

Why does it implement its own depthwise conv instead of using Keras's builtin SeparableConv2D?

rcmalli said Separable Convolution is already implemented in both Keras and TF, but there is no BN support after depthwise layers (still investigating).

We should compare with DeepDog. Excellent MobileNet tweak!!

https://medium.com/@timanglade/how-hbos-silicon-valley-built-not-hotdog-with-mobile-tensorflow-keras-react-native-ef03260747f3

changtimwu commented 7 years ago

Computation cost: take the first depthwise separable convolution in MobileNet as an example (layers 2 and 3).

(figure: mobilenet_body_architecture)

Computation cost of a depthwise separable convolution = cost of the depthwise convolution + cost of the pointwise convolution.

The pointwise convolution is a 1x1 standard convolution, so its cost is Dk * Dk * M * N * Df * Df = 1 * 1 * M * N * Df * Df = 1 * 1 * 32 * 64 * 112 * 112.
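The arithmetic for this first depthwise separable layer, worked out with the MobileNet paper's cost formulas (variable names are mine):

```python
# First depthwise separable layer: M=32 input channels, N=64 output
# channels, Df=112 feature map size, Dk=3 depthwise kernel size.
Dk, M, N, Df = 3, 32, 64, 112

depthwise = Dk * Dk * M * Df * Df      # per-channel spatial filtering
pointwise = 1 * 1 * M * N * Df * Df    # 1x1 standard convolution
standard  = Dk * Dk * M * N * Df * Df  # unfactorized baseline

print(depthwise + pointwise)  # -> 29302784 multiply-adds
print(standard)               # -> 231211008 multiply-adds
# The ratio reduces algebraically to 1/N + 1/Dk^2.
print((depthwise + pointwise) / standard)
```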

changtimwu commented 7 years ago

Dk=3, M=512, N=512, Df=14

(figure: hyper_cost, comparing multiply-adds and parameters)
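Plugging these hyperparameters into the cost and parameter-count formulas as a quick sanity check (the `sep_*`/`std_*` names are mine):

```python
Dk, M, N, Df = 3, 512, 512, 14

# multiply-adds
sep_cost = Dk * Dk * M * Df * Df + M * N * Df * Df  # depthwise + pointwise
std_cost = Dk * Dk * M * N * Df * Df                # standard convolution

# parameters (ignoring biases)
sep_params = Dk * Dk * M + M * N
std_params = Dk * Dk * M * N

print(sep_cost, std_cost)      # -> 52283392 462422016
print(sep_params, std_params)  # -> 266752 2359296
```

Roughly a 9x saving in both multiply-adds and parameters, matching the 1/N + 1/Dk^2 ratio.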

changtimwu commented 7 years ago

Understand the computation details of standard convolution and work out the numbers carefully;

only then are you able to evaluate or design a network architecture. It's no surprise SqueezeNet got rejected: its computation cost doesn't actually come down.

changtimwu commented 7 years ago

It's worth reading MXNet's implementation: https://github.com/dmlc/mxnet/issues/5891

changtimwu commented 7 years ago

Tiny YOLO on Android: 2 fps on a Pixel. I think that's too slow. https://github.com/madeye/yolo-android

marshallixp commented 7 years ago

mark

ysh329 commented 7 years ago

mark

changtimwu commented 6 years ago

CondenseNet -- the authors claim that CondenseNet is much more efficient than MobileNet. According to its result table, it achieves equivalent performance at only half the cost of MobileNet in terms of number of parameters and operations.

(figure: condensenet_result)

roshan-gopalakrishnan commented 6 years ago

Can we get the weights in between the layers of depthwise convolution and pointwise convolution in a SeparableConv2D operation ? How can we get those weights in Keras?

changtimwu commented 6 years ago

A research team at Google found a way to quantize MobileNet to 8-bit integers (1/4 the parameter space) while losing only 2% accuracy (compared to FP32).
https://arxiv.org/abs/1712.05877
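A minimal sketch of the affine quantization scheme that paper builds on, where a real value r is represented as r = S * (q - Z) with an 8-bit integer q, a float scale S, and an integer zero-point Z. This toy version only round-trips a weight array and checks the reconstruction error; it is not the paper's full quantized-inference pipeline.

```python
import numpy as np

def quantize_params(r_min, r_max, num_bits=8):
    """Choose scale and zero-point so [r_min, r_max] maps onto [0, 2^bits - 1]."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (r_max - r_min) / (qmax - qmin)
    zero_point = int(round(qmin - r_min / scale))
    return scale, int(np.clip(zero_point, qmin, qmax))

def quantize(r, scale, zero_point):
    """Real -> uint8: q = clip(round(r / S) + Z)."""
    q = np.round(r / scale) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """uint8 -> real: r = S * (q - Z)."""
    return scale * (q.astype(np.float32) - zero_point)

np.random.seed(0)
w = np.random.uniform(-1.0, 1.0, size=1000).astype(np.float32)
s, z = quantize_params(float(w.min()), float(w.max()))
w_hat = dequantize(quantize(w, s, z), s, z)
# round-trip error is bounded by about one quantization step
print(float(np.abs(w - w_hat).max()))
```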

changtimwu commented 6 years ago

MobileNet V2

pretrained models

edmondja commented 6 years ago

Does anybody know where the activation function is applied in Keras's builtin SeparableConv2D? After the depthwise or after the pointwise convolution? Or both, maybe?

@changtimwu Why not use BN between them?

jayshonzs commented 5 years ago

Hi,

remove batch normalization between depthwise and pointwise

Is there any reason to do this? Is it useless to put a normalization between the two convolutions?

thank you