Closed · duduheihei closed this issue 4 years ago
Thank you for your interest in this repo.
1) I am not the author of the article; I just implemented it for my own purposes.
2) Yes, it can easily be applied to any architecture containing depth-wise convolutions (like those you mentioned).
MobileNetV1 example:
a) Take any implementation.
b) Find the declaration of the depth-wise convolution layer and replace it with Shift2D(inp, init_stride=3, <other args if needed>),
and that's all.
3) Unfortunately, I cannot provide models with results of successfully applying SSL (because they belong to the company where I work). However, creating those examples is still on the TODO list, and when I have time I will add them.
Anyway, I will be glad for any contribution to this repo. Thank you!
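As a concrete sketch of step b): below is a typical MobileNetV1 depthwise-separable block (a common PyTorch layout, not this repo's code), with the swap marked as a comment, since Shift2D's import path depends on the repo:

```python
import torch
import torch.nn as nn

def dw_sep_block(inp, oup, stride=1):
    """Typical MobileNetV1 depthwise-separable block (common layout,
    not this repo's code)."""
    return nn.Sequential(
        # original depth-wise convolution:
        nn.Conv2d(inp, inp, 3, stride, 1, groups=inp, bias=False),
        # SSL variant -- replace the line above with:
        # Shift2D(inp, init_stride=3),
        nn.BatchNorm2d(inp),
        nn.ReLU(inplace=True),
        nn.Conv2d(inp, oup, 1, 1, 0, bias=False),  # point-wise conv
        nn.BatchNorm2d(oup),
        nn.ReLU(inplace=True),
    )

block = dw_sep_block(8, 16)
y = block(torch.randn(1, 8, 32, 32))
```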
Got it. Thanks a lot! And I will do some experiments on MobileNetV2 and ShuffleNet and discuss the results here.
I have replaced the 3x3 conv layers in MobileNetV2 with ShiftLayer, and the precision is obviously lower than the original model. I tried setting different params of the Shift Layer, such as init_stride and active_flag, and got slightly better results. However, there is still a significant gap between the "shifted MobileNetV2" and the original model. Could you provide some advice on this problem?
a) init_stride is important and should be not less than the kernel_size of the replaced dw conv. It is responsible for initializing the shift sizes for each channel uniformly from [-init_stride, init_stride]. I also think that sometimes such initialization is not a good solution at all, but you can implement any initialization for the shift weights, because they are accessible directly via the .weight attribute.
b) active_flag stands for computing the forward pass via bilinear interpolation (as it always happens in the backward pass), as in this article.
c) By default the layer uses zero padding; however, this may also not be a good solution, due to information loss. You can consider the 'border', 'reflect', and 'symmetric' padding modes.
d) Most important is sparsity_term! If it is not equal to 0., the layer gives two outputs, where the second is the L1 regularization on the weights. You can add it to the general loss. IMPORTANT: by default sparsity_term=5e-4, hence the local loss computation occurs.
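A minimal sketch of point d): the second output is just an L1 penalty to add to the task loss. ToyShift2D below is a hypothetical stand-in that only reproduces the two-output interface (it does not actually shift anything); with the real Shift2D the usage is the same:

```python
import torch
import torch.nn as nn

class ToyShift2D(nn.Module):
    """Hypothetical stand-in mimicking Shift2D's interface: learnable
    per-channel shifts plus an L1 penalty as a second output."""
    def __init__(self, in_channels, init_stride=3, sparsity_term=5e-4):
        super().__init__()
        # one (dy, dx) shift per channel, initialized uniformly
        self.weight = nn.Parameter(
            torch.empty(in_channels, 2).uniform_(-init_stride, init_stride))
        self.sparsity_term = sparsity_term

    def forward(self, x):
        out = x  # the real layer shifts each channel; identity here
        if self.sparsity_term != 0.:
            return out, self.sparsity_term * self.weight.abs().sum()
        return out

layer = ToyShift2D(in_channels=16)
x = torch.randn(2, 16, 8, 8)
out, l1 = layer(x)
task_loss = out.mean()   # placeholder for e.g. cross-entropy
loss = task_loss + l1    # add the sparsity penalty to the total loss
```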
Thanks for your advice. Here are the details of my situation:
- The backbone is MobileNetV2-0.5. I replace the 3x3 convolution layers in the LinearBottleneck blocks. When the stride of the 3x3 convolution is 2, I replace it with 2x2 average pooling. When the stride is 1, I replace the 3x3 convolution layer with a ShiftLayer with init_stride 3.
- I trained the model on a non-public dataset with 3 classes in total. The training precision drops from 94% down to 88% on average.
- I do not use the sparsity term, because I think it will reduce the capacity of the network.
I think you need to tune the ShiftLayer init_stride value.
I got similar accuracy using this code as with the TensorFlow open-source code for the active shift implementation.
- Can you share what the model looks like after your changes?
- Moreover, it would be better to share the precision values.
Thank you for your answer. I was wondering what the second return value is.
Could you tell me which backbone and dataset you use?
Here is the definition of the basic LinearBottleneck with ShiftLayer; I do not change the other modules of MobileNet. When the stride of the 3x3 convolution is 2, I replace it with 2x2 average pooling; when the stride is 1, I replace the 3x3 convolution layer with a ShiftLayer with init_stride 3.
```python
class LinearBottleneck(nn.Module):
    def __init__(self, inplanes, outplanes, stride=1, t=6, activation=nn.ReLU6):
        super(LinearBottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, inplanes * t, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(inplanes * t)
        if stride != 1:
            self.ave_pool = nn.AvgPool2d(kernel_size=2, stride=stride, padding=1)
        else:
            self.shiftlayer = Shift2D(in_channels=inplanes * t, init_stride=3, active_flag=True)
        self.conv3 = nn.Conv2d(inplanes * t, outplanes, kernel_size=1, stride=stride, bias=False)
        self.bn3 = nn.BatchNorm2d(outplanes)
        self.activation = activation(inplace=True)
        self.stride = stride
        self.t = t
        self.inplanes = inplanes
        self.outplanes = outplanes

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.activation(out)
        if self.stride != 1:
            out = self.ave_pool(out)
        else:
            out, _ = self.shiftlayer(out)
        out = self.conv3(out)
        out = self.bn3(out)
        if self.stride == 1 and self.inplanes == self.outplanes:
            out += residual
        return out
```
@duduheihei
So, I looked at your code:
1) `self.conv3 = nn.Conv2d(inplanes * t, outplanes, kernel_size=1, stride=stride, bias=False)`
has stride in its arguments; hence, in the case of stride > 1, this convolution also reduces the tensor size again (by ignoring every second element of the input tensor during convolution).
2) I do not understand why you replaced shift with pooling in the case of stride > 1.
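Point 1 is easy to verify: with stride=2 in both the average pooling and the 1x1 convolution, a 32x32 input shrinks roughly fourfold instead of twofold (a standalone sketch with arbitrary channel counts):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 8, 32, 32)
pool = nn.AvgPool2d(kernel_size=2, stride=2, padding=1)       # as in the block
conv = nn.Conv2d(8, 16, kernel_size=1, stride=2, bias=False)  # stride=stride

pooled = pool(x)    # 32x32 -> 17x17: intended downsample
out = conv(pooled)  # 17x17 -> 9x9: second, unintended downsample
```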
My vision is the following:

```python
self.bn1 = nn.BatchNorm2d(inplanes * t)
self.shiftlayer = Shift2D(in_channels=inplanes * t, init_stride=3, active_flag=True)
if stride != 1:
    # MaxPool is also a good variant here
    self.pool = nn.AvgPool2d(kernel_size=2, stride=stride, padding=1)
self.conv3 = nn.Conv2d(inplanes * t, outplanes, kernel_size=1, stride=1, bias=False)
```
Or a simpler version, with the stride in the last conv:

```python
self.bn1 = nn.BatchNorm2d(inplanes * t)
self.shiftlayer = Shift2D(in_channels=inplanes * t, init_stride=3, active_flag=True)
self.conv3 = nn.Conv2d(inplanes * t, outplanes, kernel_size=1, stride=stride, bias=False)
```
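For completeness, here is how the first variant above could look as a full block. This is a sketch under assumptions: the forward ordering (shift, then pool) is inferred from the constructor order, and ToyShift2D is an identity stand-in that only mimics Shift2D's two-output interface, not the actual shifting:

```python
import torch
import torch.nn as nn

class ToyShift2D(nn.Module):
    """Identity stand-in for this repo's Shift2D: it only mimics the
    (output, l1_penalty) two-output interface, not the actual shifting."""
    def __init__(self, in_channels, init_stride=3, active_flag=True):
        super().__init__()
        self.weight = nn.Parameter(
            torch.empty(in_channels, 2).uniform_(-init_stride, init_stride))

    def forward(self, x):
        return x, 5e-4 * self.weight.abs().sum()

class ShiftBottleneck(nn.Module):
    """First variant above: shift always, pool only when stride != 1."""
    def __init__(self, inplanes, outplanes, stride=1, t=6):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, inplanes * t, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(inplanes * t)
        self.shiftlayer = ToyShift2D(in_channels=inplanes * t, init_stride=3)
        if stride != 1:
            self.pool = nn.AvgPool2d(kernel_size=2, stride=stride, padding=1)
        self.conv3 = nn.Conv2d(inplanes * t, outplanes, kernel_size=1, stride=1, bias=False)
        self.bn3 = nn.BatchNorm2d(outplanes)
        self.act = nn.ReLU6(inplace=True)
        self.stride = stride

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out, _ = self.shiftlayer(out)      # shift applied at every stride
        if self.stride != 1:
            out = self.pool(out)           # single spatial downsample
        return self.bn3(self.conv3(out))

y = ShiftBottleneck(8, 16, stride=2)(torch.randn(1, 8, 32, 32))
```

With stride=2 the spatial size goes 32 → 17 exactly once (AvgPool with padding=1), rather than twice as in the original snippet.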
I used the res-IB-SSL NN architecture that the paper https://arxiv.org/abs/1903.05285 proposed, applied on the CIFAR-10 dataset. The ResNet code I used is from https://github.com/akamaster/pytorch_resnet_cifar10.
I am sorry for my mistake of downsampling twice in the block. The "more simple version" you provided is the same as my first experiment, but its precision was obviously lower. Therefore I implemented the downsample operation by replacing the 1x1 conv with stride 2 with a 2x2 average pooling with stride 2. Unfortunately, I forgot to adjust the stride param of the 1x1 conv, which made the block downsample twice. Following the version you provided, I have done the experiments again and got satisfactory precision. Here are the two versions that work for me; for simplicity, the code samples do not contain batchnorm or activation functions:
Version 1:
```python
self.conv1 = nn.Conv2d(inplanes, inplanes * t, kernel_size=1, bias=False)
if stride != 1:
    self.ave_pool = nn.AvgPool2d(kernel_size=2, stride=stride, padding=1)
else:
    self.shiftlayer = Shift2D(in_channels=inplanes * t, init_stride=3, active_flag=True)
self.conv3 = nn.Conv2d(inplanes * t, outplanes, kernel_size=1, stride=1, bias=False)
```
Version 2:
```python
self.conv1 = nn.Conv2d(inplanes, inplanes * t, kernel_size=1, bias=False)
self.shiftlayer = Shift2D(in_channels=inplanes * t, init_stride=3, active_flag=True)
if stride != 1:
    self.ave_pool = nn.AvgPool2d(kernel_size=2, stride=stride, padding=1)
self.conv3 = nn.Conv2d(inplanes * t, outplanes, kernel_size=1, stride=1, bias=False)
```
Thanks for your reply. Now I get satisfactory results on MobileNetV2, and the sample code is shown in the discussion above.
Thanks for sharing the implementation of SSL (Sparse Shift Layer). However, I have not found any model constructed with SSL. I want to ask a question: can SSL be easily applied to classic models such as MobileNet and ShuffleNet? Or can you provide some models constructed with SSL that show satisfactory results?