cvjena / cnn-models

ImageNet pre-trained models with batch normalization for the Caffe framework
https://arxiv.org/abs/1612.01452
BSD 2-Clause "Simplified" License

ResNet50: bottom blob of expand layer #18

Open qilinli opened 7 years ago

qilinli commented 7 years ago

Hi there,

Thanks for sharing the pre-trained models. I am studying ResNet-50 and have a question about the architecture. It seems that there are quite a few places that differ from the original ResNet.

  1. The data preprocessing is changed from mean subtraction to batch normalization, which has already been noted.

However, I noticed another major difference in the expanding convolution layers. For example, the first one:

layer {
  name: "layer_64_1_conv_expand"
  type: "Convolution"
  bottom: "layer_64_1_conv1"
  top: "layer_64_1_conv_expand"
  ...

It shows that the bottom blob comes from "layer_64_1_conv1", whereas it was "conv1_pool" in the original architecture. Is this a modification? Your results show that you consistently improve accuracy compared to the original implementation; is this the reason?

MarcelSimon commented 7 years ago

Hi! There is a pooling layer in both my implementation and Kaiming's. I can't see what you mean. Could you please provide line numbers for both prototxts?

qilinli commented 7 years ago

@MarcelSimon Sorry, I didn't make it clear. I mean the prototxt cnn-models/ResNet_preact/ResNet50_cvgj/train.prototxt, lines 295-318, which is the first expanding layer. Yours expands from "layer_64_1_conv1".

In He's implementation, deep-residual-networks/prototxt/ResNet-50-deploy.prototxt (I cannot find a train.prototxt), lines 60-72, the layer "res2a_branch1" (which corresponds to your expand layer; both use a 1x1 convolution to increase the number of channels) has bottom: "pool1", i.e. he expands from the previous pooling layer.

And it is the same for all expanding layers. I think it is quite a big difference.
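For comparison, the corresponding layer in He's deploy prototxt looks roughly like this as I read it (abridged; I left out the convolution parameters):

layer {
  name: "res2a_branch1"
  type: "Convolution"
  bottom: "pool1"
  top: "res2a_branch1"
  # 1x1 convolution, parameters omitted
}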

MarcelSimon commented 7 years ago

I see, thanks a lot for pointing that out! The difference occurs only at the first expand layer; the other ones are correct. The batch norm, scale and ReLU are shared because it is the preactivation variant. However, the first expand layer should indeed use conv1_pool as its input. I will add a remark to the README soon.
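Concretely, the corrected first expand layer would read from the pooling output instead, roughly like this (the remaining parameters stay unchanged and are omitted here):

layer {
  name: "layer_64_1_conv_expand"
  type: "Convolution"
  bottom: "conv1_pool"
  top: "layer_64_1_conv_expand"
  # other fields unchanged, omitted
}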

qilinli commented 7 years ago

Since you mentioned the shared batch norm and scale, it reminds me of another difference between your implementation and He's. If you check their implementation, http://ethereon.github.io/netscope/#/gist/db945b393d40bfa26006 (graph) or https://github.com/KaimingHe/deep-residual-networks/blob/master/prototxt/ResNet-50-deploy.prototxt (prototxt), they actually use two batch norm + scale pairs, one for each branch, which means they do not share them. You, on the other hand, apply the batch norm + scale after the branch merge, so it is shared.
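For example, as far as I can tell from that deploy prototxt, each branch gets its own pair right after its convolution (abridged; parameters omitted):

layer { name: "bn2a_branch1"     type: "BatchNorm" bottom: "res2a_branch1"  top: "res2a_branch1" }
layer { name: "scale2a_branch1"  type: "Scale"     bottom: "res2a_branch1"  top: "res2a_branch1" }
layer { name: "bn2a_branch2a"    type: "BatchNorm" bottom: "res2a_branch2a" top: "res2a_branch2a" }
layer { name: "scale2a_branch2a" type: "Scale"     bottom: "res2a_branch2a" top: "res2a_branch2a" }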

MarcelSimon commented 7 years ago

The implementation you are referring to is the original ResNet, not the preactivation variant. Please see https://github.com/facebook/fb.resnet.torch/blob/master/models/preresnet.lua and https://github.com/KaimingHe/resnet-1k-layers/blob/master/resnet-pre-act.lua#L63 for the preactivation variant.
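To sketch the difference: in the preactivation variant, a unit with a projection shortcut applies a single BatchNorm/Scale/ReLU to the block input, and both the residual branch and the 1x1 projection read that same pre-activated blob. The layer names below are purely illustrative, they do not match either prototxt, and all parameters are omitted:

layer { name: "unit_bn"     type: "BatchNorm"   bottom: "unit_input"  top: "unit_preact" }
layer { name: "unit_scale"  type: "Scale"       bottom: "unit_preact" top: "unit_preact" }
layer { name: "unit_relu"   type: "ReLU"        bottom: "unit_preact" top: "unit_preact" }
# both branches read the same pre-activated blob
layer { name: "unit_conv1"  type: "Convolution" bottom: "unit_preact" top: "unit_conv1" }
layer { name: "unit_expand" type: "Convolution" bottom: "unit_preact" top: "unit_expand" }
# ... remaining convolutions of the residual branch omitted ...
layer { name: "unit_sum"    type: "Eltwise"     bottom: "unit_conv3" bottom: "unit_expand" top: "unit_output" }

That is why there is only one batch norm + scale per unit rather than one per branch.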

qilinli commented 7 years ago

I see. Thanks a lot @MarcelSimon