Closed — abangdd closed this issue 8 years ago
Some explanations by you in the BN PR post
Let me, as the author of EltwiseAffine, say something against it in the special BatchNorm case :) I first implemented it to fully reproduce the results of the original paper. But now: 1) I don't see why the following convolution layer could not learn the scale or bias itself. 2) More importantly, I don't see any difference with or without EltwiseAffine. I am now running a quick experiment on BN-Caffenet-128 (128 instead of 227 image size, for speed): original; this BN; this BN + EltwiseAffine; this BN, but after ReLU instead of before. As soon as it finishes I'll put the results online, but so far the "this BN, but after ReLU" variant is leading by a great margin.
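To make the two placements concrete, here is a minimal Caffe prototxt sketch of the orderings being compared (layer names and conv parameters are hypothetical, not taken from the actual experiment files):

```
# Variant "BN before ReLU" (as in the original paper):  conv1 -> bn1 -> relu1
# Variant "BN after ReLU"  (leading in this experiment): conv1 -> relu1 -> bn1
layer { name: "conv1" type: "Convolution" bottom: "data"  top: "conv1"
        convolution_param { num_output: 96 kernel_size: 11 stride: 4 } }
layer { name: "relu1" type: "ReLU"      bottom: "conv1" top: "conv1" }
layer { name: "bn1"   type: "BatchNorm" bottom: "conv1" top: "conv1" }
```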
ChannelScalarLayer and BiasLayer make sense to me :)
hi @abangdd, you are right, no scale/shift yet. My EA implementation is buggy, which was found by @siddharthm83. But I am going to fix and finish it soon.
"lr_mult of BatchNorm is 0" is not relevant here: lr_mult=0 is required by the internal implementation of the BatchNorm layer.
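For context, the standard Caffe BatchNorm layer keeps three internal blobs (running mean, running variance, and a moving-average scale factor) that are updated by the layer itself rather than by the solver, so the usual convention is to freeze all three (layer names here are illustrative):

```
layer {
  name: "bn1"
  type: "BatchNorm"
  bottom: "conv1"
  top: "conv1"
  param { lr_mult: 0 }  # running mean
  param { lr_mult: 0 }  # running variance
  param { lr_mult: 0 }  # moving-average scale factor
}
```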
I see, thank you
@abangdd it looks like the problem was found: https://github.com/BVLC/caffe/pull/2996#issuecomment-170445314 So I have started to train a BN-EA-ReLU network.
@abangdd
So: BN after ReLU > BN after ReLU + EA > BN before ReLU + EA > BN before ReLU
great benchmark on how to apply BN !
Why is BN after ReLU better than BN after conv?
The results show BN after ReLU is better than BN after conv, but why? How do you justify that?
@Pratyeka In the batchnorm paper the authors state that the INPUT of a layer should be normalized (mean = 0, var = 1) to make learning easier. So BN after ReLU normalizes the input to the next conv layer. But this is only speculation and I am not a theory guy, sorry :)
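A small numpy sketch of that intuition (illustrative only, not the Caffe implementation): ReLU makes activations non-negative, shifting their mean above zero, and a BatchNorm placed after it re-normalizes exactly the tensor that the next conv layer consumes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Roughly zero-mean pre-activations of some layer.
pre_act = rng.normal(size=(10000,))

# ReLU output is non-negative, so its mean is pushed above zero;
# it is no longer normalized.
relu_out = np.maximum(pre_act, 0.0)

# BatchNorm after ReLU: normalize the very tensor the next conv sees.
eps = 1e-5
bn_out = (relu_out - relu_out.mean()) / np.sqrt(relu_out.var() + eps)

print(relu_out.mean())  # positive: ReLU shifted the mean
print(bn_out.mean())    # approximately 0
print(bn_out.var())     # approximately 1
```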
Where can I find the ElementwiseAffine layer?
This is my name for two Caffe layers: the Scale layer and the Bias layer. Both are standard Caffe. scale * x + bias is essentially an affine transformation.
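Concretely, in Caffe prototxt the pair can be folded into a single Scale layer via `bias_term` (layer and blob names here are hypothetical):

```
layer {
  name: "bn1_affine"
  type: "Scale"
  bottom: "bn1"
  top: "bn1"
  scale_param { bias_term: true }  # learns per-channel scale * x + bias
}
```

Equivalently, one can declare a separate `type: "Bias"` layer after the Scale layer.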
hi @ducha-aiki, I saw in the caffenet128_lsuv_no_lrn_BatchNormAfterReLU.prototxt file that the lr_mult of BatchNorm is 0, and there are no other layers doing scale/shift learning, so were the tests all done without scale/shift learning? Thank you.