allenai / XNOR-Net

ImageNet classification using binary Convolutional Neural Networks
https://xnor.ai/

Where are the Scalar Multiplication and ReLU Activations in alexnetxnor.lua? #4

Closed Sanghoon94 closed 7 years ago

Sanghoon94 commented 7 years ago

Hello! I am trying to figure out the structure of your network from this code after reading your paper.

However, I couldn't find the scalar multiplications (by the averages of the weights / input data) in the model you've created in alexnetxnor.lua. Also, there aren't any ReLU functions in the binarized convolution layers. Am I misunderstanding the network somewhere?

As far as I understand from the paper, the scalar multiplication is the core idea of this paper, and it is what makes the results stand out compared to BinaryConnect and other BNNs.

Thank you.

mrastegari commented 7 years ago

Yes, the main point is the scaling factor. The most important one is the weight scaling factor (alpha in the paper), not the one for the inputs. On page 12, before the last paragraph: "We found that the scaling factors for the weights (α) is much more effective than the scaling factors for the inputs (β). Removing β reduces the accuracy by a small margin (less than 1% top-1 alexnet)."

Removing β also saves a lot of computation.

We do not necessarily need ReLU. However, I noticed that adding ReLU helps the accuracy of XNOR-Net a bit.
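For reference, the weight scaling factor in the paper is the mean of the absolute values of each filter, so a real-valued filter W is approximated as α * sign(W). A minimal sketch of that idea (illustrative only, not code from this repo):

```lua
-- Illustrative sketch: for a real-valued filter W, the scaling factor is
-- alpha = ||W||_1 / n, and the binary approximation is W ~= alpha * sign(W).
require 'torch'

local W = torch.randn(3, 11, 11)      -- one real-valued filter
local n = W:nElement()
local alpha = W:norm(1) / n           -- mean of absolute values
local Wb = torch.sign(W)              -- binary weights in {-1, +1}
local Wapprox = Wb * alpha            -- scaled binary approximation
print(alpha, (W - Wapprox):norm())    -- approximation error
```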

Sanghoon94 commented 7 years ago

Thanks for the reply!

I see that removing β is an efficient implementation choice. Where can I find the weight scaling factors (α) in the code? In alexnetxnor.lua I can only find the convolution with the weight matrix, but not the multiplication by α. Shouldn't there be a multiplication by the scalar α after the convolution (XNOR + bit-count) with the binarized weight matrix?

Thank you.

shuangchenli commented 7 years ago

Hi,

I think you can check the function binarizeConvParms(convNodes) and a couple of other functions in util.lua. These functions are called in train.lua and test.lua.

mrastegari commented 7 years ago

Yes, check the function binarizeConvParms in util.lua.
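Roughly, binarizeConvParms folds α into the binarized weights, so the convolution kernel already carries the scaling factor. A paraphrased sketch of that per-filter scaling (see util.lua for the exact code, which may skip layers that stay full precision):

```lua
-- Paraphrased sketch of the per-filter weight scaling (not the exact repo code).
-- convNodes is assumed to be a list of nn.SpatialConvolution modules.
local function binarizeConvParms(convNodes)
  for i = 1, #convNodes do
    local w = convNodes[i].weight             -- nOut x nIn x kH x kW
    local n = w[1]:nElement()                 -- elements per filter
    local s = w:size()
    -- per-filter alpha = mean of |w|, broadcast back to the weight shape
    local alpha = w:norm(1, 4):sum(3):sum(2):div(n):expand(s)
    w:sign():cmul(alpha)                      -- weight <- sign(w) * alpha
  end
end
```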

Sanghoon94 commented 7 years ago

Got it! Thanks a lot!

So (binarized weight * α) is the convolution kernel in this case.

I have one more question regarding the algorithm. Sorry to bother you so much.

The same α multiplies every element of the XNOR + bit-count result.

From an inference point of view: since the same α multiplies all the neurons, it shouldn't affect the result of the pooling (max-finding) layer, i.e. the maximum element is the same before and after multiplying by α. Also, since α multiplies all the neurons and doesn't change their individual signs (+/-), the multiplied α is lost in the next layer's BinActive through binarization. (You mentioned that β is negligible.) I guess it can affect the result a little at batch normalization, but if we suppose we choose not to use batch normalization, what does multiplying by α change? How can each layer's α propagate to the final output of the network?

Thank you.
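(A tiny check of the reasoning above, illustrative only: a positive scale α changes neither the winner of a max pool nor the sign pattern seen by the next layer's binarization.)

```lua
-- Illustrative check: multiplying by a positive alpha preserves both the
-- argmax of a max pool and the signs fed to the next binarization.
require 'torch'

local x     = torch.randn(8)                    -- some pre-pooling activations
local alpha = 0.42                              -- any positive scale

local _, idx1 = torch.max(x, 1)                 -- argmax before scaling
local _, idx2 = torch.max(x * alpha, 1)         -- argmax after scaling
print(idx1[1] == idx2[1])                       -- true: same pooling winner

print(torch.sign(x):equal(torch.sign(x * alpha)))  -- true: same sign pattern
```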

mrastegari commented 7 years ago

Yes, without batch normalization, alpha has no effect in XNOR-Net. Theoretically, you can encode alpha into the parameters of the batch normalization, so at inference time you do not need it.
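The batch-normalization absorption can be seen with a toy check (illustrative only, not code from this repo; `bn` here is a stand-in for batch normalization using the batch's own statistics): a positive per-layer α cancels once the statistics absorb it, so at inference it can be folded away.

```lua
-- Toy check: batch normalization absorbs a positive scale alpha, because
-- both the mean and the standard deviation of (alpha * x) scale by alpha.
require 'torch'

local x     = torch.randn(1000)           -- pre-BN activations of one channel
local alpha = 0.37                        -- per-filter scaling factor
local gamma, beta = 1.5, 0.2              -- BN affine parameters

local function bn(v)                      -- batch norm with v's own statistics
  local y = v:clone():add(-v:mean()):div(v:std())
  return y:mul(gamma):add(beta)
end

-- BN(alpha * x) == BN(x): alpha disappears inside the normalization.
print((bn(x * alpha) - bn(x)):abs():max())  -- ~0 up to float precision
```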

Sanghoon94 commented 7 years ago

Everything is crystal clear now! Thank you.