Sanghoon94 closed this issue 7 years ago.
Yes, the main point is the scaling factor. The most important one is the weight scaling factor (α in the paper), not the input one. On page 12, before the last paragraph: "We found that the scaling factors for the weights (α) is much more effective than the scaling factors for the inputs (β). Removing β reduces the accuracy by a small margin (less than 1% top-1 alexnet)."
Removing β also saves a lot of computation.
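The weight-only scaling described above can be sketched as follows: per the paper, each filter gets a binary version B = sign(W) and one scalar α = mean(|W|). This is an illustrative NumPy sketch, not code from the repo; `binarize_weights` and the toy filter bank are made-up names and values:

```python
import numpy as np

def binarize_weights(W):
    """XNOR-Net style weight binarization:
    B = sign(W), alpha = mean(|W|) computed per output filter.
    W has shape (num_filters, ...); alpha is one scalar per filter."""
    flat = W.reshape(W.shape[0], -1)
    alpha = np.abs(flat).mean(axis=1)   # one scaling factor per filter
    B = np.sign(W)
    B[B == 0] = 1                       # convention: treat sign(0) as +1
    return B, alpha

# Toy filter bank: 2 filters of shape 3x3 (made-up values).
W = np.array([[[ 0.5, -1.0,  0.25],
               [ 0.75, -0.5,  1.0 ],
               [-0.25,  0.5, -0.75]],
              [[ 2.0,  -2.0,  1.0 ],
               [-1.0,   2.0, -2.0 ],
               [ 1.0,  -1.0,  2.0 ]]])
B, alpha = binarize_weights(W)
print(alpha)   # per-filter mean absolute weight
```

Dropping β means no per-location scaling of the input is needed, so only this one scalar per filter survives.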
We do not necessarily need ReLU. However, I noticed that adding ReLU helps the accuracy of XNOR-Net a bit.
Thanks for the reply!
I get that removing β makes for an efficient implementation. Where can I find the weight scaling factors (α) in the code? In alexnetxnor.lua I can only find the convolution with the weight matrix, not the multiplication by α. Shouldn't there be a multiplication by the scalar α after the convolution (XNOR + bitcount) with the binarized weight matrix?
Thank you.
Hi,
I think you can check the function binarizeConvParms(convNodes)
and a couple of other functions in util.lua. These functions are called in train.lua and test.lua.
Yes, check the function binarizeConvParms.
Got it! Thanks a lot!
So the (binarized weight * α) is the convolution kernel in this case.
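Because convolution is linear in the kernel, folding α into the kernel (as above) gives exactly the same result as running the pure binary convolution and scaling its output by α afterwards. A minimal 1-D sketch; the `conv1d` helper is a toy stand-in for the real 2-D convolution, not the repo's implementation:

```python
import numpy as np

def conv1d(x, k):
    """Toy 1-D correlation, enough to show the linearity argument."""
    n = len(x) - len(k) + 1
    return np.array([np.dot(x[i:i + len(k)], k) for i in range(n)])

x = np.array([1., -1., 1., 1., -1.])   # binarized input (made-up)
W = np.array([0.5, -1.0, 0.25])        # real-valued filter (made-up)
B = np.sign(W)                         # binary filter
alpha = np.abs(W).mean()               # weight scaling factor

# Folding alpha into the kernel vs. scaling the binary-conv output:
out_folded = conv1d(x, alpha * B)
out_scaled = alpha * conv1d(x, B)
assert np.allclose(out_folded, out_scaled)   # identical by linearity
```

So "binarized weight * α as the kernel" and "α times the XNOR+bitcount output" are interchangeable; the code simply picks the first form.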
I have one more question regarding the algorithm. Sorry to bother you so much.
The same α multiplies every XNOR+bitcount result.
From an inference point of view: since the same α multiplies all the neurons, it should not affect the max-pooling result (the maximum element is the same before and after multiplying by α). Also, since α multiplies all the neurons without changing any neuron's sign, the factor is erased by the next layer's BinActive binarization. (You mentioned that β is negligible.) I guess it can affect the result a little at batch normalization, but if we choose not to use batch normalization, what does multiplying by α change? How do each layer's α values propagate to the final output of the network?
Thank you.
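The invariance argument in this question can be checked numerically. A small sketch with made-up activation values; α must be positive (it is a mean of absolute values, so it always is):

```python
import numpy as np

z = np.array([0.3, -1.2, 2.5, -0.4])   # toy pre-activations of one layer
alpha = 0.7                            # positive per-layer scaling factor

# Max pooling picks the same element either way:
assert np.argmax(alpha * z) == np.argmax(z)

# BinActive (sign) erases the positive factor entirely:
assert np.array_equal(np.sign(alpha * z), np.sign(z))
```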
Yes, without batch normalization α does not have any effect in XNOR-Net. Theoretically, you can encode α into the parameters of the batch normalization, so at inference time you do not need it.
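Folding α into the batch-norm parameters can be sketched like this: since BN(αz) = γ(αz − μ)/√(σ² + ε) + b, absorbing α into γ and μ gives the same output with no explicit multiply. The `batchnorm` helper and all values are illustrative, not code from the repo:

```python
import numpy as np

def batchnorm(x, gamma, beta, mu, var, eps=1e-5):
    """Inference-time batch norm with fixed running statistics."""
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

z = np.array([1.0, -2.0, 0.5, 3.0])   # binary-conv output before scaling
alpha = 0.6                           # weight scaling factor
gamma, beta, mu, var = 1.3, 0.2, 0.4, 2.0

# Original: scale by alpha, then apply batch norm.
out = batchnorm(alpha * z, gamma, beta, mu, var)

# Folded: absorb alpha into gamma and mu; skip the explicit multiply.
out_folded = batchnorm(z, gamma * alpha, beta, mu / alpha, var)
assert np.allclose(out, out_folded)
```

The algebra: γα(z − μ/α)/√(σ² + ε) + b = γ(αz − μ)/√(σ² + ε) + b, so the two paths are exactly equal.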
Everything is crystal clear now! Thank you.
Hello! I am trying to figure out the structure of your network from this code after reading your paper.
However, I couldn't find the scalar multiplications (average of weights / input data) in the model created in alexnetxnor.lua. There also aren't any ReLU functions in the binarized convolution layers. Is there anything wrong with the way I am understanding the network?
As far as I can tell from the paper, the scalar multiplication is the core idea that made the results distinguishable from BinaryConnect and other BNNs.
Thank you.