allenai / XNOR-Net

ImageNet classification using binary Convolutional Neural Networks
https://xnor.ai/

Are weights in 1st and last layers binary? #12

Closed escorciav closed 7 years ago

escorciav commented 7 years ago

Hi

According to this line, it seems that the first and last layers of the Binary-Weight-Network are not binary. Am I missing something?

Thanks for providing the code to reproduce your work.

ping @mrastegari

mrastegari commented 7 years ago

Yes, the weights in the first and the last layers are real-valued. We do not binarize them.

Conchylicultor commented 7 years ago

Hi, was there a technical reason for that design choice? I suppose you tried a fully binarized network. How much was the performance drop?

xiao1228 commented 7 years ago

Hi, if I want to binarize the first and last layers of the network, should I just change `for i = 2, #convNodes-1 do` to `for i = 1, #convNodes do` in binarizeConvParms? What about updateBinaryGradWeight?
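
For concreteness, the change I mean is something like this (just a sketch; the body below is paraphrased, not the exact code in util.lua, which also mean-centers and uses per-filter scales):

```lua
-- Sketch of the loop-bound change, applied to a simplified binarizeConvParms.
function binarizeConvParms(convNodes)
   for i = 1, #convNodes do                  -- was: for i = 2, #convNodes-1 do
      local w = convNodes[i].weight
      local alpha = torch.abs(w):mean()      -- scaling factor alpha = mean(|W|)
      w:copy(torch.sign(w):mul(alpha))       -- replace weights with alpha * sign(W)
   end
end
```

Presumably the same bound would also have to change in meancenterConvParms, clampConvParms and updateBinaryGradWeight.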

xiao1228 commented 7 years ago

When I change meancenterConvParms, binarizeConvParms, and clampConvParms to `for i = 1, #convNodes do`, the weights become very small (on the order of 10^-33) and the accuracy is very low, around 10%. So what is the right way to binarize all the layers?

Conchylicultor commented 7 years ago

Yes, that's why the first and last layers are not binarized; I got the same results. From my experiments, the first layer is the most critical: if it is binarized, the network won't learn at all. But binarizing the last layer also has a big impact on accuracy.

xiao1228 commented 7 years ago

Hi @Conchylicultor, thank you very much for your reply. I tried leaving the first layer unbinarized and the accuracy improved. However, when I extracted the weights from the trained model, they don't appear to be binarized. Below is an example kernel from the second conv layer:

  4.2289e-03  5.0269e-03 -4.6857e-02  3.1741e-02 -3.5040e-02
  5.6437e-02 -3.0367e-02  1.8012e-02  4.0737e-02 -7.4001e-02
 -4.3729e-02  5.5849e-02  3.2189e-03 -1.8107e-03 -1.7123e-02
  5.7789e-03 -8.3973e-03 -9.4293e-03  6.4789e-02 -1.3312e-02
  9.9191e-02 -9.3009e-03 -3.0903e-02  3.6413e-02  6.4425e-02

Conchylicultor commented 7 years ago

So even if the weights are binarized, there is still a floating-point scale and shift (like batch normalization). There is a different scale/shift value per channel (so 256 for layer 2). If you print the weights of a single channel, their absolute values should all be the same (e.g. 4.2289e-03, -4.2289e-03, -4.2289e-03, 4.2289e-03, ...).
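
As a small sketch of what that looks like, assuming the per-output-channel scale alpha_k = mean(|W_k|) from the paper (the sizes below are made up):

```lua
require 'torch'

-- Hypothetical conv weight: outChannels x inChannels x kH x kW (sizes made up).
local w = torch.randn(256, 128, 3, 3)
local wb = torch.Tensor():resizeAs(w)

for k = 1, w:size(1) do
   local alpha = torch.abs(w[k]):mean()      -- per-output-channel scale alpha_k
   wb[k]:copy(torch.sign(w[k]):mul(alpha))   -- every entry of filter k becomes +/- alpha_k
end

-- Each filter now holds at most two distinct values:
print(torch.abs(wb[1]):min(), torch.abs(wb[1]):max())  -- both equal alpha_1
```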

xiao1228 commented 7 years ago

Thanks for clarifying. I understand that point from reading the code; it's just that the weights in my model are not binarized for some reason.

Conchylicultor commented 7 years ago

From what I remember, you have to explicitly call binarizeConvParms, as here: https://github.com/allenai/XNOR-Net/blob/master/train.lua#L172. I think the models are saved with regular float weights but are binarized before the forward pass.
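
So to inspect binary weights you would need to apply the same transform to the loaded checkpoint first. Something like this (a sketch only, with a placeholder path and a simplified per-layer scale instead of the repository's per-filter scales):

```lua
require 'nn'
require 'cudnn'  -- assuming the checkpoint was saved with cudnn conv modules

local model = torch.load('model.t7')  -- placeholder path to your saved checkpoint

-- Gather the conv modules; adjust the type string if your model uses nn.SpatialConvolution.
local convNodes = model:findModules('cudnn.SpatialConvolution')

-- Binarize in place before inspecting or running a forward pass
-- (skipping the first and last layers, as in the training script).
for i = 2, #convNodes - 1 do
   local w = convNodes[i].weight
   local alpha = torch.abs(w):mean()
   w:copy(torch.sign(w):mul(alpha))
end

print(convNodes[2].weight[1][1])  -- entries should now be +/- alpha
```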

xiao1228 commented 7 years ago

Thanks again!