Wixee / SecureBiNN


Why aren't the model weights +1 or -1? #3

Closed Stu-Yang closed 1 year ago

Stu-Yang commented 1 year ago

Hi @Wixee, I am reading SecureBiNN and studying the code. I am wondering whether the model weights (e.g. the weights in Network-C.h5) are supposed to be +1 or -1.

Before running the code as described in "SecureBiNN - How to run this project?", I added some print(...) calls to main.py

...
if role == model_owner:
    model = load_bnn_model(config['model_path'])    # "model_path": "models/Network-C.h5" in config.json
    print(model.summary())                   # added; summary() prints itself and returns None
    print(model.layers[0].weights)           # added to dump the first layer's weights
    list_layers = extract_layers_from_model(model)
...

to inspect the model's structure and weights:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 24, 24, 16)        416       
_________________________________________________________________
batch_normalization_3 (Batch (None, 24, 24, 16)        64        
_________________________________________________________________
activation_4 (Activation)    (None, 24, 24, 16)        0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 12, 12, 16)        0         
_________________________________________________________________
dropout_no_scale_3 (DropoutN (None, 12, 12, 16)        0         
_________________________________________________________________
bnn__conv2d_1 (BNN_Conv2D)   (None, 8, 8, 16)          6400      
_________________________________________________________________
batch_normalization_4 (Batch (None, 8, 8, 16)          64        
_________________________________________________________________
activation_5 (Activation)    (None, 8, 8, 16)          0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 4, 4, 16)          0         
_________________________________________________________________
dropout_no_scale_4 (DropoutN (None, 4, 4, 16)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 256)               0         
_________________________________________________________________
bnn__dense_1 (BNN_Dense)     (None, 100)               25600     
_________________________________________________________________
batch_normalization_5 (Batch (None, 100)               400       
_________________________________________________________________
activation_6 (Activation)    (None, 100)               0         
_________________________________________________________________
dropout_no_scale_5 (DropoutN (None, 100)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1010      
_________________________________________________________________
activation_7 (Activation)    (None, 10)                0         
=================================================================
Total params: 33,954
Trainable params: 33,690
Non-trainable params: 264
_________________________________________________________________
None
[<tf.Variable 'conv2d_1/kernel:0' shape=(5, 5, 1, 16) dtype=float32, numpy=
array([[[[ 1.22906985e-02,  1.65957715e-02,  1.59741081e-02,
           4.84150909e-02,  2.63596568e-02,  2.59028394e-02,
          -1.52529612e-01,  6.94199279e-02,  4.15483564e-02,
           2.98523568e-02, -3.40725966e-02,  9.59213264e-03,
           7.67118670e-03, -4.88810753e-03,  1.65355131e-02,
          -5.41682914e-02]],

        ...,

        [[-5.52611947e-02, -4.29068692e-02,  1.13691293e-01,
          -1.05576031e-01, -3.39829534e-01,  1.32670686e-01,
           4.98346463e-02, -1.88526958e-01,  8.75623375e-02,
          -9.76365432e-03,  4.71817791e-01, -1.92901939e-01,
           2.09150493e-01,  2.44738355e-01,  5.41945361e-02,
          -1.39477476e-02]]]], dtype=float32)>, <tf.Variable 'conv2d_1/bias:0' shape=(16,) dtype=float32, numpy=
array([ 0.00064932,  0.00299514, -0.00628915, -0.01271091,  0.00082761,
       -0.00834843, -0.00330706, -0.00130034, -0.00203539,  0.01392187,
       -0.00888549,  0.01716425,  0.00910928,  0.03419283, -0.00621071,
        0.01423857], dtype=float32)>]

I would like to know why the weights in Network-C are not +1 or -1.

Wix97 commented 1 year ago

@Stu-Yang Sorry for my poor English. This is due to the training mode of binarized networks: when training a binarized network, we keep the original floating-point parameters and binarize them at every inference. What you see stored in the h5 file are exactly those not-yet-binarized floating-point weights. At line 114 of bnnModels.py, the model owner binarizes these floating-point parameters. I hope this answers your question.
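Roughly, the binarization step looks like this (a minimal sketch assuming sign-based binarization; the exact code at line 114 of bnnModels.py may differ):

import numpy as np

def binarize(w):
    # Deterministic binarization: map each float weight to +1 or -1 by its sign.
    return np.where(w >= 0, 1.0, -1.0).astype(np.float32)

# The float32 kernel printed above binarizes to a +1/-1 tensor of the same shape.
print(binarize(np.array([0.0123, -0.1525, 0.0694], dtype=np.float32)))  # [ 1. -1.  1.]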

Wix97 commented 1 year ago

If you still have questions, feel free to reopen the issue.

Stu-Yang commented 1 year ago

This is due to the training mode of binarized networks.

May I ask what "training mode" means here? My understanding is that once a binarized neural network has been trained and saved to an .h5 file, the stored parameters should be +1 or -1 (or 0 or 1); otherwise the trained model would not achieve the roughly 64x storage reduction over a full-precision model described in "Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or -1".
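As a rough check of that factor (my own back-of-the-envelope arithmetic, assuming 1 bit per binarized weight):

trainable_params = 33_690             # from model.summary() above
float32_bytes = trainable_params * 4  # 134,760 bytes at 32 bits per weight
binary_bytes = trainable_params / 8   # ~4,211 bytes at 1 bit per weight
print(float32_bytes / binary_bytes)   # 32.0: 32x vs float32, 64x vs float64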

Wixee commented 1 year ago

By "training mode" I mean exactly that: "when training a binarized network, we keep the original floating-point parameters and binarize them at every inference."

Indeed, to make debugging easier and to demonstrate that the provided binarized models are genuine, I saved the original floating-point parameters of the binarized network. As a result, the saved h5 file is about as large as a true floating-point model and does not achieve the nearly 64x size reduction described in the paper you cited. To get that optimization, you would only need to move the model owner's binarization step into the plaintext training code and handle a few details when saving and loading the model file; that should not be hard to do.
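For instance, one way to handle those save/load details (my own sketch, not code from this repo; the helper names are hypothetical) is to bit-pack the binarized weights:

import numpy as np

def pack_binary_weights(w):
    # Binarize by sign (+1 -> bit 1, -1 -> bit 0), then pack 8 weights per byte.
    bits = (w >= 0).astype(np.uint8)
    return np.packbits(bits.ravel()), w.shape

def unpack_binary_weights(packed, shape):
    # Recover a {+1, -1} float32 tensor of the original shape.
    n = int(np.prod(shape))
    bits = np.unpackbits(packed)[:n].reshape(shape)
    return np.where(bits == 1, 1.0, -1.0).astype(np.float32)

# A 5x5x1x16 kernel holds 400 weights: 1600 bytes as float32, 50 bytes packed.
w = np.random.randn(5, 5, 1, 16).astype(np.float32)
packed, shape = pack_binary_weights(w)
assert (unpack_binary_weights(packed, shape) == np.where(w >= 0, 1.0, -1.0)).all()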

Stu-Yang commented 1 year ago

Understood. Thank you very much for the explanation!