cc-hpc-itwm / TensorQuant

Apache License 2.0

How to implement gradients as in the paper "Binarized Neural Networks"? #3

Closed cw-plus closed 6 years ago

cw-plus commented 6 years ago

How do I implement gradients as in the paper "Binarized Neural Networks"? I notice that TensorQuant assumes the quantization gradient is always 1. If I want to implement gradients as in that paper, what should I do? Thanks a lot.

DominikFHG commented 6 years ago

In TensorQuant/Quantize/QuantKernelWrapper.py and TensorQuant/Quantize/FixedPoint.py you will find the wrappers which register the gradients of the different quantization methods (@ops.RegisterGradient). In TensorFlow you can apply relational operators to tensors, but the result will be a boolean tensor, so you will need to cast it back to the original type. You might also find this link useful; it explains how to register gradients in TensorFlow.
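The suggestion above (relational operator, then cast the boolean result back) can be sketched roughly as follows. This is a minimal illustration, not code from the repo; the op name "MyQuantSign" is made up for the example.

```python
import tensorflow as tf

# A minimal sketch (assumed op name, not from TensorQuant): register a
# gradient for a hypothetical op "MyQuantSign". The mask is built with a
# relational operator, which yields a boolean tensor, and then cast back
# to the gradient's float dtype before multiplying.
@tf.RegisterGradient("MyQuantSign")
def _my_quant_sign_grad(op, grad):
    x = op.inputs[0]
    keep = tf.abs(x) <= 1.0           # boolean tensor: True where |x| <= 1
    mask = tf.cast(keep, grad.dtype)  # cast back to the original float type
    return [grad * mask]
```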

cw-plus commented 6 years ago

Thanks. I implemented it:

@ops.RegisterGradient("QuantSign")
def _quant_sign_grad(op, grad):
    # Straight-through estimator for sign(), as in the BNN paper
    # (Courbariaux et al.): pass the gradient through only where the
    # input lies inside (-1, 1).
    xx = tf.identity(op.inputs[0])
    xx = tf.clip_by_value(xx, -1., 1.)  # restrict inputs[0] to the range [-1, 1]
    yy = tf.ones_like(xx)               # tensor of ones with the same shape as xx
    delt = tf.abs(tf.abs(xx) - yy)      # nonzero exactly where the input is in (-1, 1)
    delt = tf.ceil(delt)                # 0/1 mask: 0 where |input| >= 1, else 1
    grad = tf.multiply(delt, grad)      # final output

    return [grad]

The result looks right.

('inputs: ', array([[[[  0.9794708 ,   7.8065434 ],
     [ -3.2875454 ,  -1.097891  ],
     [-12.721996  ,  16.402979  ],
     [ -0.5255345 , -17.342981  ]],

    [[ -8.2592535 ,   4.5270705 ],
     [ -2.2045264 ,  -6.380336  ],
     [ 19.783878  ,  -3.1297235 ],
     [  2.1022735 ,  -3.7433875 ]],

    [[  5.914869  ,  -6.8573623 ],
     [  7.194282  ,  12.00661   ],
     [ -8.9192095 ,  -8.119039  ],
     [ -5.6874366 ,  16.598093  ]],

    [[ 11.135046  ,   4.029105  ],
     [  1.9707218 , -14.666031  ],
     [ -1.1781573 , -15.765383  ],
     [-10.605588  ,  -9.970756  ]]],

   [[[ -6.453623  ,   5.412363  ],
     [-11.768549  ,   6.1537633 ],
     [ -2.8233638 , -11.992054  ],
     [ 15.38794   ,  -7.156569  ]],

    [[-11.156998  , -17.290377  ],
     [ -8.680723  , -13.315008  ],
     [  7.547665  ,  -5.0133014 ],
     [  4.5976114 ,  19.168005  ]],

    [[ 16.036833  ,  -7.766696  ],
     [ -9.976102  ,  -6.782584  ],
     [ -8.240993  ,   2.4700897 ],
     [  2.6926963 ,   0.5079334 ]],

    [[-16.131004  ,  16.755043  ],
     [  2.1212428 ,   6.922239  ],
     [ -3.4023576 ,   0.32383335],
     [ -1.2782183 ,   3.959227  ]]]], dtype=float32))

('gradient y1: ', [array([[[[1., 0.],
     [0., 0.],
     [0., 0.],
     [1., 0.]],

    [[0., 0.],
     [0., 0.],
     [0., 0.],
     [0., 0.]],

    [[0., 0.],
     [0., 0.],
     [0., 0.],
     [0., 0.]],

    [[0., 0.],
     [0., 0.],
     [0., 0.],
     [0., 0.]]],

   [[[0., 0.],
     [0., 0.],
     [0., 0.],
     [0., 0.]],

    [[0., 0.],
     [0., 0.],
     [0., 0.],
     [0., 0.]],

    [[0., 0.],
     [0., 0.],
     [0., 0.],
     [0., 1.]],

    [[0., 0.],
     [0., 0.],
     [0., 1.],
     [0., 0.]]]], dtype=float32)])

Please correct me if I am wrong. Thanks.
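For what it's worth, the clip/abs/ceil construction above produces the same mask as a plain boolean comparison cast back to float, which is closer to what the maintainer suggested. A quick NumPy check (the sample values here are made up for illustration):

```python
import numpy as np

# Sketch: compare the clip/abs/ceil mask with a comparison-and-cast mask
# on a few hand-picked values, including the boundary cases x = +-1.
x = np.array([-17.3, -1.0, -0.5, 0.0, 0.9794708, 1.0, 16.4], dtype=np.float32)
clipped = np.clip(x, -1.0, 1.0)
mask_ceil = np.ceil(np.abs(np.abs(clipped) - 1.0))  # construction from the code above
mask_cmp = (np.abs(x) < 1.0).astype(np.float32)     # comparison + cast
assert np.array_equal(mask_ceil, mask_cmp)
```

One detail worth noting: both versions zero the gradient at exactly |x| = 1, whereas the hard-tanh straight-through estimator in the BNN paper uses the indicator 1_{|x| <= 1}, which keeps the gradient at the boundary. The difference only matters for inputs exactly equal to +-1.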