Training constrains the weights of each layer to the set {-Wmax, +Wmax}. When dumping the weights we can therefore represent each weight with 1 bit plus a single high-precision value Wmax per layer, which is merged into the activation together with batch-norm. Check evaluate.py to see how this is done.
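Roughly, the dump works as in the sketch below. This is only an illustration in NumPy with a hypothetical helper name (the actual logic is in evaluate.py), assuming a per-layer weight tensor `w` already constrained to ±Wmax and standard batch-norm parameters `gamma`, `beta`, `mean`, `var`:

```python
import numpy as np

def dump_layer(w, gamma, beta, mean, var, eps=1e-5):
    """Sketch: store a {-Wmax, +Wmax} weight tensor as 1-bit signs,
    and fold the scalar Wmax into the batch-norm affine transform."""
    w_max = np.abs(w).max()          # per-layer high-precision scale
    sign_bits = np.packbits(w > 0)   # 1 bit per weight: 1 -> +Wmax, 0 -> -Wmax

    # The binary conv/FC produces s = sum(+/-1 * x), so the true pre-activation
    # is y = Wmax * s. Batch-norm gamma*(y - mean)/sqrt(var + eps) + beta then
    # collapses into a single per-channel scale and shift applied to s:
    inv_std = 1.0 / np.sqrt(var + eps)
    scale = gamma * w_max * inv_std
    shift = beta - gamma * mean * inv_std
    return sign_bits, scale, shift
```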
I ran it in HLS and found the result is wrong, as shown in the attachment. The Python code gives the correct result.
Did you manage to figure out what was wrong here? Maybe the input is not fed correctly?
```python
if bitW == 1:
    with G.gradient_override_map({"Sign": "Identity"}):
        E = tf.stop_gradient(tf.reduce_mean(tf.abs(x)))
        return tf.sign(x) * E
```
I wonder how you can represent the weights with only 1 bit. Or do you only use this during training, and then use another method to quantize the weights when you dump them?