Hi @MatthieuCourbariaux. You mentioned in the paper that you didn't binarize the inputs to the first layer. I tested with binarized activations fed into all layers, including the first, and the accuracy decreases only slightly. Is there a specific reason you used float activations for the first layer, other than maximizing accuracy?
I do not remember exactly.
But I think we meant that we did not binarize the input data (i.e., the dataset), as doing so reduced performance.
We did, however, binarize the activations of the first layer.
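To make the distinction concrete, here is a minimal sketch (not the repo's actual Theano/Lasagne code) of a forward pass where the raw input data stays in float while the output activations of every layer, including the first, are binarized with the deterministic sign function from the paper. The layer sizes and random weights are made-up placeholders for illustration only.

```python
import numpy as np

def binarize(x):
    # Deterministic binarization: +1 if x >= 0, -1 otherwise.
    return np.where(x >= 0, 1.0, -1.0)

def forward(x, weights):
    # x: float input data -- the dataset itself is NOT binarized.
    h = x
    for i, W in enumerate(weights):
        h = h @ W                  # weights are already binary in this sketch
        if i < len(weights) - 1:
            h = binarize(h)        # activations of every hidden layer,
                                   # including the first, are binarized
    return h                       # final layer output left real-valued

# Toy usage with placeholder shapes (e.g. MNIST-like 784 -> 256 -> 10).
rng = np.random.default_rng(0)
weights = [binarize(rng.standard_normal((784, 256))),
           binarize(rng.standard_normal((256, 10)))]
logits = forward(rng.standard_normal((1, 784)), weights)
```

So "not binarizing the first layer" refers to the float input `x`, not to the first layer's output activations, which are passed through `binarize` like every other hidden layer.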