Closed by Mxbonn 5 years ago
@ggeor84 Thanks for your PR #2. I've created this PR, inspired by yours, but without changing any of the existing function arguments. The custom SGD function can work with different parameter groups, so the expected result can be achieved by splitting the parameters into two groups in the main function and passing weight_bits = None for the group that should not be quantized.
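A minimal sketch of the split described above, not the code from this PR. The helper name `split_param_groups` and the dimensionality test are assumptions: biases and batchnorm scale/shift parameters are 1-D, so anything with `dim() <= 1` goes into the unquantized group. How the custom SGD reads `weight_bits` from each group is also assumed here.

```python
def split_param_groups(named_params, weight_bits):
    """Split parameters into a quantized and an unquantized group.

    named_params: iterable of (name, param) pairs, e.g. the result of
                  model.named_parameters() in PyTorch.
    weight_bits:  bit width for the quantized group (hypothetical argument
                  name, mirroring the weight_bits option mentioned above).
    """
    quantized, full_precision = [], []
    for _name, param in named_params:
        # Assumption: biases and batchnorm parameters are 1-D tensors,
        # while conv/linear weights have two or more dimensions.
        if param.dim() <= 1:
            full_precision.append(param)
        else:
            quantized.append(param)
    return [
        {"params": quantized, "weight_bits": weight_bits},
        # weight_bits = None marks the group that is left unquantized.
        {"params": full_precision, "weight_bits": None},
    ]
```

The returned list has the shape PyTorch optimizers accept for per-group options, so it could be passed as the first argument to the custom SGD in place of `model.parameters()`, assuming that optimizer looks up `weight_bits` per group.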
I will merge this in a few days when I've been able to reproduce something close to your reported accuracy.
Reproduce paper results by not quantizing batchnorm parameters and biases. Also updated to PyTorch 1.1.