Closed ggeor84 closed 5 years ago
Hey @ggeor84, I also noticed this mistake shortly after adding the extra evaluation step after the last quantization step. I discovered that I pruned the weights in the reverse order. Fixing the order eliminates the final accuracy drop, but with this method I only achieve around 66% accuracy instead of 69% (I have pushed the fix to this repository). However, I have not spent much time tuning the learning rate schedule and the number of epochs for each quantization step. I believe the code now correctly reflects the paper's algorithm, but I have removed the results section from the README as I have not been able to replicate their accuracy. (Note that if you look at the issues on the authors' code, other people also struggle to replicate the exact results without the published hyperparameters.)
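For reference, the magnitude-ordered partitioning can be sketched as follows (a minimal illustration of the INQ idea, not this repository's actual code; the function name and the flat-list representation are mine):

```python
def largest_fraction_indices(weights, fraction):
    """Indices of the `fraction` largest-magnitude weights.

    In INQ these are the weights quantized at the current step; the
    remaining (smaller) weights stay full-precision and are retrained.
    """
    k = int(fraction * len(weights))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    return set(order[:k])

w = [0.9, -0.1, 0.5, -0.7, 0.05, 0.3]
idx = largest_fraction_indices(w, 0.5)  # the three largest-magnitude weights
```

Reversing this order (quantizing the smallest-magnitude weights first) is the bug described above.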
Thank you @Mxbonn for your contributions. While I was not able to replicate the paper's results exactly, I was able to get closer: Baseline: Acc@1 69.758, Acc@5 89.076; Quantized: Acc@1 69.660, Acc@5 89.044. Note that the authors of INQ do not quantize the bias and batch-norm parameters; I modified your code to skip those parameters. Other than that, the key to the success is your fix of quantizing the larger weights first. This still seems counter-intuitive to me, though, as I would have expected the opposite order to work better. Thank you so much for your work
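For anyone wanting to reproduce the skip, here is a rough sketch of filtering parameters by name (a hypothetical heuristic, not the actual PR code; it assumes torchvision-style ResNet naming where BatchNorm modules are called `bn1`, `layer1.0.bn2`, etc.):

```python
def should_quantize(param_name):
    """Heuristic: quantize conv/linear weights only.

    Skips all bias parameters and anything belonging to a module whose
    name starts with "bn" (BatchNorm layers in torchvision ResNets).
    """
    if param_name.endswith(".bias"):
        return False
    module_name = param_name.rsplit(".", 1)[0]
    last_module = module_name.rsplit(".", 1)[-1]
    return not last_module.startswith("bn")

names = ["conv1.weight", "bn1.weight", "bn1.bias",
         "layer1.0.bn2.weight", "fc.bias", "fc.weight"]
to_quantize = [n for n in names if should_quantize(n)]
```

In practice you would apply this filter while iterating `model.named_parameters()`; note the heuristic misses BatchNorm layers with purely numeric names (e.g. inside `downsample` blocks), so a real implementation should check module types instead.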
@ggeor84 That comes pretty close to the original paper! Did you modify any of the hyperparameters, or only remove quantizing the bias and bn parameters? Feel free to submit a PR if you want ;) I think quantizing the larger weights first works better because, due to the non-uniform quantization (powers of 2), the largest values lie farthest from their quantized values and should therefore be quantized first.
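To make the distance argument concrete, here is a small sketch of power-of-two quantization (illustrative only; `quantize_pow2`, the level range, and nearest-level rounding are my assumptions, not the paper's exact rounding rule):

```python
import math

def quantize_pow2(w, n_min=-4, n_max=-1):
    """Round w to the nearest value in {0} | {±2**n : n_min <= n <= n_max}."""
    levels = [0.0] + [2.0 ** n for n in range(n_min, n_max + 1)]
    best = min(levels, key=lambda level: abs(abs(w) - level))
    return math.copysign(best, w)

# Because the levels (0.0625, 0.125, 0.25, 0.5) get sparser as magnitude
# grows, the largest weights sit farthest from their nearest level:
for w in (0.9, 0.3, 0.06):
    err = abs(w - quantize_pow2(w))  # 0.4, 0.05, 0.0025
```

Quantizing those large, high-error weights first lets the retraining of the remaining full-precision weights compensate for the largest errors.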
I didn't; I used the parameters as you set them. I'll submit a PR in a couple of days. I made the modifications in my own repository, so I need to test them against yours first before I open the PR.
Just submitted a PR. Once again, thank you for your contributions!
After the last iterative step, I get a good accuracy (say 69.6%); however, there is then a final quantization step where 100% of the weights are quantized, and accuracy drops to 40%. You can replicate this by running the code without any changes.