Mxbonn / INQ-pytorch

A PyTorch implementation of "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights"

4 bits on ResNet18 results in a 6% increase in error #11

Open mostafaelhoushi opened 4 years ago

mostafaelhoushi commented 4 years ago

I have modified weight_bits to 4 and iterative_steps to [0.3, 0.5, 0.8, 0.9, 0.95, 1], but I got an accuracy of 83.12%, while the original paper reports 89.01%.
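For reference, the change amounts to roughly the following (a sketch assuming the `inq.SGD` and `inq.INQScheduler` interface shown in this repository's README; the learning rate and other hyperparameter values here are placeholders, not the paper's settings):

```python
import torchvision.models as models
import inq  # this repository's package

model = models.resnet18(pretrained=True).cuda()

# weight_bits lowered from the default 5 to 4
optimizer = inq.SGD(model.parameters(), lr=0.001, momentum=0.9,
                    weight_decay=1e-4, weight_bits=4)

# a finer accumulated-quantization schedule for 4 bits
iterative_steps = [0.3, 0.5, 0.8, 0.9, 0.95, 1.0]
inq_scheduler = inq.INQScheduler(optimizer, iterative_steps, strategy="pruning")
```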

Mxbonn commented 4 years ago

While I haven't tried to reproduce their 4-bit results, they do mention that they increased the number of epochs when decreasing the number of bits. Have you also changed this parameter?

mostafaelhoushi commented 4 years ago

First of all, thanks for your quick response, I highly appreciate it! I originally tried with the epochs setting kept at 4. I have changed it to 5 and am waiting for the final result.

Reading the paper, I find this for 5 bits:

> INQ has the property of easy convergence in training. In general, re-training with less than 8 epochs could consistently generate a lossless model with 5-bit weights in the experiments.

so when we have 4 iterative steps, {0.5, 0.75, 0.875, 1}, is each step trained for only 2 epochs?

and then for 2 bits I find:

> The required number of epochs also increases when the expected bit-width goes down, and it reaches 30 when training our 2-bit ternary model

since 2 bits has 10 iterative steps, {0.2, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.975, 1}, does that mean each step gets 3 epochs?

So from reading the paper, I am not sure what the number of epochs for each iterative step should be; the arithmetic under one possible reading is sketched below.
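In other words, if the paper's epoch counts are totals divided evenly across the iterative steps, the numbers would work out like this (just my interpretation, not something the paper states):

```python
# One possible reading: the paper reports *total* re-training epochs,
# split evenly across the iterative steps. Purely an interpretation.
configs = {
    5: (8,  [0.5, 0.75, 0.875, 1]),                                 # "less than 8 epochs"
    2: (30, [0.2, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.975, 1]),  # "reaches 30"
}
for bits, (total_epochs, steps) in configs.items():
    per_step = total_epochs / len(steps)
    print(f"{bits}-bit: {total_epochs} epochs / {len(steps)} steps = {per_step:g} per step")
# 5-bit: 8 epochs / 4 steps = 2 per step
# 2-bit: 30 epochs / 10 steps = 3 per step
```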

Mxbonn commented 4 years ago

It's not clear to me from the original paper how exactly they count epochs, since you have multiple iterative steps and run some number of epochs within each of them. The way I coded it, `epochs` is the number of training epochs within one iterative step (see the sketch below). Based on issues in the original Caffe repository (https://github.com/AojunZhou/Incremental-Network-Quantization/issues/28) and some experiments of my own, I found that with 4 epochs per iterative step I could reproduce the 5-bit quantization results. But as this is not taken directly from the paper, I have no idea how you should actually scale the number of epochs for different bit widths.

I hope this info helps you a bit; it's been a while since I've trained a network with INQ.
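Concretely, the loop structure is roughly the following (the two helpers are placeholders, not functions from this repo; this just shows how the epochs are counted):

```python
# `epochs` is the number of passes over the data *per* iterative step,
# so the total amount of re-training is len(iterative_steps) * epochs.

def quantize_fraction(model, fraction):
    """Placeholder: fix `fraction` of the weights to powers of two / zero."""

def train_one_epoch(model):
    """Placeholder: one pass over the training data, updating only the
    still-unquantized weights."""

iterative_steps = [0.5, 0.75, 0.875, 1.0]  # the paper's 5-bit schedule
epochs = 4  # per iterative step; this reproduced the 5-bit results for me

def run_inq(model):
    for fraction in iterative_steps:
        quantize_fraction(model, fraction)
        for _ in range(epochs):
            train_one_epoch(model)
    # total re-training here: 4 steps * 4 epochs = 16 epochs
```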

mostafaelhoushi commented 4 years ago

Thanks, Maxim, for your detailed response. I have also tried training with 5 epochs per iterative step, using the batch size and other hyperparameters mentioned in the paper for ResNet18, but I still got low accuracy (about 83%).

saqibjaved1 commented 3 years ago

Hi @mostafaelhoushi, were you able to solve the problem in the end? I am not able to reproduce the results for 2-bit weights on ResNet18. Some of the hyperparameters are also missing, e.g. the learning rate.

mostafaelhoushi commented 3 years ago

> Hi @mostafaelhoushi, were you able to solve the problem in the end? I am not able to reproduce the results for 2-bit weights on ResNet18. Some of the hyperparameters are also missing, e.g. the learning rate.

I don't recall the details, but I don't think I was able to reproduce the results.

saqibjaved1 commented 3 years ago

Oh, I see. This paper is quite ambiguous. Thanks for your quick response though.