mostafaelhoushi opened this issue 4 years ago
While I haven't tried to reproduce their 4 bit results, they do mention that they increased the number of epochs when decreasing the number of bits. Have you also changed this parameter?
First of all, thanks for your quick response, I highly appreciate that!
I originally tried with the epochs setting kept at 4. I changed it to 5 and am waiting for the final result.
Reading the paper, I find for 5-bits:
INQ has the property of easy convergence in training. In general, re-training with less than 8 epochs could consistently generate a lossless model with 5-bit weights in the experiments.
so when we have 4 iterative steps, {0.5, 0.75, 0.875, 1}, each step is trained for 2 epochs only?
and then for 2-bits I find:
The required number of epochs also increases when the expected bit-width goes down, and it reaches 30 when training our 2-bit ternary model
since 2-bits has 10 iterations, {0.2, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.975, 1}, does that mean each iteration has 3 epochs?
So from reading the paper, I am not sure what the number of epochs for each iterative step should be.
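If the paper's total epoch counts are split evenly across the iterative steps (my assumption; the paper never states this split), the arithmetic would work out like this:

```python
# Assumption: the paper's total retraining epochs are divided evenly
# across the iterative steps; the paper does not actually say this.
schedule_5bit = [0.5, 0.75, 0.875, 1]                                 # 4 steps
schedule_2bit = [0.2, 0.4, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 0.975, 1]  # 10 steps

total_epochs_5bit = 8    # "re-training with less than 8 epochs" for 5-bit weights
total_epochs_2bit = 30   # "reaches 30" for the 2-bit ternary model

print(total_epochs_5bit / len(schedule_5bit))  # 2.0 epochs per step
print(total_epochs_2bit / len(schedule_2bit))  # 3.0 epochs per step
```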
It's not clear to me from the original paper how exactly they count the epochs, since you have different iterative steps and within each step you train for some number of epochs. The way I coded it is that `epochs` is the number of epochs within one iterative step. Based on issues in the original Caffe repository (https://github.com/AojunZhou/Incremental-Network-Quantization/issues/28) and running some experiments, I found that with 4 epochs per iterative step I could reproduce the 5-bit quantization results. But as this is not taken directly from the paper, I have no idea how you should actually scale the number of epochs for different bit widths.
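To make that convention concrete, here is a minimal sketch of how I read it: `epochs` counts training epochs within one iterative step, so the total training length is `len(iterative_steps) * epochs`. The names here are placeholders, not this repository's actual API:

```python
def inq_schedule(iterative_steps, epochs_per_step):
    """Yield (cumulative_fraction, epoch) pairs in training order.

    Placeholder logic: at each iterative step, a cumulative fraction of
    the weights would be quantized and frozen, and the remaining
    full-precision weights retrained for epochs_per_step epochs.
    """
    for fraction in iterative_steps:
        # ... quantize/freeze `fraction` of the weights here ...
        for epoch in range(epochs_per_step):
            # ... retrain the remaining full-precision weights ...
            yield fraction, epoch

# 4 iterative steps x 4 epochs each -> 16 training epochs in total
plan = list(inq_schedule([0.5, 0.75, 0.875, 1], epochs_per_step=4))
print(len(plan))  # 16
```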
I hope this info helps you a bit, it's been a while since I've trained a network with INQ.
Thanks Maxim for your detailed response. I have also tried training with 5 epochs per iteration and used the batch size and other hyperparameters they mentioned in their paper for ResNet18 but I still got low accuracies (about 83%).
Hi @mostafaelhoushi, were you able to solve the problem in the end? I am not able to reproduce the results for 2-bit weights for ResNet18. Some of the hyperparameters are also missing, e.g. the learning rate.
I don't recall the details, but I think I wasn't able to reproduce the results.
Oh, I see. This paper is quite ambiguous. Thanks for your quick response though.
I have modified `weight_bits` to 4 and `iterative_steps` to `[0.3, 0.5, 0.8, 0.9, 0.95, 1]`, but I got an accuracy of 83.12%, while the original paper reports an accuracy of 89.01%.
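For what it's worth, assuming the `iterative_steps` entries are cumulative fractions of quantized weights (as in the schedules discussed above), each step only newly quantizes the increment over the previous step:

```python
# Assumption: iterative_steps are cumulative fractions, so each step
# newly quantizes the difference from the previous cumulative fraction.
steps = [0.3, 0.5, 0.8, 0.9, 0.95, 1]
increments = [round(b - a, 3) for a, b in zip([0] + steps, steps)]
print(increments)  # [0.3, 0.2, 0.3, 0.1, 0.05, 0.05]
```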