Mxbonn / INQ-pytorch

A PyTorch implementation of "Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights"

About the pretrained model #5

Open tongtyr opened 5 years ago

tongtyr commented 5 years ago

Hello, I am confused about the training epochs. In your code you set the epochs to 4; if I want to quantize resnet18, do I need to change it? And do you have quantized models of resnet18 at bit widths other than 5? Thank you!

Mxbonn commented 5 years ago

Training epochs can be a bit confusing, I agree. In Incremental Network Quantization you have two kinds of iterations. The first is the number of times you do a new weight partitioning, where you determine which weights get fixed and which will still be trained. The second is the number of training loops you do within each of these quantization iterations. With epochs in the code I mean the latter; the former is defined by 'iterative_steps': [0.5, 0.75, 0.875, 1].
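Roughly, the structure looks like this (a minimal sketch; partition_and_quantize and train_one_epoch are hypothetical placeholders, not functions from this repo, only iterative_steps and the epoch count come from the example script):

```python
# Sketch of the two loop levels in INQ; partition_and_quantize and
# train_one_epoch are hypothetical placeholders, not functions from this repo.
iterative_steps = [0.5, 0.75, 0.875, 1]  # cumulative fraction of weights quantized per partitioning
epochs = 4                               # retraining epochs within each quantization iteration

for fraction in iterative_steps:
    # Weight partitioning: quantize and freeze `fraction` of the weights;
    # the remaining weights stay full precision and keep training.
    partition_and_quantize(model, fraction)

    # Retraining: recover accuracy with the still-trainable weights.
    for epoch in range(epochs):
        train_one_epoch(model, train_loader, optimizer)
```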

The only thing you should have to change in the code is the path to ImageNet; the rest is already set up to quantize resnet18. I did not train at bit widths other than 5 bits with this setup.
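For reference, the data loading follows the standard torchvision ImageNet pipeline (as in the PyTorch ImageNet example), so the only edit is the root path. A sketch, assuming that pipeline; the exact variable names in the example script may differ:

```python
import torchvision.datasets as datasets
import torchvision.models as models
import torchvision.transforms as transforms

imagenet_path = '/path/to/imagenet'  # the one thing that needs to point to your local copy

# The example already loads the torchvision pretrained ResNet-18.
model = models.resnet18(pretrained=True)

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
val_dataset = datasets.ImageFolder(
    imagenet_path + '/val',
    transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        normalize,
    ]))
```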

tongtyr commented 5 years ago

I changed nothing about the code except for the data path, but I found that when I began to quantize the model, the loss was large. Is that correct? My environment is Python 3.6 and PyTorch 1.0.1.

=> using pre-trained model 'resnet18'
Test: [0/196] Time 7.270 (7.270) Loss 0.6744 (0.6744) Acc@1 80.078 (80.078) Acc@5 96.094 (96.094)
Test: [10/196] Time 0.076 (0.728) Loss 1.1976 (0.8823) Acc@1 67.969 (77.592) Acc@5 90.234 (92.898)
Test: [20/196] Time 0.088 (0.509) Loss 0.8896 (0.9078) Acc@1 80.469 (76.860) Acc@5 91.016 (92.615)
Test: [30/196] Time 0.076 (0.443) Loss 0.9277 (0.8707) Acc@1 77.344 (77.923) Acc@5 92.969 (92.868)
Test: [40/196] Time 0.072 (0.414) Loss 0.8867 (0.9135) Acc@1 75.391 (76.229) Acc@5 95.703 (93.035)
Test: [50/196] Time 0.072 (0.391) Loss 0.6245 (0.9081) Acc@1 83.984 (75.973) Acc@5 95.703 (93.275)
Test: [60/196] Time 0.071 (0.382) Loss 1.1484 (0.9200) Acc@1 71.875 (75.666) Acc@5 94.141 (93.411)
Test: [70/196] Time 0.072 (0.394) Loss 0.8141 (0.9027) Acc@1 79.297 (76.276) Acc@5 94.141 (93.563)
Test: [80/196] Time 0.073 (0.386) Loss 1.5958 (0.9238) Acc@1 64.453 (76.013) Acc@5 85.547 (93.277)
Test: [90/196] Time 0.072 (0.367) Loss 2.2526 (0.9903) Acc@1 49.219 (74.704) Acc@5 79.688 (92.449)
Test: [100/196] Time 1.257 (0.368) Loss 1.6807 (1.0526) Acc@1 57.031 (73.434) Acc@5 84.766 (91.646)
Test: [110/196] Time 0.078 (0.366) Loss 1.1636 (1.0807) Acc@1 71.875 (72.934) Acc@5 89.844 (91.248)
Test: [120/196] Time 0.076 (0.363) Loss 1.8209 (1.1061) Acc@1 58.594 (72.569) Acc@5 79.297 (90.825)
Test: [130/196] Time 0.540 (0.356) Loss 0.9303 (1.1458) Acc@1 76.562 (71.678) Acc@5 94.141 (90.392)
Test: [140/196] Time 0.735 (0.355) Loss 1.3774 (1.1680) Acc@1 65.234 (71.271) Acc@5 85.938 (90.129)
Test: [150/196] Time 0.072 (0.351) Loss 1.3500 (1.1946) Acc@1 73.438 (70.791) Acc@5 85.938 (89.727)
Test: [160/196] Time 0.072 (0.350) Loss 1.0733 (1.2141) Acc@1 76.953 (70.477) Acc@5 90.625 (89.490)
Test: [170/196] Time 0.083 (0.347) Loss 0.8934 (1.2377) Acc@1 77.344 (69.970) Acc@5 91.406 (89.163)
Test: [180/196] Time 0.072 (0.345) Loss 1.4582 (1.2560) Acc@1 62.109 (69.615) Acc@5 89.453 (88.946)
Test: [190/196] Time 0.073 (0.344) Loss 1.3996 (1.2547) Acc@1 63.281 (69.582) Acc@5 91.797 (88.993)

Mxbonn commented 5 years ago

Hey, I just reran the example file in a Docker container, only modifying the data path.

My output looks like this:

2019-06-27T08:34:25.806910467Z Test: [0/196]    Time 19.438 (19.438)    Loss 0.6744 (0.6744)    Acc@1 80.078 (80.078)   Acc@5 96.094 (96.094)
2019-06-27T08:34:25.806964309Z Test: [10/196]   Time 0.037 (1.797)  Loss 1.1976 (0.8823)    Acc@1 67.969 (77.592)   Acc@5 90.234 (92.898)
2019-06-27T08:34:25.806970136Z Test: [20/196]   Time 0.031 (0.964)  Loss 0.8896 (0.9078)    Acc@1 80.469 (76.860)   Acc@5 91.016 (92.615)
2019-06-27T08:34:25.806976041Z Test: [30/196]   Time 0.045 (0.666)  Loss 0.9277 (0.8707)    Acc@1 77.344 (77.923)   Acc@5 92.969 (92.868)
2019-06-27T08:34:25.806980598Z Test: [40/196]   Time 0.103 (0.551)  Loss 0.8867 (0.9135)    Acc@1 75.391 (76.229)   Acc@5 95.703 (93.035)
2019-06-27T08:34:25.806985306Z Test: [50/196]   Time 0.041 (0.452)  Loss 0.6245 (0.9081)    Acc@1 83.984 (75.973)   Acc@5 95.703 (93.275)
2019-06-27T08:34:25.806989953Z Test: [60/196]   Time 0.120 (0.424)  Loss 1.1484 (0.9200)    Acc@1 71.875 (75.666)   Acc@5 94.141 (93.411)
2019-06-27T08:34:25.806994431Z Test: [70/196]   Time 0.049 (0.378)  Loss 0.8141 (0.9027)    Acc@1 79.297 (76.276)   Acc@5 94.141 (93.563)
2019-06-27T08:34:25.806998700Z Test: [80/196]   Time 0.088 (0.361)  Loss 1.5958 (0.9238)    Acc@1 64.453 (76.013)   Acc@5 85.547 (93.277)
2019-06-27T08:34:25.807003230Z Test: [90/196]   Time 0.057 (0.331)  Loss 2.2526 (0.9903)    Acc@1 49.219 (74.704)   Acc@5 79.688 (92.449)
2019-06-27T08:34:25.807008141Z Test: [100/196]  Time 0.106 (0.320)  Loss 1.6807 (1.0526)    Acc@1 57.031 (73.434)   Acc@5 84.766 (91.646)
2019-06-27T08:34:25.807013085Z Test: [110/196]  Time 1.150 (0.313)  Loss 1.1636 (1.0807)    Acc@1 71.875 (72.934)   Acc@5 89.844 (91.248)
2019-06-27T08:34:25.807018346Z Test: [120/196]  Time 0.088 (0.298)  Loss 1.8209 (1.1061)    Acc@1 58.594 (72.569)   Acc@5 79.297 (90.825)
2019-06-27T08:34:25.807023922Z Test: [130/196]  Time 0.072 (0.295)  Loss 0.9303 (1.1458)    Acc@1 76.562 (71.678)   Acc@5 94.141 (90.392)
2019-06-27T08:34:25.807028979Z Test: [140/196]  Time 0.044 (0.279)  Loss 1.3774 (1.1680)    Acc@1 65.234 (71.271)   Acc@5 85.938 (90.129)
2019-06-27T08:34:25.807034751Z Test: [150/196]  Time 0.072 (0.274)  Loss 1.3500 (1.1946)    Acc@1 73.438 (70.791)   Acc@5 85.938 (89.727)
2019-06-27T08:34:25.807056351Z Test: [160/196]  Time 0.090 (0.262)  Loss 1.0733 (1.2141)    Acc@1 76.953 (70.477)   Acc@5 90.625 (89.490)
2019-06-27T08:34:25.807061511Z Test: [170/196]  Time 0.104 (0.262)  Loss 0.8934 (1.2377)    Acc@1 77.344 (69.970)   Acc@5 91.406 (89.163)
2019-06-27T08:34:25.807066914Z Test: [180/196]  Time 0.033 (0.250)  Loss 1.4582 (1.2560)    Acc@1 62.109 (69.615)   Acc@5 89.453 (88.946)
2019-06-27T08:34:25.807071710Z Test: [190/196]  Time 0.032 (0.246)  Loss 1.3996 (1.2547)    Acc@1 63.281 (69.582)   Acc@5 91.797 (88.993)
2019-06-27T08:34:25.807075897Z  * Acc@1 69.758 Acc@5 89.078
2019-06-27T08:36:10.019742694Z Epoch: [0][0/5005]   Time 6.768 (6.768)  Data 3.291 (3.291)  Loss 1.5907 (1.5907)    Acc@1 61.719 (61.719)   Acc@5 85.156 (85.156)
2019-06-27T08:36:10.019795449Z Epoch: [0][10/5005]  Time 0.064 (0.679)  Data 0.000 (0.300)  Loss 1.4104 (1.6633)    Acc@1 62.891 (60.440)   Acc@5 85.547 (83.310)
2019-06-27T08:36:10.019802350Z Epoch: [0][20/5005]  Time 0.079 (0.391)  Data 0.000 (0.158)  Loss 1.4512 (1.6237)    Acc@1 66.016 (61.979)   Acc@5 85.547 (83.743)
2019-06-27T08:36:10.019806941Z Epoch: [0][30/5005]  Time 0.105 (0.306)  Data 0.000 (0.107)  Loss 1.3594 (1.5989)    Acc@1 69.141 (62.462)   Acc@5 87.500 (83.984)
2019-06-27T08:36:10.019811663Z Epoch: [0][40/5005]  Time 0.125 (0.265)  Data 0.005 (0.082)  Loss 1.4409 (1.5905)    Acc@1 66.406 (62.367)   Acc@5 86.328 (84.261)
2019-06-27T08:36:10.019816956Z Epoch: [0][50/5005]  Time 0.080 (0.240)  Data 0.000 (0.067)  Loss 1.4476 (1.5773)    Acc@1 64.844 (62.661)   Acc@5 85.547 (84.383)
2019-06-27T08:36:10.019821265Z Epoch: [0][60/5005]  Time 0.252 (0.222)  Data 0.084 (0.058)  Loss 1.4737 (1.5590)    Acc@1 65.234 (63.025)   Acc@5 86.328 (84.548)
2019-06-27T08:36:10.019825678Z Epoch: [0][70/5005]  Time 0.089 (0.210)  Data 0.000 (0.050)  Loss 1.6264 (1.5505)    Acc@1 63.672 (63.248)   Acc@5 84.375 (84.617)
2019-06-27T08:36:10.019829956Z Epoch: [0][80/5005]  Time 0.103 (0.203)  Data 0.000 (0.048)  Loss 1.3459 (1.5431)    Acc@1 66.797 (63.600)   Acc@5 89.453 (84.727)
2019-06-27T08:36:10.019834473Z Epoch: [0][90/5005]  Time 0.106 (0.193)  Data 0.000 (0.043)  Loss 1.6379 (1.5359)    Acc@1 62.109 (63.805)   Acc@5 81.641 (84.779)

So the loss should be much lower and the accuracy higher. I also noticed you have fewer epochs. Are you sure you have the correct dataset? (The fact that your validation error is correct makes me think that you do have the right dataset, however.) On how many GPUs are you running the code? (Not that it should matter.)

-- In case someone else stumbles upon this issue and has run the code, could you let me know if everything works fine?

xysun commented 5 years ago

I find this line a bit weird: quantization_scheduler.step() should be called outside of the retraining epoch loops, I believe? I.e. step only when we're advancing iterative_steps.
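In other words, something like this (a hypothetical loop skeleton, not the actual example code; train_one_epoch and validate are placeholder names):

```python
# Suggested placement: advance the quantization scheduler only when moving to
# the next entry of iterative_steps, not inside the retraining epoch loop.
# train_one_epoch and validate are hypothetical placeholders.
for _ in iterative_steps:
    quantization_scheduler.step()  # partition/quantize the next fraction of weights
    for epoch in range(epochs):
        train_one_epoch(model, train_loader, optimizer)
        validate(model, val_loader)
```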

Mxbonn commented 5 years ago

> I find this line a bit weird: quantization_scheduler.step() should be called outside of the retraining epoch loops, I believe? I.e. step only when we're advancing iterative_steps.

Following up on this in #8