IntelLabs / distiller

Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Apache License 2.0

Weights not properly quantized during Quantization Aware Training #520

Open shazib-summar opened 4 years ago

shazib-summar commented 4 years ago

Hi, I'm working on applying QAT to a model. I made the necessary modifications, but when I looked into one of the saved checkpoint .pth files, I observed that none of the weights were actually quantized; all <layer_name>.weight tensors were still in floating-point format. Following this, I ran the QAT example provided here and made the same observation: none of the tensors in the saved .pth files were quantized. The command I ran is:

python3 compress_classifier.py -a resnet18 \
        -p 50 \
        -b 256 \
        <path-to-imagenet> \
        --epochs 10 \
        --compress=../quantization/quant_aware_train/quant_aware_train_linear_quant.yaml \
        --pretrained \
        -j 22 \
        --lr 0.0001 \
        --vs 0 \
        --gpu 0

An excerpt from the state_dict is below

'module.conv1.weight', 

tensor([[[[-1.0691e-02, -5.3453e-03,  0.0000e+00,  ...,  5.8798e-02,
            1.6036e-02, -1.0691e-02],
          [ 1.0691e-02,  1.0691e-02, -1.1225e-01,  ..., -2.7261e-01,
           -1.2829e-01,  5.3453e-03],
          [-5.3453e-03,  5.8798e-02,  2.9399e-01,  ...,  5.1849e-01,
            2.5657e-01,  6.4143e-02],
          ...,
          [-2.6726e-02,  1.6036e-02,  7.4834e-02,  ..., -3.3141e-01,
           -4.2228e-01, -2.5657e-01],
          [ 3.2072e-02,  4.2762e-02,  6.4143e-02,  ...,  4.1159e-01,
            3.9555e-01,  1.6570e-01],
          [-1.6036e-02, -5.3453e-03, -2.6726e-02,  ..., -1.4967e-01,
           -8.0179e-02, -5.3453e-03]],
          ...,

What am I doing wrong? I ask because @guyjacob mentioned in this comment that:

Usually in quant-aware training, the "quantized" weights will still be FP32 values - but they will be discretized. That is - for 8-bit quantization, there will be 256 different FP32 values. That's the "simulated quantization" you correctly referred to above.

I did observe discretized tensors during PTQ: the tensors were of dtype FP32, yet they contained only integer values. I expected QAT to behave similarly.
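For reference, a quick way to verify this is to count the distinct values in each weight tensor; per the quote above, an 8-bit quantized tensor should contain at most 256 distinct FP32 values. A minimal sketch (the path is a placeholder, and I'm assuming the checkpoint keeps the weights under a 'state_dict' key, as the excerpt above suggests):

import torch

# Load the checkpoint written by compress_classifier.py (placeholder path).
state_dict = torch.load('path/to/checkpoint.pth.tar', map_location='cpu')['state_dict']

# A genuinely 8-bit-quantized weight tensor can hold at most 256 distinct
# FP32 values; an unquantized FP32 tensor typically has almost as many
# distinct values as it has elements.
for name, tensor in state_dict.items():
    if name.endswith('.weight'):
        print(f'{name}: {tensor.unique().numel()} distinct values '
              f'out of {tensor.numel()} elements')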

Please help me out here. Thanks in advance for any help.

lehahoang commented 4 years ago

Hi,

As per my understanding, the model's parameters after retraining are still in FP32; they have not been quantized to INT8 yet. If you want to examine the accuracy improvement of the retrained model, you should invoke PTQ on that model. The bash command is the same as for regular PTQ, with a small modification: point the checkpoint path to your retrained model (--resume-from path/to/your/checkpoint.pth.tar).
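Roughly like the sketch below; I'm assuming the usual post-training-quantization flags of compress_classifier.py (--quantize-eval together with --evaluate), so please double-check them against your Distiller version, and keep the paths as placeholders:

python3 compress_classifier.py -a resnet18 <path-to-imagenet> \
        --resume-from path/to/your/checkpoint.pth.tar \
        --evaluate \
        --quantize-eval \
        --gpu 0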

I hope it helps. BR

shazib-summar commented 4 years ago

But what about @guyjacob's comment quoted above, in which he says that the weights will be discretized after QAT?