@blueardour Would you help out with this issue?
@Rahul-Sridhar Hi, may I ask what the purpose of the quantization is?
It seems you were using the default quantization utilities of PyTorch to compress the model. As far as I know, they realize 8-bit fixed-point quantization, and the framework is only in charge of producing the 8-bit model. If you want to run faster on dedicated platforms, for example a Raspberry Pi ARM board, you might need to further implement the 8-bit inference yourself.
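For reference, the usual eager-mode flow for PyTorch's post-training static quantization looks roughly like the sketch below (the module, names, and shapes are illustrative, not ABCNet's; 'qnnpack' is the backend relevant to ARM boards such as the Raspberry Pi):

```python
import torch
import torch.nn as nn
import torch.quantization as tq

# Minimal model wrapped with QuantStub/DeQuantStub so PyTorch knows
# where the int8 region of the graph begins and ends.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.conv = nn.Conv2d(3, 8, 3)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)            # float -> int8
        x = self.relu(self.conv(x))
        return self.dequant(x)       # int8 -> float

model = TinyNet().eval()
torch.backends.quantized.engine = 'qnnpack'        # ARM backend (Raspberry Pi)
model.qconfig = tq.get_default_qconfig('qnnpack')  # use 'fbgemm' on x86
tq.prepare(model, inplace=True)                    # insert observers
model(torch.randn(1, 3, 32, 32))                   # calibrate (use real data)
tq.convert(model, inplace=True)                    # swap in int8 modules
```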
If you only want the 8-bit model, could you show what kind of error you met?
@blueardour I wanted to convert ABCNet to int8 format with the PyTorch quantization framework for deployment on a Raspberry Pi. While doing so, I met the below problem:
Hi, @Rahul-Sridhar
Regarding your question:
For the static quantization you mentioned, I guess it is a kind of post-training quantization (or zero-shot quantization in some works), which requires no fine-tuning to obtain a quantized model from a full-precision network. Correct me if I am wrong.
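For reference, the lightest-weight no-fine-tuning path in PyTorch is dynamic quantization, which quantizes weights offline and activation scales on the fly, so it needs no calibration data. A minimal sketch; note it only covers modules such as nn.Linear and nn.LSTM, so it is of limited use for conv-heavy models like ABCNet:

```python
import torch
import torch.nn as nn

float_model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)).eval()
# Weights are converted to int8 ahead of time; activation scales are
# computed at inference time, so no calibration pass is required.
int8_model = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8)
```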
For quantization with fine-tuning, such as QAT, I wonder if you are interested in a more compact network. Currently, I pay much attention to extreme low-bit quantization such as 2-bit/ternary/binary networks. I've released a project with several state-of-the-art quantization algorithms. On detection tasks, even 2-bit models show reasonable accuracy. Evaluation on ABCNet is in progress, but it needs some more time to get ready. Refer to: https://github.com/blueardour/model-quantization
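For QAT specifically, PyTorch's eager-mode flow inserts fake-quant modules and then fine-tunes for a few epochs. A minimal sketch, reusing the toy TinyNet from the earlier snippet; the data and loss are stand-ins, not a real training setup:

```python
import torch
import torch.quantization as tq

torch.backends.quantized.engine = 'qnnpack'
model = TinyNet().train()                         # toy model from above
model.qconfig = tq.get_default_qat_qconfig('qnnpack')
tq.prepare_qat(model, inplace=True)               # insert fake-quant modules

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
for _ in range(10):                               # stand-in fine-tuning loop
    x = torch.randn(4, 3, 32, 32)                 # replace with real batches
    loss = model(x).pow(2).mean()                 # dummy loss, illustration only
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

quantized = tq.convert(model.eval())              # materialize the int8 model
```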
Hi @blueardour,
Hello @Rahul-Sridhar,
From my perspective, post-training quantization (or zero-shot quantization) carries a higher risk of a sizable performance gap. It is recommended only if the training dataset is not available. When conducting zero-shot quantization, using cutting-edge methods such as DFQ rather than the PyTorch native 8-bit quantization is more promising for reasonable performance.
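To make the DFQ idea concrete, its core step is cross-layer equalization: rescaling the per-channel weight ranges of adjacent layers without changing the network's function. Below is a toy sketch for two Linear layers with a ReLU between them, in the spirit of DFQ (Nagel et al., 2019); the paper applies the same identity to conv layers, and this is an illustration, not the paper's implementation:

```python
import torch
import torch.nn as nn

def equalize_pair(lin1: nn.Linear, lin2: nn.Linear, eps=1e-8):
    """Cross-layer equalization for Linear -> ReLU -> Linear.
    Balances per-channel weight ranges while preserving the output,
    since ReLU(a / s) * s == ReLU(a) for s > 0."""
    r1 = lin1.weight.detach().abs().max(dim=1).values  # out-channel ranges of layer 1
    r2 = lin2.weight.detach().abs().max(dim=0).values  # in-channel ranges of layer 2
    s = torch.sqrt(r1 / (r2 + eps)) + eps              # per-channel scale
    with torch.no_grad():
        lin1.weight.div_(s.unsqueeze(1))
        if lin1.bias is not None:
            lin1.bias.div_(s)
        lin2.weight.mul_(s.unsqueeze(0))

# Quick equivalence check: the function is unchanged by equalization.
l1, l2 = nn.Linear(8, 16), nn.Linear(16, 4)
x = torch.randn(2, 8)
before = l2(torch.relu(l1(x)))
equalize_pair(l1, l2)
after = l2(torch.relu(l1(x)))
print(torch.allclose(before, after, atol=1e-5))  # True
```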
If you have the training dataset and hope for performance as high as possible, fine-tuning with algorithms such as DoReFa-Net/LSQ/LQ-Nets is suggested. For a wide range of tasks, 8-bit is enough to obtain the same or even better performance compared with the full-precision model (quantization has some regularization effect). The above-mentioned project provides support for different quantization algorithms. Many of them (LSQ recommended) support arbitrary-bit quantization (and of course can be used for 8-bit quantization).
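For a sense of why LSQ supports arbitrary bit widths: the quantizer simply learns a step size and clamps to the chosen integer range. A toy sketch in the spirit of LSQ (Esser et al.), not the linked repository's implementation:

```python
import math
import torch
import torch.nn as nn

def grad_scale(x, scale):
    # Forward: x unchanged; backward: gradient multiplied by `scale`.
    return (x - x * scale).detach() + x * scale

def round_pass(x):
    # Rounding with a straight-through estimator (identity gradient).
    return (x.round() - x).detach() + x

class LSQQuantizer(nn.Module):
    """Learned-step-size quantizer; `bits` can be any integer >= 2."""
    def __init__(self, bits=8):
        super().__init__()
        self.qn = -(2 ** (bits - 1))           # lowest integer level
        self.qp = 2 ** (bits - 1) - 1          # highest integer level
        self.step = nn.Parameter(torch.tensor(1.0))  # learned step size

    def forward(self, x):
        g = 1.0 / math.sqrt(x.numel() * self.qp)     # LSQ gradient scaling
        s = grad_scale(self.step, g)
        x = torch.clamp(x / s, self.qn, self.qp)
        return round_pass(x) * s                     # fake-quantized output
```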
Hi @blueardour,
I will try the above techniques. Thanks for suggesting these resources; they are very helpful.
Thank you for your work!
I have tried to quantize ABCNet but have been unsuccessful for the reason below:
Some errors caused by the Quant/DeQuant stubs
Could you please share some guidelines for quantizing ABCNet on PyTorch? Thanks in advance.
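For anyone hitting similar Quant/DeQuant errors: in eager mode, every float-to-int8 boundary in forward() must be bracketed by a QuantStub/DeQuantStub pair, and submodules with no int8 kernel (such as ABCNet's custom ops) can be kept in float by clearing their qconfig. A minimal sketch with a toy stand-in for the network; the module names are illustrative, not ABCNet's:

```python
import torch
import torch.nn as nn
import torch.quantization as tq

class Wrapper(nn.Module):
    """Toy stand-in for a detector: a quantizable backbone plus a head
    that (like a custom op without an int8 kernel) stays in float."""
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.backbone = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())
        self.dequant = tq.DeQuantStub()
        self.head = nn.Conv2d(8, 4, 1)   # pretend this op is unsupported

    def forward(self, x):
        x = self.quant(x)      # int8 region starts
        x = self.backbone(x)
        x = self.dequant(x)    # back to float before the unsupported head
        return self.head(x)

model = Wrapper().eval()
torch.backends.quantized.engine = 'qnnpack'
model.qconfig = tq.get_default_qconfig('qnnpack')
model.head.qconfig = None                 # leave the head in float
tq.prepare(model, inplace=True)
model(torch.randn(1, 3, 32, 32))          # calibration (use real data)
tq.convert(model, inplace=True)
```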