nnieqat-pytorch

Nnieqat is a quantization-aware training package for the Neural Network Inference Engine (NNIE) on PyTorch. It uses the HiSilicon quantization library to quantize each module's weights and activations, keeping the quantized values in fake-quantized FP32 format.

Table of Contents
- Installation
- Usage
- Code Examples
- Results
- Todo
- Reference

Installation

Supported Platforms: Linux
Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.1 or 10.2.
Dependencies:
- python >= 3.5, < 4
- llvmlite >= 0.31.0
- pytorch >= 1.5
- numba >= 0.42.0
- numpy >= 1.18.1
Install nnieqat via PyPI:
```
$ pip install nnieqat
```
Install nnieqat in Docker (an easy way to avoid environment problems):
```
$ cd docker
$ docker build -t nnieqat-image .
```

Install nnieqat from the repository:

$ git clone https://github.com/aovoc/nnieqat-pytorch
$ cd nnieqat-pytorch
$ make install

Usage

Add the quantization hook

Weights and data are quantized and dequantized with the HiSVP GFPQ library during the forward() pass.


from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
register_quantization_hook(model)
...
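
To make "quantize and dequantize" concrete, the sketch below rounds values onto an 8-bit grid and maps them back to floats, so the result is still floating point but can only take quantized values. It uses plain symmetric uniform quantization as a stand-in; the HiSVP GFPQ library applies its own scheme, and `fake_quant` and its parameters are hypothetical names that are not part of nnieqat.

```python
def fake_quant(values, num_bits=8):
    """Quantize-dequantize a list of floats (uniform symmetric stand-in, NOT GFPQ)."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)                 # all zeros: nothing to quantize
    scale = max_abs / qmax                  # float value of one integer step
    out = []
    for v in values:
        q = round(v / scale)                # quantize to an integer level
        q = max(-qmax, min(qmax, q))        # clamp to the representable range
        out.append(q * scale)               # dequantize back to float
    return out


weights = [0.5, -1.27, 0.0031, 1.0]
fq = fake_quant(weights)
for original, quantized in zip(weights, fq):
    print(f"{original:+.4f} -> {quantized:+.4f}")
```

Values much smaller than one grid step (here 0.0031) collapse to zero, which is the kind of rounding error the network learns to tolerate during quantization-aware training.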

Merge BN weights into conv and freeze BN

Fine-tuning from a well-trained model is recommended; in that case, call merge_freeze_bn at the beginning. Otherwise, call it after a few epochs of training.

from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
  model.train()
  model = merge_freeze_bn(model)  # switches BN layers to eval() mode during training
...
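
For readers unfamiliar with BN merging, the sketch below checks the usual folding identity for a single output channel: scaling the conv weights by gamma / sqrt(var + eps) and adjusting the bias gives the same output as conv followed by BN. It is a pure-Python illustration with made-up numbers, assuming merge_freeze_bn follows this standard identity; `fold_bn`, `conv`, and `bn` are hypothetical helpers, not nnieqat functions.

```python
import math


def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold one channel's BN statistics into that channel's conv weights and bias."""
    k = gamma / math.sqrt(var + eps)
    return [wi * k for wi in w], (b - mean) * k + beta


def conv(w, b, x):
    """One output value of a convolution: dot product plus bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def bn(y, gamma, beta, mean, var, eps=1e-5):
    """Batch norm in eval mode, using fixed running statistics."""
    return (y - mean) / math.sqrt(var + eps) * gamma + beta


w, b = [0.2, -0.5, 1.0], 0.1
gamma, beta, mean, var = 1.5, -0.3, 0.4, 2.0
x = [1.0, 2.0, 3.0]

reference = bn(conv(w, b, x), gamma, beta, mean, var)   # conv followed by BN
w_f, b_f = fold_bn(w, b, gamma, beta, mean, var)
folded = conv(w_f, b_f, x)                              # single folded conv
print(abs(reference - folded) < 1e-9)
```

The identity only holds while the BN statistics are fixed, which is consistent with BN being frozen after the merge.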

Unquantize weights before updating them

from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
  model.apply(unquant_weight)  # using original weight while updating
  optimizer.step()
...
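
A toy illustration of why the original weights are restored before the optimizer step: if small updates are applied to the quantized value, each one can be rounded away again and the weight never moves. The grid step of 0.25, the fixed update of 0.05 (standing in for learning rate times gradient), and the helper names are all assumptions chosen for the demonstration.

```python
def quantize(w, step=0.25):
    """Round a float onto a coarse grid (illustrative stand-in for quant_dequant_weight)."""
    return round(w / step) * step


def train(restore_before_update, steps=3, update=0.05):
    master = 0.30                      # full-precision weight
    for _ in range(steps):
        used = quantize(master)        # forward pass sees the quantized weight
        if restore_before_update:
            master = master + update   # update applied to the original weight
        else:
            master = used + update     # update applied to the quantized weight
    return quantize(master)


with_restore = train(restore_before_update=True)
without_restore = train(restore_before_update=False)
print(with_restore, without_restore)
```

With restoring, the small updates accumulate in the full-precision weight until it crosses to the next grid point; without it, the weight stays on its starting grid point.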

Dump the weight-optimized model

from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
  model.apply(quant_dequant_weight)
  save_checkpoint(...)
  model.apply(unquant_weight)
...
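
The quantize, save, restore sequence above can be wrapped so that the original weights always come back, even if saving fails. The sketch below shows that pattern with a context manager on a toy layer; `ToyLayer` and `quantized_weights` are hypothetical and only mimic the roles of quant_dequant_weight and unquant_weight.

```python
from contextlib import contextmanager


class ToyLayer:
    """Hypothetical layer holding a full-precision weight."""

    def __init__(self, weight):
        self.weight = weight
        self._saved = None

    def quant_dequant(self, step=0.25):
        self._saved = self.weight                       # remember the original
        self.weight = round(self.weight / step) * step  # coarse illustrative grid

    def unquant(self):
        if self._saved is not None:
            self.weight, self._saved = self._saved, None


@contextmanager
def quantized_weights(layers):
    """Quantize on entry; always restore on exit, even if saving raises."""
    for layer in layers:
        layer.quant_dequant()
    try:
        yield layers
    finally:
        for layer in layers:
            layer.unquant()


layers = [ToyLayer(0.30), ToyLayer(-0.90)]
with quantized_weights(layers):
    checkpoint = [layer.weight for layer in layers]     # stands in for save_checkpoint(...)
restored = [layer.weight for layer in layers]
print(checkpoint, restored)
```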

Use EMA with caution (not recommended).

Code Examples

CIFAR-10 quantization-aware training example (adds nnieqat to pytorch_cifar10_tutorial):

python test/test_cifar10.py

ImageNet quantization fine-tuning example (adds nnieqat to pytorh_imagenet_main.py):

python test/test_imagenet.py --pretrained path_to_imagenet_dataset

Results

ImageNet

python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1  --lr 0.001 --pretrained --epoch 10   # nnie_lr_e-3_ft
python pytorh_imagenet_main.py /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # lr_e-4_ft
python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1  --lr 0.0001 --pretrained --epoch 10  # nnie_lr_e-4_ft

Fine-tuning results:

| model | trt_fp32 | trt_int8 | nnie |
| --- | --- | --- | --- |
| torchvision | 0.56992 | 0.56424 | 0.56026 |
| nnie_lr_e-3_ft | 0.56600 | 0.56328 | 0.56612 |
| lr_e-4_ft | 0.57884 | 0.57502 | 0.57542 |
| nnie_lr_e-4_ft | 0.57834 | 0.57524 | 0.57730 |
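
One way to read the table is to look at the gap between the trt_fp32 and nnie columns for each row. The short script below does that arithmetic on the numbers copied from the table; the dictionary layout and variable names are only for illustration.

```python
# (trt_fp32, trt_int8, nnie) scores copied from the table above
results = {
    "torchvision":    (0.56992, 0.56424, 0.56026),
    "nnie_lr_e-3_ft": (0.56600, 0.56328, 0.56612),
    "lr_e-4_ft":      (0.57884, 0.57502, 0.57542),
    "nnie_lr_e-4_ft": (0.57834, 0.57524, 0.57730),
}

# Gap between the trt_fp32 score and the nnie score for each row
gaps = {name: round(fp32 - nnie, 5) for name, (fp32, _int8, nnie) in results.items()}

for name, gap in gaps.items():
    print(f"{name:<16}{gap:+.5f}")
```

On these numbers the gap is 0.00966 for the stock torchvision model and 0.00104 for nnie_lr_e-4_ft, while nnie_lr_e-3_ft scores slightly higher on nnie than on trt_fp32.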

COCO

Net: simplified YOLOv5s

Trained for 300 epochs, Hi3559 test result:

Fine-tuned for 20 epochs, Hi3559 test result:

Todo

Generate quantized model directly.

Reference

HiSVP Quantization Library User Guide

Quantizing deep convolutional networks for efficient inference: A whitepaper

8-bit Inference with TensorRT

Distilling the Knowledge in a Neural Network
