Jzz24 / pytorch_quantization

A pytorch implementation of dorefa quantization

MIT License

bn-fold dorefa imagenet nvidia-dali quantization resnet

DoReFa-Net

A PyTorch implementation of DoReFa-Net quantization. The code is inspired by LaVieEnRoseSMZ and zzzxxxttt.
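
For readers new to DoReFa, the forward-pass arithmetic of k-bit quantization can be sketched in plain Python. This is a framework-free illustration based on the formulas in the DoReFa-Net paper, not the code in this repository; the function names are hypothetical, the straight-through gradient estimator used during training is left out, and clipping activations to [0, 1] is an assumption about the bounded activation function.

```python
import math

def quantize_k(x, k):
    """Round x in [0, 1] to the nearest of 2**k evenly spaced levels."""
    n = 2 ** k - 1
    return round(x * n) / n

def quantize_weights(weights, k):
    """DoReFa-style k-bit weight quantization (forward pass only).

    Assumes at least one weight is non-zero.
    """
    if k == 1:
        # 1-bit case: sign(w) scaled by the mean absolute weight.
        scale = sum(abs(w) for w in weights) / len(weights)
        return [scale if w >= 0 else -scale for w in weights]
    t = [math.tanh(w) for w in weights]
    max_t = max(abs(v) for v in t)
    # Map tanh(w) into [0, 1], quantize, then map back to [-1, 1].
    return [2 * quantize_k(v / (2 * max_t) + 0.5, k) - 1 for v in t]

def quantize_activation(x, k):
    """Clip the activation to [0, 1] (assumed bound), then quantize to k bits."""
    return quantize_k(min(max(x, 0.0), 1.0), k)
```

With `k = 2`, for example, every quantized weight lands on one of the four levels -1, -1/3, 1/3, 1, which is what the `W_bit` column in the tables below refers to.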

Requirements

python > 3.5
torch >= 1.1.0
torchvision >= 0.4.0
tb-nightly, future (for tensorboard)
nvidia-dali >= 0.12 (faster dataloader)

Cifar-10 Accuracy

Quantized models are trained from scratch.

| Model | W_bit | A_bit | Acc |
| --- | --- | --- | --- |
| resnet-18 | 32 | 32 | 94.71% |
| resnet-18 | 4 | 4 | 94.36% |
| resnet-18 | 1 | 4 | 93.87% |

ImageNet Accuracy

Quantized models are trained from scratch.

| Model | W_bit | A_bit | Top1 | Top5 |
| --- | --- | --- | --- | --- |
| resnet-18 | 32 | 32 | 69.80% | 89.32% |
| resnet-18 | 4 | 4 | 66.60% | 87.15% |

Usage

Download the ImageNet dataset and move the validation images into labeled subfolders. To do this, you can use the following script
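
If the script is not at hand, the same reorganization can be sketched in a few lines of Python. This is a minimal illustration, not the script referred to above: `sort_val_images` and its `labels` argument (a dict mapping image filename to class name) are hypothetical, and you would need to build that mapping from whatever ground-truth file ships with your copy of the dataset.

```python
import shutil
from pathlib import Path

def sort_val_images(val_dir, labels):
    """Move each image in val_dir into a subfolder named after its class.

    labels: dict mapping image filename -> class name (hypothetical input).
    Returns the number of files moved.
    """
    val_dir = Path(val_dir)
    moved = 0
    for filename, class_name in labels.items():
        src = val_dir / filename
        if not src.is_file():
            continue  # skip entries with no matching file
        class_dir = val_dir / class_name
        class_dir.mkdir(exist_ok=True)
        shutil.move(str(src), str(class_dir / filename))
        moved += 1
    return moved
```

After it runs, the validation directory contains one subfolder per class, which is the layout that folder-based dataset loaders expect.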

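To train the model on CIFAR-10

python3 cifar_train_eval.py

To train the model on ImageNet, use either the PyTorch data loader or the DALI data loader

python3 imagenet_torch_loader.py --multiprocessing-distributed
python3 imagenet_dali_loader.py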

To check the tensorboard log
```
tensorboard --logdir='your_log_dir'
```
Then navigate to http://localhost:6006.

To test the quantized model and BN fusion
- convert to the quantized model for inference
```
python3 test_fused_quant_model.py
```
- test bn fuse on the float model
```
python3 bn_fuse.py
```
This fusion method is not suitable for quantized models. We will change the BN fusion in the future according to section 3.2.2 of the paper.

The BN fusion timings below are not a rigorous benchmark, but they are enough to show the effect qualitatively.

| Model (on CPU) | Before fuse | After fuse |
| --- | --- | --- |
| resnet-18 | 0.74 s | 0.51 s |
| resnet-34 | 1.41 s | 0.92 s |
| resnet-50 | 1.96 s | 1.02 s |
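
For context on the timings above, the standard inference-time BN fold rescales the weights and bias of the preceding layer so that the separate batch-norm step can be dropped. The sketch below shows that arithmetic for a single output channel of a linear layer in plain Python. It is an assumption about the general technique, not a copy of what `bn_fuse.py` does, and all function names are hypothetical.

```python
import math

def fold_bn(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold the BN parameters of one output channel into its weights and bias."""
    scale = gamma / math.sqrt(var + eps)
    folded_weight = [w * scale for w in weight]
    folded_bias = (bias - mean) * scale + beta
    return folded_weight, folded_bias

def linear(x, weight, bias):
    """Dot product plus bias for one output channel."""
    return sum(w * xi for w, xi in zip(weight, x)) + bias

def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-mode batch norm using the running mean and variance."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta
```

Applying `linear` with the folded parameters gives the same output as `linear` followed by `batch_norm`, up to floating-point error, so the BN step costs nothing at inference time. The equivalence relies on full-precision weights, which is consistent with the note above that this fold does not carry over directly to quantized models.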

To do

- [x] Train on imagenet2012
- [x] Fold bn
- [x] Test speedup from quantization and bn fold
- [ ] Deploy models to embedded devices
- [ ] ...