jihunoh-neubla opened this issue 2 years ago
## Background

- Lossless coding
- Lossy coding
- DeepCABAC
Model: YOLOv3 (input size = 608×608), `/inshared/Model/vision/object_detection/yolo_v3/onnx/yolov3_d53_mstrain-608_273e_coco_20210518_115020-a2c3acb8_608.onnx`
## Method

### Measure
| | Encoded parameter size (MB) | Mean diff (fp32 − dequant. fp32) | Max diff (fp32 − dequant. fp32) | mAP |
|---|---|---|---|---|
| original | 238 | - | - | 0.309 |
| our quantization | 40 (×5.9) | 0.000806 | 0.041181 | 0.275 |
| deepcabac quantization | 28 (×8.5) | 0.001353 | 11.312755 | - |
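The mean/max diff columns above come from a simple quantize–dequantize round trip against the original fp32 weights. A minimal sketch of that measurement, using plain symmetric uniform int8 quantization and random data as a stand-in for real layer weights (our actual quantizer may differ):

```python
import numpy as np

def uniform_quantize(w, num_bits=8):
    # Symmetric uniform quantization: one scale per tensor, zero-point fixed at 0.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct fp32 values from the int8 codes.
    return q.astype(np.float32) * scale

# Random stand-in for a layer's fp32 weight tensor.
w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, scale = uniform_quantize(w)
w_hat = dequantize(q, scale)

mean_diff = np.abs(w - w_hat).mean()  # -> "Mean diff" column
max_diff = np.abs(w - w_hat).max()    # -> "Max diff" column
```

For uniform quantization the max diff is bounded by half a quantization step (scale / 2), which is why a large max diff (like the 11.31 for deepcabac above) signals a non-uniform, rate-driven quantizer rather than a simple rounding error.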
model = `yolov3_d53_mstrain-608_273e_coco_optim` w/ 80 classes (original ver.)
| quantization & encoding | compression ratio (%) (bin/int8) | zero ratio (%) | mAP |
|---|---|---|---|
| fp32 | - | - | 0.309 |
| uniform quant & CABAC | 65.5 | 13 | 0.304 |
| deepcabac quant & CABAC | 56.7 | 19.5 | 0.304 |
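The bin/int8 compression ratio reported above is simply (entropy-coded bitstream size) / (raw one-byte-per-weight size). A hedged sketch of the measurement, substituting zlib for CABAC (only the ratio definition matters here) and a random Laplacian tensor for real quantized weights:

```python
import zlib
import numpy as np

# Stand-in for a quantized weight tensor: Laplacian values rounded to
# integers give the peaked, zero-heavy histogram entropy coding exploits.
rng = np.random.default_rng(0)
q = np.round(rng.laplace(0.0, 2.0, 100_000)).astype(np.int8)

raw = q.tobytes()                   # "int8": raw size, 1 byte per weight
binstream = zlib.compress(raw, 9)   # "bin": entropy-coded size (zlib as a CABAC stand-in)

compression_ratio = 100.0 * len(binstream) / len(raw)
zero_ratio = 100.0 * np.count_nonzero(q == 0) / q.size
print(f"compression ratio: {compression_ratio:.1f}%  zero ratio: {zero_ratio:.1f}%")
```

This also makes the link between the two reported columns concrete: a higher zero ratio sharpens the symbol distribution, which is what drives the compression ratio down.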
model = `yolov3_v23_mstrain-608_273e_coco_optim` w/ 6 classes (reduced ver.)
| quantization & encoding | compression ratio (%) (bin/int8) | zero ratio (%) | mAP |
|---|---|---|---|
| fp32 | - | - | 0.435 |
| uniform quant & CABAC | 62 | 10.5 | 0.431 |
| deepcabac quant & CABAC | 54.5 | 18.1 | 0.432 |
### Parameter effect in DeepCABAC: `interv`

| `interv` | 0.7 | 0.6 | 0.5 | 0.4 | 0.3 |
|---|---|---|---|---|---|
| compression ratio (%) | 53.6 | 54.5 | 55.6 | 56.7 | 58 |
| zero ratio (%) | 19.2 | 18.1 | 16.9 | 15.7 | 14.5 |
| mAP | 0.428 | 0.432 | 0.426 | 0.425 | 0.422 |
### Parameter effect in DeepCABAC: `lambda`

| `lambda` | 0.0 | 0.001 | 0.01 | 0.1 |
|---|---|---|---|---|
| compression ratio (%) | 54.5 | 54.5 | 54.4 | 54.5 |
| zero ratio (%) | 18.1 | 18.1 | 18.3 | 18.3 |
| mAP | 0.432 | 0.432 | 0.432 | 0.407 |
@deumji-woo @neubla-minwook @wonsubkim-neubla @jong-won-lee
model = `yolov3_v23_mstrain-608_273e_coco_optim` w/ 6 classes (reduced ver.)
| quantization & encoding | compression ratio (%) (bin/int8) | zero ratio (%) | mAP |
|---|---|---|---|
| fp32 | - | - | 0.435 |
| symmetric deepcabac quant & CABAC | 54.5 | 18.1 | 0.432 |
| asymmetric deepcabac quant & CABAC | 54.5 | 18.1 | 0.437 |
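The symmetric/asymmetric gap above comes from how the quantization grid is fit to each tensor's value range: an asymmetric grid spends no levels outside [min, max], so skewed tensors reconstruct more accurately at the same bit width. A minimal NumPy illustration of that effect (random data standing in for layer weights, not the actual DeepCABAC quantizer):

```python
import numpy as np

def quantize_symmetric(w, num_bits=8):
    # Zero-point fixed at 0; the grid must cover [-max|w|, +max|w|],
    # so half the levels are wasted when w is one-sided.
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

def quantize_asymmetric(w, num_bits=8):
    # Scale and zero-point fit the actual [min, max] range.
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (w.max() - w.min()) / (qmax - qmin)
    zero_point = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

# Skewed, almost entirely positive data: the case where asymmetric wins.
w = np.random.default_rng(0).uniform(0.0, 1.0, 1000).astype(np.float32)
err_sym = np.abs(w - quantize_symmetric(w)).max()
err_asym = np.abs(w - quantize_asymmetric(w)).max()
```

With one-sided data the asymmetric step is roughly half the symmetric one, halving the worst-case reconstruction error, which is consistent with the small mAP gain in the table.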
Paper: S. Wiedemann et al., "DeepCABAC: A Universal Compression Algorithm for Deep Neural Networks," IEEE Journal of Selected Topics in Signal Processing, May 2020.
Original GitHub: https://github.com/fraunhoferhhi/DeepCABAC
Summary: A universal compression algorithm for DNNs based on applying a Context-based Adaptive Binary Arithmetic Coder (CABAC) to the DNN parameters. DeepCABAC applies a novel quantization scheme that minimizes a rate-distortion function while simultaneously taking the impact of quantization on the DNN performance into account. For instance, it is able to compress the VGG16 ImageNet model by ×63.6 with no loss of accuracy.
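A toy version of that rate-distortion quantization, to make the role of `lambda` in the tables above concrete. The rate model below is an invented stand-in (the real coder estimates rates from its CABAC context models), so treat this as a sketch of the idea only:

```python
import numpy as np

def rd_quantize(w, step, lam, max_level=127):
    # Rate-distortion-optimized scalar quantization: for each weight,
    # pick the integer level q minimizing D + lambda * R.
    levels = np.arange(-max_level, max_level + 1)
    # Invented rate model: 1 bit for zero, more bits for larger magnitudes.
    rate = np.where(levels == 0, 1.0, 2.0 + 2.0 * np.log2(1.0 + np.abs(levels)))
    q = np.empty(len(w), dtype=int)
    for i, wi in enumerate(w):
        dist = (wi - levels * step) ** 2              # squared reconstruction error D
        q[i] = levels[np.argmin(dist + lam * rate)]   # minimize D + lambda * R
    return q

w = np.array([0.01, -0.02, 0.5, -1.3], dtype=np.float32)
q_rd_off = rd_quantize(w, step=0.05, lam=0.0)   # lambda = 0: plain nearest-level rounding
q_rd_on = rd_quantize(w, step=0.05, lam=0.01)   # larger lambda pushes cheap (zero/small) levels
```

This matches the lambda table above: increasing lambda favors cheaper symbols (more zeros, lower rate) until the distortion cost shows up as an mAP drop, as at lambda = 0.1.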
Purpose: To assess the feasibility of applying this compression technique to our NPU.