Deelvin / mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
https://mlc.ai/mlc-llm
Apache License 2.0

Test script for evaluation of matmul error in different pre/post-processing and quantization conditions #5

Open vvchernov opened 8 months ago

vvchernov commented 8 months ago

Develop a Python script in the repo that does and tests the following (see also below in the thread):

  1. Base scenario:
    • Create two matrices (X, W) filled with floating-point values drawn from a predefined distribution.
      • square matrices of size 1024*1024
      • value type is float16
      • develop a flexible solution with the possibility to change the size and data type
    • Multiply the matrices (X * W = Y) and save the result (Y, the original one)
    • Preprocess the matrices (X -> X', W -> W')
      • develop a flexible solution with the possibility to change the preprocessing type
    • Multiply the preprocessed matrices (X' * W' = Y') and save the result (Y', the preprocessed one)
    • Fake-quantize the preprocessed matrices (Q(W') and/or Q(X'))
      • at least one matrix should be quantized, but it can be only one
      • develop a flexible solution with the possibility to change the quantization type
    • Multiply the quantized matrices and save the result (Yq, the quantized one)
    • Postprocess the quantized result (Yq -> Yp)
    • Find the differences between the pre/postprocessed, quantized, and original results. Use metrics on the obtained matrices as the final evaluation value.
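The base scenario above could be sketched roughly as follows. This is a minimal NumPy sketch, not the actual script: the names `base_scenario` and `fake_quantize_int8` are hypothetical, preprocessing/postprocessing are skipped (identity), and symmetric per-tensor int8 stands in as a placeholder quantizer.

```python
import numpy as np

def fake_quantize_int8(m):
    # Symmetric per-tensor int8 fake quantization: quantize to the
    # int8 grid and immediately dequantize back to floating point.
    scale = np.max(np.abs(m)) / 127.0
    q = np.clip(np.round(m / scale), -127, 127)
    return (q * scale).astype(m.dtype)

def base_scenario(n=1024, dtype=np.float16, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, n)).astype(dtype)
    W = rng.standard_normal((n, n)).astype(dtype)
    # Original product (accumulate in float32 to avoid fp16 overflow).
    Y = X.astype(np.float32) @ W.astype(np.float32)
    # No preprocessing yet: X' = X, W' = W; quantize both matrices.
    Yq = fake_quantize_int8(X).astype(np.float32) @ \
         fake_quantize_int8(W).astype(np.float32)
    # Relative Frobenius-norm error of the quantized result.
    return np.linalg.norm(Yq - Y) / np.linalg.norm(Y)

err = base_scenario(n=256)  # smaller size for a quick check
```

A flexible implementation would take the size, dtype, and the pre/post-processing and quantization callables as parameters instead of hard-coding them.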
  2. Data distribution
    • In general, both the X and W matrices are assumed to follow a combination of two normal distributions (context and outliers).
    • The following parameters are used to control the distribution:
      • context average value
      • context dispersion
      • number of context values
      • distance between context and outliers, or outliers average value
      • outliers dispersion
      • number of outlier values
    • Starting simplification: W has context only; the outliers dispersion is much less than the context one (e.g. Do = 0.1 * Dc); the number of outliers is much less than the matrix size (e.g. No ~ 0.1 * sqrt(Nc)); the context average value = 0.
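A generator for such a two-component distribution might look like the sketch below (the function name `gen_matrix` and its parameter names are hypothetical; outlier positions are assumed to be uniformly random, which the spec does not fix):

```python
import numpy as np

def gen_matrix(n, ctx_mean=0.0, ctx_std=1.0, out_mean=10.0, out_std=0.1,
               n_outliers=0, dtype=np.float16, seed=0):
    """Matrix drawn from a mixture of a 'context' normal distribution
    and an 'outlier' normal distribution."""
    rng = np.random.default_rng(seed)
    m = rng.normal(ctx_mean, ctx_std, size=(n, n))
    if n_outliers > 0:
        # Place the outliers at distinct random positions in the matrix.
        idx = rng.choice(n * n, size=n_outliers, replace=False)
        m.ravel()[idx] = rng.normal(out_mean, out_std, size=n_outliers)
    return m.astype(dtype)

# Starting simplification for W: context only, zero mean.
W = gen_matrix(64, n_outliers=0)
# X with a few outliers: No ~ 0.1 * sqrt(Nc) = 0.1 * sqrt(64 * 64) ≈ 6.
X = gen_matrix(64, n_outliers=6, out_mean=10.0, out_std=0.1)
```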
  3. Preprocessing
    • Smoothing from SmoothQuant
    • AWQ algorithm
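For the smoothing step, SmoothQuant migrates quantization difficulty from activations to weights with a per-channel scale s_j = max|X_j|^α / max|W_j|^(1-α), applied so the product is unchanged. A sketch (the helper name `smooth` is hypothetical; AWQ would replace the scale rule but keep the same X/s, s·W structure):

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    """SmoothQuant-style smoothing: per-input-channel scale s divides
    activation columns and multiplies the matching weight rows, so
    (X / s) @ (s * W) == X @ W exactly (up to float rounding)."""
    ax = np.max(np.abs(X), axis=0)     # per-channel activation max
    aw = np.max(np.abs(W), axis=1)     # per-channel weight max
    s = (ax ** alpha) / (aw ** (1.0 - alpha))
    s = np.where(s == 0, 1.0, s)       # guard against zero scales
    return X / s, W * s[:, None], s

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16))
X[:, 3] *= 20.0                        # synthetic outlier channel in X
W = rng.standard_normal((16, 4))
Xs, Ws, s = smooth(X, W)
```

The point of the transform is that the outlier channel in Xs is flattened (easier to quantize) while the product is mathematically unchanged.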
  4. Quantization
    • symmetric per-tensor int8
    • symmetric per-channel int8
    • symmetric per-group int8, with group sizes:
      • 32
      • 64
      • 128
    • asymmetric per-tensor int8
    • asymmetric per-channel int8
    • asymmetric per-group int8, with group sizes:
      • 32
      • 64
      • 128
    • GPTQ-like:
      • int8
      • int4
      • int3
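The symmetric int8 variants differ only in where the scale is computed. One possible sketch covering the three granularities (the function name and the row-wise grouping convention are assumptions, not fixed by the spec):

```python
import numpy as np

def fake_quant_sym_int8(m, granularity="tensor", group_size=32):
    """Symmetric int8 fake quantization at three granularities."""
    m = m.astype(np.float32)
    if granularity == "group":
        # One scale per group of `group_size` rows, per column
        # (assumes the row count is divisible by group_size).
        r, c = m.shape
        g = m.reshape(r // group_size, group_size, c)
        scale = np.max(np.abs(g), axis=1, keepdims=True) / 127.0
        scale = np.where(scale == 0, 1.0, scale)
        q = np.clip(np.round(g / scale), -127, 127)
        return (q * scale).reshape(r, c)
    if granularity == "channel":
        scale = np.max(np.abs(m), axis=0, keepdims=True) / 127.0
    else:  # "tensor"
        scale = np.max(np.abs(m)) / 127.0
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(m / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64)).astype(np.float32)
Wt = fake_quant_sym_int8(W, "tensor")
Wg = fake_quant_sym_int8(W, "group", group_size=32)
```

Finer granularity gives smaller scales and hence smaller reconstruction error; the asymmetric variants would additionally store a per-scope zero point.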
  5. Postprocessing
    • First step: no postprocessing
    • compensate the error with a bias term
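One common form of bias compensation removes the mean component of the quantization error, estimated from calibration data: b = E[x] @ (W - Wq). A sketch, assuming this interpretation (the helper name `bias_correction` and the crude rounding quantizer are illustrative only):

```python
import numpy as np

def bias_correction(X_calib, W, Wq):
    """Bias that compensates the mean quantization error:
    b = E[x] @ (W - Wq), added back to the quantized product."""
    return X_calib.mean(axis=0) @ (W - Wq)

rng = np.random.default_rng(0)
X = rng.normal(1.0, 0.5, size=(256, 64)).astype(np.float32)  # non-zero-mean input
W = rng.standard_normal((64, 32)).astype(np.float32)
Wq = np.round(W * 10) / 10          # crude stand-in for fake quantization
b = bias_correction(X, W, Wq)
Y, Yq = X @ W, X @ Wq + b           # corrected quantized result
```

Because the correction removes exactly the mean error component, the residual error after adding b is never larger than the uncorrected error on the calibration set.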
  6. Metrics
    • First step: use the Frobenius norm (LF) for error evaluation (see here or the Russian version)
    • Study different matrix norms and analyze whether they give us correct metrics.
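The Frobenius-norm metric is most useful in relative form, so results are comparable across distributions. A one-liner sketch (the name `frob_rel_error` is hypothetical):

```python
import numpy as np

def frob_rel_error(Y_ref, Y_test):
    """Relative error in the Frobenius norm:
    ||Y_test - Y_ref||_F / ||Y_ref||_F."""
    return float(np.linalg.norm(Y_test - Y_ref) / np.linalg.norm(Y_ref))

# Example: a uniform 1% perturbation gives a 1% relative error.
err = frob_rel_error(np.ones((4, 4)), np.ones((4, 4)) * 1.01)
```

Other candidates for the norm study (spectral norm, max-abs) are available through the `ord` argument of `np.linalg.norm`.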
  7. Statistics scenario
    • Use the base scenario on a set of matrices (e.g. 100) with the same distributions and collect error statistics (mean and std)
  8. Calibration scenario
    • Use a set of matrices (e.g. 100) with the same distributions for parameter calibration, then evaluate error statistics on another set of matrices (e.g. 100) with the same distributions
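The statistics scenario is just the base scenario repeated over seeds. A sketch, assuming a hypothetical `error_stats` wrapper around any single-experiment function that maps a seed to a scalar error (a dummy stand-in is used here instead of the real base scenario):

```python
import numpy as np

def error_stats(run_once, n_trials=100):
    """Statistics scenario: repeat a single-matrix experiment over a
    set of matrices with the same distribution and report the mean
    and std of the error metric."""
    errs = np.array([run_once(seed=s) for s in range(n_trials)])
    return errs.mean(), errs.std()

# `run_once` would be e.g. base_scenario(seed=...); a dummy is used here.
mean_err, std_err = error_stats(
    lambda seed: abs(np.random.default_rng(seed).standard_normal()),
    n_trials=10)
```

The calibration scenario follows the same shape: fit the pre/post-processing parameters on one set of seeds, then call `error_stats` on a disjoint set.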
  9. Additional features:
    • use a matrix from a dump
vvchernov commented 8 months ago

Notes:

vvchernov commented 8 months ago

Tests:

Krosha31 commented 7 months ago

https://github.com/Deelvin/tvm-samples/pull/15

added base scenario

Krosha31 commented 6 months ago

Uploaded the current results. I'll try to get something more done tomorrow.

Krosha31 commented 4 months ago

Uploaded the plots to Google Drive: https://drive.google.com/drive/folders/1KMbUUymlp6QZb1f_ThX0V7K49K4AF00Y?usp=drive_link