Deelvin / mlc-llm

Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
https://mlc.ai/mlc-llm
Apache License 2.0

Test script for evaluation of matmul error in different pre/post-processing and quantization conditions #5

Open vvchernov opened 8 months ago

vvchernov commented 8 months ago

Develop a Python script in the repo that does and tests the following (see also below in the thread):

  1. Base scenario:
    • Create two matrices (X, W) filled with floating-point values drawn from a predefined distribution.
      • square matrices of size 1024*1024
      • value type is float16
      • develop a flexible solution with the possibility to change the size and data type
    • Multiply the matrices (X * W = Y) and save the result (Y, the original one)
    • Preprocess the matrices (X -> X', W -> W')
      • develop a flexible solution with the possibility to change the preprocessing type
    • Multiply the preprocessed matrices (X' * W' = Y') and save the result (Y', the preprocessed one)
    • Fake-quantize the preprocessed matrices (Q(W') and/or Q(X'))
      • at least one matrix should be quantized, but it can be only one
      • develop a flexible solution with the possibility to change the quantization type
    • Multiply the quantized matrices and save the result (Yq, the quantized one)
    • Postprocess the quantized result (Yq -> Yp)
    • Find the differences between the pre/postprocessed, quantized, and original results. Use metrics on the obtained matrices as the final evaluation value.
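The base scenario above could be sketched roughly as follows. This is a minimal NumPy sketch, not the actual script: the names `base_scenario` and `fake_quantize_int8` are hypothetical, preprocessing/postprocessing are skipped (identity), and symmetric per-tensor int8 stands in as a placeholder quantizer.

```python
import numpy as np

def fake_quantize_int8(m):
    # Symmetric per-tensor int8 fake quantization: quantize to the
    # int8 grid and immediately dequantize back to floating point.
    scale = np.max(np.abs(m)) / 127.0
    q = np.clip(np.round(m / scale), -127, 127)
    return (q * scale).astype(m.dtype)

def base_scenario(n=1024, dtype=np.float16, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, n)).astype(dtype)
    W = rng.standard_normal((n, n)).astype(dtype)
    # Original product (accumulate in float32 to avoid fp16 overflow).
    Y = X.astype(np.float32) @ W.astype(np.float32)
    # No preprocessing yet: X' = X, W' = W; quantize both matrices.
    Yq = fake_quantize_int8(X).astype(np.float32) @ \
         fake_quantize_int8(W).astype(np.float32)
    # Relative Frobenius-norm error of the quantized result.
    return np.linalg.norm(Yq - Y) / np.linalg.norm(Y)

err = base_scenario(n=256)  # smaller size for a quick check
```

A flexible implementation would take the size, dtype, and the pre/post-processing and quantization callables as parameters instead of hard-coding them.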
  2. Data distribution
    • In general, both the X and W matrices are assumed to follow a combination of two normal distributions (context and outliers).
    • The following parameters are used to control the distribution:
      • context average value
      • context dispersion
      • number of context values
      • distance between context and outliers, or outliers average value
      • outliers dispersion
      • number of outlier values
    • Starting simplification: W has context only; the outliers dispersion is much less than the context one (e.g. Do = 0.1 * Dc); the number of outliers is much less than the matrix size (e.g. No ~ 0.1 * sqrt(Nc)); the context average value = 0.
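A generator for such a two-component distribution might look like the sketch below (the function name `gen_matrix` and its parameter names are hypothetical; outlier positions are assumed to be uniformly random, which the spec does not fix):

```python
import numpy as np

def gen_matrix(n, ctx_mean=0.0, ctx_std=1.0, out_mean=10.0, out_std=0.1,
               n_outliers=0, dtype=np.float16, seed=0):
    """Matrix drawn from a mixture of a 'context' normal distribution
    and an 'outlier' normal distribution."""
    rng = np.random.default_rng(seed)
    m = rng.normal(ctx_mean, ctx_std, size=(n, n))
    if n_outliers > 0:
        # Place the outliers at distinct random positions in the matrix.
        idx = rng.choice(n * n, size=n_outliers, replace=False)
        m.ravel()[idx] = rng.normal(out_mean, out_std, size=n_outliers)
    return m.astype(dtype)

# Starting simplification for W: context only, zero mean.
W = gen_matrix(64, n_outliers=0)
# X with a few outliers: No ~ 0.1 * sqrt(Nc) = 0.1 * sqrt(64 * 64) ≈ 6.
X = gen_matrix(64, n_outliers=6, out_mean=10.0, out_std=0.1)
```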
  3. Preprocessing
    • Smoothing from SmoothQuant
    • AWQ algorithm
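For the smoothing step, SmoothQuant migrates quantization difficulty from activations to weights with a per-channel scale s_j = max|X_j|^α / max|W_j|^(1-α), applied so the product is unchanged. A sketch (the helper name `smooth` is hypothetical; AWQ would replace the scale rule but keep the same X/s, s·W structure):

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    """SmoothQuant-style smoothing: per-input-channel scale s divides
    activation columns and multiplies the matching weight rows, so
    (X / s) @ (s * W) == X @ W exactly (up to float rounding)."""
    ax = np.max(np.abs(X), axis=0)     # per-channel activation max
    aw = np.max(np.abs(W), axis=1)     # per-channel weight max
    s = (ax ** alpha) / (aw ** (1.0 - alpha))
    s = np.where(s == 0, 1.0, s)       # guard against zero scales
    return X / s, W * s[:, None], s

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 16))
X[:, 3] *= 20.0                        # synthetic outlier channel in X
W = rng.standard_normal((16, 4))
Xs, Ws, s = smooth(X, W)
```

The point of the transform is that the outlier channel in Xs is flattened (easier to quantize) while the product is mathematically unchanged.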
  4. Quantization
    • symmetric per-tensor int8
    • symmetric per-channel int8
    • symmetric per-group int8, with group sizes:
      • 32
      • 64
      • 128
    • asymmetric per-tensor int8
    • asymmetric per-channel int8
    • asymmetric per-group int8, with group sizes:
      • 32
      • 64
      • 128
    • GPTQ-like:
      • int8
      • int4
      • int3
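The symmetric int8 variants differ only in where the scale is computed. One possible sketch covering the three granularities (the function name and the row-wise grouping convention are assumptions, not fixed by the spec):

```python
import numpy as np

def fake_quant_sym_int8(m, granularity="tensor", group_size=32):
    """Symmetric int8 fake quantization at three granularities."""
    m = m.astype(np.float32)
    if granularity == "group":
        # One scale per group of `group_size` rows, per column
        # (assumes the row count is divisible by group_size).
        r, c = m.shape
        g = m.reshape(r // group_size, group_size, c)
        scale = np.max(np.abs(g), axis=1, keepdims=True) / 127.0
        scale = np.where(scale == 0, 1.0, scale)
        q = np.clip(np.round(g / scale), -127, 127)
        return (q * scale).reshape(r, c)
    if granularity == "channel":
        scale = np.max(np.abs(m), axis=0, keepdims=True) / 127.0
    else:  # "tensor"
        scale = np.max(np.abs(m)) / 127.0
    scale = np.where(scale == 0, 1.0, scale)
    q = np.clip(np.round(m / scale), -127, 127)
    return q * scale

rng = np.random.default_rng(0)
W = rng.standard_normal((128, 64)).astype(np.float32)
Wt = fake_quant_sym_int8(W, "tensor")
Wg = fake_quant_sym_int8(W, "group", group_size=32)
```

Finer granularity gives smaller scales and hence smaller reconstruction error; the asymmetric variants would additionally store a per-scope zero point.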
  5. Postprocessing
    • First step: no postprocessing
    • compensate the error with a bias term
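One common form of bias compensation removes the mean component of the quantization error, estimated from calibration data: b = E[x] @ (W - Wq). A sketch, assuming this interpretation (the helper name `bias_correction` and the crude rounding quantizer are illustrative only):

```python
import numpy as np

def bias_correction(X_calib, W, Wq):
    """Bias that compensates the mean quantization error:
    b = E[x] @ (W - Wq), added back to the quantized product."""
    return X_calib.mean(axis=0) @ (W - Wq)

rng = np.random.default_rng(0)
X = rng.normal(1.0, 0.5, size=(256, 64)).astype(np.float32)  # non-zero-mean input
W = rng.standard_normal((64, 32)).astype(np.float32)
Wq = np.round(W * 10) / 10          # crude stand-in for fake quantization
b = bias_correction(X, W, Wq)
Y, Yq = X @ W, X @ Wq + b           # corrected quantized result
```

Because the correction removes exactly the mean error component, the residual error after adding b is never larger than the uncorrected error on the calibration set.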
  6. Metrics
    • First step: use the Frobenius norm (LF) for error evaluation (see here or the Russian version)
    • Study different matrix norms and analyze whether they give us correct metrics.
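The Frobenius-norm metric is most useful in relative form, so results are comparable across distributions. A one-liner sketch (the name `frob_rel_error` is hypothetical):

```python
import numpy as np

def frob_rel_error(Y_ref, Y_test):
    """Relative error in the Frobenius norm:
    ||Y_test - Y_ref||_F / ||Y_ref||_F."""
    return float(np.linalg.norm(Y_test - Y_ref) / np.linalg.norm(Y_ref))

# Example: a uniform 1% perturbation gives a 1% relative error.
err = frob_rel_error(np.ones((4, 4)), np.ones((4, 4)) * 1.01)
```

Other candidates for the norm study (spectral norm, max-abs) are available through the `ord` argument of `np.linalg.norm`.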
  7. Statistics scenario
    • Use the base scenario on a set of matrices (e.g. 100) with the same distributions and collect error statistics (mean and std)
  8. Calibration scenario
    • Use a set of matrices (e.g. 100) with the same distributions for parameter calibration, then evaluate error statistics on another set of matrices (e.g. 100) with the same distributions
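The statistics scenario is just the base scenario repeated over seeds. A sketch, assuming a hypothetical `error_stats` wrapper around any single-experiment function that maps a seed to a scalar error (a dummy stand-in is used here instead of the real base scenario):

```python
import numpy as np

def error_stats(run_once, n_trials=100):
    """Statistics scenario: repeat a single-matrix experiment over a
    set of matrices with the same distribution and report the mean
    and std of the error metric."""
    errs = np.array([run_once(seed=s) for s in range(n_trials)])
    return errs.mean(), errs.std()

# `run_once` would be e.g. base_scenario(seed=...); a dummy is used here.
mean_err, std_err = error_stats(
    lambda seed: abs(np.random.default_rng(seed).standard_normal()),
    n_trials=10)
```

The calibration scenario follows the same shape: fit the pre/post-processing parameters on one set of seeds, then call `error_stats` on a disjoint set.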
  9. Additional features:
    • use a matrix from a dump
vvchernov commented 8 months ago

Notes:

vvchernov commented 8 months ago

Tests:

Krosha31 commented 7 months ago

https://github.com/Deelvin/tvm-samples/pull/15

added base scenario

Krosha31 commented 6 months ago

Uploaded the current results. I'll try to get something more done tomorrow.

Krosha31 commented 4 months ago

Uploaded the plots to Google Drive: https://drive.google.com/drive/folders/1KMbUUymlp6QZb1f_ThX0V7K49K4AF00Y?usp=drive_link