Aaronhuang-778 / BiLLM

(ICML 2024) BiLLM: Pushing the Limit of Post-Training Quantization for LLMs
https://arxiv.org/abs/2402.04291
MIT License

Question about weight storage during inference #11

Open DamonsJ opened 3 months ago

DamonsJ commented 3 months ago

When you run inference with the quantized model, how are the weights stored? Since salient and non-salient weights are quantized separately, each with its own quantization parameters, how does the matrix multiplication know which entries should be dequantized with the salient parameters and which with the non-salient ones?
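One common way to handle this (a sketch only, not necessarily BiLLM's actual kernel implementation) is to store, alongside the binarized signs, a binary salient mask and the two sets of scaling parameters; at inference time the mask selects which scale applies to each element before the matmul. All names and shapes below are illustrative assumptions:

```python
import numpy as np

def dequantize(signs, salient_mask, scale_salient, scale_unsalient):
    """Hypothetical sketch: rebuild a weight matrix from binarized signs.

    signs        : {-1, +1} binarized weights
    salient_mask : bool array, True where the weight was in the salient group
    scale_*      : per-group scaling factors (scalars here for simplicity;
                   real schemes are typically per-column or per-block)
    """
    # The stored mask is what tells inference which quantization
    # parameters each element should use.
    scales = np.where(salient_mask, scale_salient, scale_unsalient)
    return signs * scales

# Toy example: a 2x2 binarized weight matrix with a stored mask.
signs = np.array([[1.0, -1.0], [-1.0, 1.0]], dtype=np.float32)
mask = np.array([[True, False], [False, True]])
W = dequantize(signs, mask, scale_salient=0.8, scale_unsalient=0.1)
# W can then be used directly in the matmul, or the mask can instead
# drive two separate (salient / non-salient) matmuls that are summed.
```

The storage cost of the mask is 1 bit per weight; if the salient group is selected in a structured way (e.g. whole columns), the mask collapses to a much smaller per-column index list.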