When you run inference with the quantized model, how are the weights stored?
You have salient and non-salient weights, and you quantize them separately.
At inference time, how do you know which parts should use the salient quantized weights and which should use the non-salient quantized weights?
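During quantization, the salient and non-salient weights each have their own quantization parameters. During the matrix multiplication at inference time, how do you determine which elements use the salient parameters and which use the non-salient ones?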