NetEase-FuXi / EETQ

Easy and Efficient Quantization for Transformers

How to dequantize an EETQ model? #14

Closed: mxjmtxrm closed this 2 months ago

mxjmtxrm commented 2 months ago

Hi, is there any function to dequantize an int8 weight back to fp16? Or is there a way to convert an EetqLinear back to a regular linear layer?

dtlzhuangz commented 2 months ago

Hi @mxjmtxrm. I think you probably want backprop for EetqLinear. We have implemented a backward function and it is under testing in #15. Could you try it and give me feedback?

mxjmtxrm commented 2 months ago

No, I want a model at fp16 precision that I can run on my own framework to test the accuracy of EETQ quantization, so I need to dequantize the int8 weights to fp16. I noticed that there are unprocessed_quantized_weight, processed_quantized_weight, and scale in https://github.com/NetEase-FuXi/EETQ/blob/5c08b064d89853b74e4fbd87057f74385c13352a/csrc/cutlass_kernels/fpA_intB_gemm_wrapper.cu#L106. What is the difference between these two weights? Is the dequantized weight == EetqLinear.weight * EetqLinear.weight_scales?
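
For concreteness, this is the round trip I have in mind (a minimal sketch in plain PyTorch; the per-channel symmetric scheme and all tensor names are my assumptions, not EETQ's actual API):

```python
import torch

# Hypothetical fp16 weight of a linear layer, shape [out_features, in_features].
w_fp16 = torch.randn(8, 16, dtype=torch.float16)

# Symmetric per-channel int8 quantization (my assumption about the scheme).
scales = w_fp16.float().abs().amax(dim=1, keepdim=True) / 127.0
w_int8 = (w_fp16.float() / scales).round().clamp(-128, 127).to(torch.int8)

# The dequantization I am asking about: recover an fp16 weight that can be
# loaded into an ordinary torch.nn.Linear on my own framework.
w_dequant = (w_int8.float() * scales).to(torch.float16)

print((w_fp16 - w_dequant).abs().max())  # per-channel quantization error
```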

dtlzhuangz commented 2 months ago

EETQ changes the weight layout via preprocess_weights_for_mixed_gemm to speed up memory access, so the two kinds of weight are different. Your formula holds for the unprocessed weight: dequantized weight == unprocessed_quantized_weight * EetqLinear.weight_scales. You can refer to the backward pass in https://github.com/NetEase-FuXi/EETQ/pull/15/files#diff-ca179e954c684327ef4fba983db3ed3965f5406ef9f922b12e55f897829823ecR83 for how to dequantize the weight.
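
As a rough sketch of the elementwise step (assuming you already have the weight in the unprocessed layout together with its per-channel scales; the processed weight stored on EetqLinear cannot simply be multiplied by the scales, since preprocess_weights_for_mixed_gemm has rearranged its layout):

```python
import torch

def dequantize_unprocessed(w_int8: torch.Tensor,
                           weight_scales: torch.Tensor) -> torch.Tensor:
    # Elementwise dequantization of the *unprocessed* int8 weight.
    # The shape/broadcast convention is an assumption here: weight_scales is
    # broadcast along the per-channel dimension of w_int8.
    return (w_int8.float() * weight_scales.float()).to(torch.float16)

# Hypothetical usage: rebuild a plain fp16 linear layer for accuracy testing.
# linear.weight.data.copy_(dequantize_unprocessed(w_unprocessed, scales))
```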

mxjmtxrm commented 2 months ago

Got it. Thanks a lot.