NetEase-FuXi / EETQ

Easy and Efficient Quantization for Transformers
Apache License 2.0
174 stars, 14 forks

Question on outlier handling #1

Closed: 0xymoro closed this issue 9 months ago

0xymoro commented 1 year ago

Hi, great work! I wanted to ask a bit more to understand how much quality degrades relative to fp16, and whether the loss is significant. The int8 implementation in bitsandbytes accounts for outlier features; does EETQ do the same? I can't find much information on the quantization process, on whether it uses any tricks, or on whether there are noticeable performance changes compared with a bitsandbytes int8 version. Thanks!

SidaZh commented 12 months ago

@jerryMeng100 EETQ 8-bit can be regarded as a high-performance CUTLASS implementation of W8A16 (per-channel) quantization, i.e. 8-bit weights with 16-bit activations. In terms of perplexity (PPL), W8A16 per-channel quantization already performs very well even without any correction for outliers. Although there is no direct comparison with LLM.int8(), there is almost no loss in accuracy compared with float16. The table below is from Table 2 of https://browse.arxiv.org/pdf/2303.08302.pdf:

[image: Table 2 from the linked paper]
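To make "W8A16 per-channel" a bit more concrete, here is a minimal pure-Python sketch of symmetric per-channel weight-only quantization. It is not EETQ's actual kernel. The helper names (`quantize_per_channel`, `dequantize`, `matvec`), the choice of one scale per output row, and the symmetric [-127, 127] range are all assumptions made for illustration.

```python
# Illustrative sketch only, not the EETQ implementation.
# Assumptions: symmetric int8 range [-127, 127], one scale per output
# channel (here: per row of the weight matrix), activations left in float.

def quantize_per_channel(weights):
    """Quantize each row with its own scale; return (int rows, scales)."""
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        # Guard against an all-zero row to avoid dividing by zero.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Map the integer rows back to floats using the per-row scales."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def matvec(weights, x):
    """Plain matrix-vector product, standing in for the float GEMM."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


weights = [
    [0.02, -0.01, 0.03, 0.015],  # small-magnitude channel
    [4.0, -2.5, 1.0, 0.5],       # large-magnitude channel
]
x = [1.0, 2.0, -1.0, 0.5]        # activations are not quantized (the "A16" part)

q_rows, scales = quantize_per_channel(weights)
w_hat = dequantize(q_rows, scales)

y_ref = matvec(weights, x)       # output with the original weights
y_q = matvec(w_hat, x)           # output with the dequantized weights
max_err = max(abs(a - b) for a, b in zip(y_ref, y_q))
print("scales:", scales)
print("max output error:", max_err)
```

Because every row has its own scale, the large-magnitude channel does not inflate the rounding step of the small-magnitude one, and each weight's rounding error stays within half of its row's scale. That is one intuition for why per-channel weight-only quantization can hold up without a separate outlier path, though it is not a substitute for the PPL numbers in the table.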