Closed by 0xymoro 11 months ago
@jerryMeng100 EETQ 8-bit can be regarded as a high-performance CUTLASS implementation of W8A16 (per-channel) quantization. In terms of perplexity (PPL), W8A16 per-channel quantization already performs very well even without any correction for outliers. There is no direct comparison with LLM.int8(), but there is almost no loss in accuracy compared with float16. See Table 2 of https://browse.arxiv.org/pdf/2303.08302.pdf
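For readers unfamiliar with the term, here is a minimal sketch of what per-channel W8A16 means, assuming symmetric quantization with one scale per output channel. This is not EETQ's CUTLASS kernel: the function names `quantize_per_channel` and `w8a16_linear` are hypothetical, the data is random, and plain Python floats stand in for fp16 activations.

```python
import random

def quantize_per_channel(weights):
    """Symmetric int8 quantization with one scale per output channel (row)."""
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(v) for v in row)
        # Map the largest magnitude in the row to 127; guard all-zero rows.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales

def w8a16_linear(x, q_rows, scales):
    """Linear layer with int8 weights and floating-point activations.

    The scale is constant within an output channel, so applying it after
    the dot product is equivalent to dequantizing the weights first.
    """
    return [
        [scale * sum(a * q for a, q in zip(x_row, q_row))
         for q_row, scale in zip(q_rows, scales)]
        for x_row in x
    ]

# Hypothetical toy data: 8 output channels, 16 input features, 4 inputs.
random.seed(0)
weights = [[random.gauss(0, 1) for _ in range(16)] for _ in range(8)]
x = [[random.gauss(0, 1) for _ in range(16)] for _ in range(4)]

q_rows, scales = quantize_per_channel(weights)
y_ref = [[sum(a * w for a, w in zip(x_row, w_row)) for w_row in weights]
         for x_row in x]
y_q = w8a16_linear(x, q_rows, scales)

max_err = max(abs(a - b) for r, s in zip(y_ref, y_q) for a, b in zip(r, s))
print(f"max abs error vs. float reference: {max_err:.4f}")
```

Only the weights are quantized here; the activations stay in floating point, which is why this scheme needs no activation-outlier handling of the kind LLM.int8() applies.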
Hi, great work! I wanted to ask a bit more to understand the quality degradation from fp16, and whether it is significant. The int8 path in bitsandbytes accounts for outlier features; does this do the same? I can't find much information on the process, on whether it uses any tricks, or on whether output quality changes noticeably compared with the bitsandbytes int8 version. Thanks!