Andy0422 opened 1 week ago
See https://github.com/HandH1998/QQQ/issues/13#issuecomment-2319955934. In my practice, rotation+gptq is generally better than smooth+gptq for per-channel quantization. However, this is not the case for some models, such as https://github.com/HandH1998/QQQ/issues/17.
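For context, "rotation" here means multiplying the weights (and, equivalently, the activations) by an orthogonal matrix before running GPTQ, which spreads channel outliers without changing the layer's output. A minimal sketch of the idea, assuming a QuaRot/QQQ-style random orthogonal rotation (this is not the repo's exact code; real implementations typically use a randomized Hadamard transform for speed):

```python
import torch

def random_orthogonal(n: int, seed: int = 0) -> torch.Tensor:
    # Random orthogonal matrix via QR decomposition (stand-in for a
    # randomized Hadamard transform).
    g = torch.Generator().manual_seed(seed)
    q, _ = torch.linalg.qr(torch.randn(n, n, generator=g))
    return q

# For a linear layer y = x @ W.T, rotating the inputs by Q and the
# weights by the same Q leaves the output exactly unchanged:
#   (x @ Q) @ (W @ Q).T == x @ W.T
# but W @ Q has flatter per-channel ranges, which tends to help
# per-channel GPTQ quantization.
W = torch.randn(4096, 4096)
Q = random_orthogonal(W.shape[1])
W_rot = W @ Q  # this is what gets quantized; Q is folded into the graph
```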
@HandH1998
Hi, thank you for your kind help. I ran into another problem with the calibration data.
From my test results below, the numbers with the wikitext2 calibration set look OK, but the results with the pile calibration set do not match your original data. The pile data I used is from https://huggingface.co/datasets/mit-han-lab/pile-val-backup/tree/main. Could you share your pile dataset with me, or comment on this finding? (A sketch of how I draw the pile calibration samples is below the table.) Email: wangdawei_0422@163.com.
| Granularity | Method | Model | Wikitext2 calib | Pile calib | Paper data |
| -- | -- | -- | -- | -- | -- |
| per-channel | smooth+gptq | Llama-2-7B | 5.98 | 6.14 | 5.95 |
| per-group | smooth+gptq | Llama-2-7B | 5.71 | 5.78 | 5.71 |
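For reference, here is a minimal sketch of how I draw the pile calibration samples. The dataset name and split/field follow mit-han-lab/pile-val-backup; the sample count, sequence length, and model name are my assumptions, not necessarily what QQQ's script uses:

```python
import random
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumed calibration settings, not QQQ's exact defaults.
n_samples, seqlen = 128, 2048
tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
ds = load_dataset("mit-han-lab/pile-val-backup", split="validation")

random.seed(0)
samples = []
for i in random.sample(range(len(ds)), len(ds)):  # shuffled indices
    ids = tok(ds[i]["text"], return_tensors="pt").input_ids
    if ids.shape[1] >= seqlen:  # keep only documents long enough
        samples.append(ids[:, :seqlen])
    if len(samples) == n_samples:
        break
calib = torch.cat(samples, dim=0)  # (n_samples, seqlen)
```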
Hi,
Could you share the rotation+gptq PPL data? Is it better than smoothquant+gptq? Many thanks!
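In case it helps the comparison: PPL numbers like the ones in the table above are usually produced with a standard GPTQ-style non-overlapping-window evaluation along the lines below. The model name and seqlen are assumptions, and this is a generic harness, not necessarily QQQ's eval script:

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed; substitute the quantized model
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
ids = tok("\n\n".join(test["text"]), return_tensors="pt").input_ids

seqlen = 2048
nlls = []
for i in range(ids.shape[1] // seqlen):
    batch = ids[:, i * seqlen : (i + 1) * seqlen].to(model.device)
    with torch.no_grad():
        # labels=batch makes HF return the mean next-token cross-entropy
        loss = model(batch, labels=batch).loss
    nlls.append(loss.float() * seqlen)
ppl = torch.exp(torch.stack(nlls).sum() / (len(nlls) * seqlen))
print(f"WikiText-2 PPL: {ppl.item():.2f}")
```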