QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Apache License 2.0

Question about quantization details #1279

Closed · zixi01chen closed this issue 3 weeks ago

zixi01chen commented 3 weeks ago

Start Date

No response

Implementation PR

No response

Reference Issues

No response

Summary

I noticed that Qwen's quantized models show only a small drop in quality. Is inference run entirely in Int8, or are only the weights stored in Int8 and dequantized to fp16 for the actual computation?

Basic Example

Drawbacks

Unresolved questions

No response

jklj077 commented 3 weeks ago

Hi, both AWQ and GPTQ are weight-only quantization methods: only the weights are stored in low-bit integers, and they are dequantized back to fp16 for the matrix multiplications, so the activations and the compute itself stay in fp16.
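
For intuition, here is a minimal NumPy sketch of what "weight-only" quantization means. It is not the actual AutoGPTQ/AWQ kernel code, and the helper names (`quantize_weight_int8`, `linear_weight_only_int8`) are illustrative; it only shows that the integer format applies to the stored weights, while the matmul runs in floating point after on-the-fly dequantization.

```python
import numpy as np

def quantize_weight_int8(w_fp16):
    """Symmetric per-output-channel int8 quantization of a weight matrix.

    w_fp16: (out_features, in_features) float16 weights.
    Returns int8 weights plus one fp16 scale per output channel.
    """
    w = w_fp16.astype(np.float32)
    # One scale per row so each output channel uses the full int8 range.
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float16)

def linear_weight_only_int8(x_fp16, q, scale):
    """Weight-only quantized linear layer (illustrative sketch).

    The stored weights are int8, but they are dequantized before the
    matmul, so activations and accumulation remain in floating point.
    """
    w_deq = q.astype(np.float32) * scale.astype(np.float32)
    return (x_fp16.astype(np.float32) @ w_deq.T).astype(np.float16)

# Tiny usage example: the quantized layer closely matches the fp16 one.
rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16)).astype(np.float16)
x = rng.standard_normal((2, 16)).astype(np.float16)
q, s = quantize_weight_int8(w)
print(np.abs(linear_weight_only_int8(x, q, s) - x @ w.T).max())
```

Real GPTQ/AWQ kernels pack 4-bit weights and use group-wise scales and zero points rather than a single per-channel scale, but the principle is the same: quantization changes how the weights are stored, not the precision in which the activations are processed.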