Closed huliangbing closed 4 months ago
Thank you for this issue!
I think it should support int4/8 quantized models, but I have not actually experimented with Qwen1.5-int4. The code works for all Qwen models with fp16/bf16 (chunkqwen_attn_replace.py).
Thank you for the quick reply. If I use Qwen1.5 int4/8 models, what should I do?
I am not an expert in quantized models. You could try the code directly and share some detailed information, such as terminal outputs.
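For anyone who wants to try this, a minimal sketch of the experiment might look like the following. Note the assumptions: the entry point `replace_qwen_attn` is a guess at what `chunkqwen_attn_replace.py` exposes (check the module for the real name), and loading a GPTQ checkpoint through `transformers` requires a GPTQ backend such as auto-gptq to be installed.

```python
# Sketch: load the GPTQ int4 Qwen checkpoint and apply the chunked-attention
# patch before running inference. Assumptions are marked in comments.

MODEL_ID = "Qwen/Qwen1.5-72B-Chat-GPTQ-Int4"  # quantized checkpoint from the question


def load_patched_model(model_id: str = MODEL_ID):
    """Load the GPTQ-quantized model, then swap in the chunked attention."""
    # Heavy dependencies imported lazily; transformers detects GPTQ
    # checkpoints from the model config when a GPTQ backend is installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Hypothetical import from this repo; the actual function name may differ.
    from chunkqwen_attn_replace import replace_qwen_attn

    replace_qwen_attn()  # patch Qwen attention before loading the weights
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",    # shard the 72B model across available GPUs
        torch_dtype="auto",   # GPTQ layers stay int4; the rest follows the config
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_patched_model()
    # Printing the quantization config is a quick sanity check that the
    # int4 weights were actually loaded as quantized layers.
    print(model.config.quantization_config)
```

If loading fails, the terminal output (especially any error about missing GPTQ kernels or unexpected attention module names) is exactly the kind of detail worth posting back to this issue.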
I will give it a try.
Great work! Does it support the Qwen1.5-72B-Chat-GPTQ-Int4 quantized model?