Closed huliangbing closed 4 months ago
Thank you for this issue!
I think it should support int4/8 quantized models, but I have not actually experimented with Qwen1.5-int4. The code works for all Qwen models with fp16/bf16 (chunkqwen_attn_replace.py).
Thank you for the quick reply. If I use Qwen1.5 int4/8 models, what should I do?
I am not an expert in quantized models. You could try the code directly and share some detailed information, such as terminal outputs.
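For anyone who wants to try this, a minimal sketch of the experiment might look like the following. Note the assumptions: the entry point `replace_qwen_attn` is a guess at what `chunkqwen_attn_replace.py` exposes (check the module for the real name), and loading a GPTQ checkpoint through `transformers` requires a GPTQ backend such as auto-gptq to be installed.

```python
# Sketch: load the GPTQ int4 Qwen checkpoint and apply the chunked-attention
# patch before running inference. Assumptions are marked in comments.

MODEL_ID = "Qwen/Qwen1.5-72B-Chat-GPTQ-Int4"  # quantized checkpoint from the question


def load_patched_model(model_id: str = MODEL_ID):
    """Load the GPTQ-quantized model, then swap in the chunked attention."""
    # Heavy dependencies imported lazily; transformers detects GPTQ
    # checkpoints from the model config when a GPTQ backend is installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Hypothetical import from this repo; the actual function name may differ.
    from chunkqwen_attn_replace import replace_qwen_attn

    replace_qwen_attn()  # patch Qwen attention before loading the weights
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",    # shard the 72B model across available GPUs
        torch_dtype="auto",   # GPTQ layers stay int4; the rest follows the config
    )
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_patched_model()
    # Printing the quantization config is a quick sanity check that the
    # int4 weights were actually loaded as quantized layers.
    print(model.config.quantization_config)
```

If loading fails, the terminal output (especially any error about missing GPTQ kernels or unexpected attention module names) is exactly the kind of detail worth posting back to this issue.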
I will give it a try.
Great work! Does it support the Qwen1.5-72B-Chat-GPTQ-Int4 quantized model?