HandH1998 / QQQ

QQQ is an innovative and hardware-optimized W4A8 quantization solution for LLMs.
https://arxiv.org/pdf/2406.09904

Question about the qwen2-1.5b model #23

Open darrenearl opened 1 month ago

darrenearl commented 1 month ago

Hi, author. When I quantize the qwen2-1.5b model, quantization succeeds, but the inference results are wrong. How should the quantization parameters be configured?

HandH1998 commented 1 month ago

You can refer to https://github.com/HandH1998/QQQ/issues/17

darrenearl commented 1 month ago

After enabling smooth and disabling rotation for quantization, it errors out at assert self._attn_implementation == "sdpa": the model falls back to eager mode by default. What are the requirements on the Qwen weights? qwen2-1.5b defaults to bf16.

HandH1998 commented 1 month ago

I have tried transformers==4.38.2, and it defaults to sdpa. Qwen with eager mode + fp16/bf16 has known issues; see https://github.com/huggingface/transformers/pull/33317.
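
To make the SDPA requirement concrete, here is a minimal sketch using plain transformers (the checkpoint path is a placeholder, and QQQ's own model loader may differ) that requests sdpa and fp16 explicitly, so the assert mentioned above passes:

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder path; point this at the local Qwen2 checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen2-1.5B",
    torch_dtype=torch.float16,       # fp16 weights, as suggested later in the thread
    attn_implementation="sdpa",      # request SDPA explicitly instead of falling back to eager
)
print(model.config._attn_implementation)  # expected: "sdpa"
```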

darrenearl commented 1 month ago

I am using transformers 4.38.2, and quantizing a 7B model hits the same problem. The command is:

python3 examples/quant_model.py \
    --model_path Qwen2.5-7B-Instruct \
    --tokenizer_path Qwen2.5-7B-Instruct \
    --dtype bfloat16 \
    --smooth true \
    --rotation false \
    --dataset wikitext2 \
    --nsamples 128 \
    --w_quantizer FixedQuantize \
    --w_group_size 128 \
    --gptq_mse true \
    --gptq_groupsize 128 \
    --save_path qwen7b_g16

HandH1998 commented 1 month ago

Try dtype float16 instead of bfloat16.
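
In other words, a sketch of the same invocation with only the dtype changed (all other flags and paths kept as in the command above):

```
python3 examples/quant_model.py \
    --model_path Qwen2.5-7B-Instruct \
    --tokenizer_path Qwen2.5-7B-Instruct \
    --dtype float16 \
    --smooth true \
    --rotation false \
    --dataset wikitext2 \
    --nsamples 128 \
    --w_quantizer FixedQuantize \
    --w_group_size 128 \
    --gptq_mse true \
    --gptq_groupsize 128 \
    --save_path qwen7b_g16
```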