QwenLM / Qwen2

Qwen2 is the large language model series developed by the Qwen team at Alibaba Cloud.

I want to quantize qwen-moe-a2.7b with GPTQ, but it doesn't seem to be supported. How does the Qwen team quantize it officially? #328

Open wellcasa opened 2 months ago

wellcasa commented 2 months ago

```
Traceback (most recent call last):
  File "/home/admin/workspace/aop_lab/app_source/run_gptq.py", line 89, in <module>
    model = AutoGPTQForCausalLM.from_pretrained(args.model_name_or_path, quantize_config, device_map="auto",
  File "/home/admin/miniconda3/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 75, in from_pretrained
    model_type = check_and_get_model_type(pretrained_model_name_or_path, trust_remote_code)
  File "/home/admin/miniconda3/lib/python3.10/site-packages/auto_gptq/modeling/_utils.py", line 305, in check_and_get_model_type
    raise TypeError(f"{config.model_type} isn't supported yet.")
TypeError: qwen2_moe isn't supported yet.
```
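For context, the failing call follows the standard AutoGPTQ quick-start flow. Below is a minimal sketch of such a quantization script, assuming placeholder values: the checkpoint path, calibration text, and quantization settings are illustrative, not the exact `run_gptq.py` from the traceback.

```python
# Minimal sketch of a GPTQ quantization script (model path, calibration
# text, and config values are illustrative assumptions, not from the issue).
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_path = "Qwen/Qwen1.5-MoE-A2.7B"  # hypothetical qwen2_moe checkpoint

quantize_config = BaseQuantizeConfig(
    bits=4,          # quantize weights to 4-bit
    group_size=128,  # one set of quantization params per 128 weights
    desc_act=False,
)

tokenizer = AutoTokenizer.from_pretrained(model_path)
# A real run needs a proper calibration set; one sentence is just a placeholder.
examples = [tokenizer("auto-gptq is an easy-to-use model quantization library.")]

# On auto-gptq 0.7.1 this line raises TypeError: qwen2_moe isn't supported yet,
# because check_and_get_model_type() has no entry for the qwen2_moe model_type.
model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config, device_map="auto")
model.quantize(examples)
model.save_quantized("qwen2-moe-a2.7b-gptq-int4")
```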

```
Name          Version  Build   Channel
------------  -------  ------  -------
auto-gptq     0.7.1    pypi_0  pypi
transformers  4.40.0   pypi_0  pypi
```

bozheng-hit commented 2 months ago

You can try the code in this PR: https://github.com/AutoGPTQ/AutoGPTQ/pull/593.
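For anyone else trying this, installing an unmerged PR branch from source typically looks like the sketch below; the local branch name is a placeholder, and AutoGPTQ builds a CUDA extension, so the exact build steps may differ on your system.

```bash
# Sketch of installing the AutoGPTQ PR branch from source
# ("qwen2moe-support" is an illustrative local branch name).
git clone https://github.com/AutoGPTQ/AutoGPTQ.git
cd AutoGPTQ
git fetch origin pull/593/head:qwen2moe-support  # fetch the PR ref locally
git checkout qwen2moe-support
pip install -e .  # build and install the checkout in editable mode
```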

zxs-learn commented 3 weeks ago

> You can try the code in this PR: AutoGPTQ/AutoGPTQ#593.

Hi, after installing the branch you submitted, I tried to quantize the Qwen2-57B-A14B-Instruct model, but hit an error:

TypeError: "Qwen2MoeDecoderLayer." forward() got multiple values for argument 'attention_mask'

I'm not sure what I did wrong; could you please advise? Thanks. The detailed error output is below:

```
INFO - Quantizing mlp.experts.63.down_proj in layer 1/28...
2024-06-14 17:38:12 INFO [auto_gptq.modeling._base] Quantizing mlp.experts.63.down_proj in layer 1/28...
2024-06-14 17:38:13 INFO [auto_gptq.quantization.gptq] duration: 0.6368391513824463
2024-06-14 17:38:13 INFO [auto_gptq.quantization.gptq] avg loss: 0.0027140483260154726
INFO - Start quantizing layer 2/28
2024-06-14 17:38:14 INFO [auto_gptq.modeling._base] Start quantizing layer 2/28
Traceback (most recent call last):
  File "/data/xiaoshan/gptq/quant_script.py", line 125, in <module>
    model.quantize(
  File "/data/miniconda3/envs/qwen_gptq/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data/xiaoshan/AutoGPTQ/auto_gptq/modeling/_base.py", line 453, in quantize
    layer(*layer_input, **additional_layer_inputs)
  File "/data/miniconda3/envs/qwen_gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data/miniconda3/envs/qwen_gptq/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: Qwen2MoeDecoderLayer.forward() got multiple values for argument 'attention_mask'
```
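Not an answer from the thread, but a note on what this traceback implies: auto_gptq replays each decoder layer as `layer(*layer_input, **additional_layer_inputs)`, so the TypeError means `attention_mask` is being passed both positionally (captured in `layer_input` during the first forward pass) and as a keyword (in `additional_layer_inputs`). A small diagnostic sketch, assuming transformers 4.40.0, to compare the layer's signature against what gets captured:

```python
# Diagnostic sketch (not from the thread) for the
# "got multiple values for argument 'attention_mask'" error.
import inspect
from transformers.models.qwen2_moe.modeling_qwen2_moe import Qwen2MoeDecoderLayer

# auto_gptq's _base.quantize() replays each layer as
#   layer(*layer_input, **additional_layer_inputs)
# The TypeError means 'attention_mask' appears in layer_input (positional)
# AND in additional_layer_inputs (keyword) at the same time.
print(inspect.signature(Qwen2MoeDecoderLayer.forward))
# Prints the parameters the layer accepts, roughly
# (self, hidden_states, attention_mask=None, position_ids=None, ...);
# compare this ordering with the inputs auto_gptq's layer hijacker recorded.
```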