QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

AutoAWQ quantization fails after fine-tuning #180

Closed see2023 closed 6 months ago

see2023 commented 8 months ago

Fine-tuning followed https://qwen.readthedocs.io/zh-cn/latest/training/SFT/example.html, using data in jsonl format.

Qwen1.5/examples/sft/finetune.sh --use_lora True --deepspeed ds_config_zero2.json

Quantization followed https://qwen.readthedocs.io/zh-cn/latest/quantization/awq.html

When loading the model with AutoAWQForCausalLM for quantization, it complained that config.json was missing, so I saved the config obtained from AutoModelForCausalLM:

# model is the fine-tuned checkpoint loaded with AutoModelForCausalLM
config = model.config
config.save_pretrained(model_path)

Then run:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMV" }

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoAWQForCausalLM.from_pretrained(model_path, device_map="auto", safetensors=True)

# load calibration data from the fine-tuning jsonl
import json
data = []
jsonl_file = 'fine_tune.jsonl'
with open(jsonl_file, 'r') as f:
    messages = [json.loads(line) for line in f]
for msg in messages:
    msgs = msg['messages']
    text = tokenizer.apply_chat_template(msgs, tokenize=False, add_generation_prompt=False)
    data.append(text.strip())

model.quantize(tokenizer, quant_config=quant_config, calib_data=data)

Running model.quantize() fails here, in /usr/local/lib/python3.10/dist-packages/awq/models/qwen2.py:

    @staticmethod
    def get_layers_for_scaling(module: OldQwen2DecoderLayer, input_feat, module_kwargs):
        layers = []

        # attention input
        layers.append(
            dict(
                prev_op=module.input_layernorm,
                layers=[
                    module.self_attn.q_proj,
                    module.self_attn.k_proj,
                    module.self_attn.v_proj,
                ],
                inp=input_feat["self_attn.q_proj"],
                module2inspect=module.self_attn,
                kwargs=module_kwargs,
            )
        )

KeyError: 'self_attn.q_proj'

What is causing this, and what is the correct workflow for quantizing a fine-tuned model? Thanks!

see2023 commented 8 months ago

Debugging awq/models/qwen2.py showed there is no input_feat["self_attn.q_proj"], so I replaced it with input_feat["self_attn.q_proj.base_layer"] and ran again, but it still fails:

/usr/local/lib/python3.10/dist-packages/awq/quantize/scale.py in apply_scale(module, scales_list, input_feat_dict)
     66 
     67         else:
---> 68             raise NotImplementedError(f"prev_op {type(prev_op)} not supported yet!")
     69 
     70         # apply the scaling to input feat if given; prepare it for clipping

NotImplementedError: prev_op <class 'peft.tuners.lora.layer.Linear'> not supported yet!

jklj077 commented 6 months ago

Hi, to quantize LoRA fine-tuned models, you need to merge the adapters into the base model first. Please refer to the PEFT documentation for guidance on that.
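
For illustration, a minimal sketch of that merge step with PEFT's merge_and_unload (the paths output_qwen and output_qwen_merged are placeholders, not taken from this issue):

from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# load the base model together with the LoRA adapter saved by fine-tuning
model = AutoPeftModelForCausalLM.from_pretrained("output_qwen", torch_dtype="auto")

# fold the LoRA weights into the base model so only plain Linear layers remain
merged = model.merge_and_unload()

# save the merged checkpoint (config.json is written as well) plus the tokenizer
merged.save_pretrained("output_qwen_merged", safe_serialization=True)
AutoTokenizer.from_pretrained("output_qwen").save_pretrained("output_qwen_merged")

The merged directory can then be passed to AutoAWQForCausalLM.from_pretrained as in the script above; the config.json workaround and the missing self_attn.q_proj key should no longer be an issue.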