QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

[Bug]: Parameter precision and GPU memory usage after model load do not match expectations; cannot load the quantized model Qwen2.5-7B-Instruct-GPTQ-Int4 at int4 precision #1014

Open yanli789 opened 5 hours ago

yanli789 commented 5 hours ago

Model Series

Qwen2.5

What are the models used?

Qwen2.5-7B-Instruct-GPTQ-Int4

What is the scenario where the problem happened?

After the model is loaded, its parameter precision and GPU memory usage do not match expectations.

Is this a known issue?

Information about environment

torch 2.1.2
transformers 4.41.2
tokenizers 0.19.1
auto-gptq 0.5.1
optimum 1.23.1
peft 0.13.1
CUDA Version: 12.0

Log output

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/data2/liyan/model/Qwen2.5-7B-Instruct-GPTQ-Int4"

# Load the GPTQ-quantized model onto GPU 2
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="cuda:2",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Print the dtype of every registered parameter
for name, param in model.named_parameters():
    print(f"Layer: {name}, Data type: {param.dtype}")

Description

Loading the Qwen2.5-7B-Instruct-GPTQ-Int4 model locally:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/data2/model/Qwen2.5-7B-Instruct-GPTQ-Int4"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="cuda:2",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
for name, param in model.named_parameters():
    print(f"Layer: {name}, Data type: {param.dtype}")

Result: the model does not load with the expected precision or GPU memory usage.

The parameters are not loaded at the expected int4 precision but at float16, and GPU memory usage is 17 GB rather than the expected ~8 GB.

Question: is there a problem with how I am loading the model, or is this an issue with the dependency packages?

Any pointers would be much appreciated. Thanks a lot!
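
As a rough sanity check (a sketch, assuming Qwen2.5-7B has on the order of 7.6B parameters), the two observed footprints line up with fp16-sized versus packed int4-sized weights:

params = 7.6e9  # assumed parameter count for Qwen2.5-7B

# fp16: 2 bytes per weight -> ~14 GiB, consistent with the observed ~17 GB
print(f"fp16 weights: {params * 2 / 2**30:.1f} GiB")

# packed int4: 0.5 bytes per weight -> ~3.5 GiB, plus fp16 scales/zero-points
# and unquantized modules, consistent with a single-digit-GB footprint
print(f"int4 weights: {params * 0.5 / 2**30:.1f} GiB")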

yanli789 commented 2 hours ago

The model I am loading locally was downloaded from https://modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4/ . Could the model itself be the problem?

jklj077 commented 2 hours ago

try removing torch_dtype="auto"
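
Applied to the snippet from the report, the suggestion amounts to something like:

from transformers import AutoModelForCausalLM

# Same load call as in the report, but without torch_dtype="auto"
model = AutoModelForCausalLM.from_pretrained(
    "/data2/liyan/model/Qwen2.5-7B-Instruct-GPTQ-Int4",
    device_map="cuda:2",
    trust_remote_code=True,
)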

yanli789 commented 2 hours ago

try removing torch_dtype="auto"

Thanks for the reply. After removing torch_dtype="auto", the model now occupies 7 GB of GPU memory after loading. I have one more question: I print the parameters of each layer with

for name, param in model.named_parameters():
    print(f"Layer: {name}, Data type: {param.dtype}")

and every layer still reports float16. Does that mean the model was loaded at float16 precision? Shouldn't the precision be int4 after quantization?

Layer: model.layers.21.post_attention_layernorm.weight, Data type: torch.float16
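
One likely explanation, assuming the checkpoint uses the usual auto-gptq QuantLinear layout: the int4 weights are packed into int32 tensors registered as module buffers (qweight, qzeros), not as nn.Parameters, so named_parameters() only yields the tensors that genuinely remain in fp16 (layernorm weights, embeddings, the quantization scales). Seeing torch.float16 there does not mean the linear weights were dequantized. A sketch for inspecting the packed tensors on the model loaded above:

# Sketch, assuming the auto-gptq QuantLinear layout: packed int4 weights live
# in int32 buffers named qweight/qzeros, with fp16 scales alongside them.
for name, buf in model.named_buffers():
    if name.endswith(("qweight", "qzeros", "scales")):
        print(f"Buffer: {name}, dtype: {buf.dtype}, shape: {tuple(buf.shape)}")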