QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

[Bug]: Parameter precision and GPU memory usage after model load do not match expectations; cannot load the quantized model Qwen2.5-7B-Instruct-GPTQ-Int4 at int4 precision #1014

Open yanli789 opened 5 hours ago

yanli789 commented 5 hours ago

Model Series

Qwen2.5

What are the models used?

Qwen2.5-7B-Instruct-GPTQ-Int4

What is the scenario where the problem happened?

After the model is loaded, its parameter precision and GPU memory usage do not match expectations.

Is this a known issue?

Information about environment

torch 2.1.2
transformers 4.41.2
tokenizers 0.19.1
auto-gptq 0.5.1
optimum 1.23.1
peft 0.13.1
CUDA Version: 12.0

Log output

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/data2/liyan/model/Qwen2.5-7B-Instruct-GPTQ-Int4"

# Load the GPTQ-quantized model onto GPU 2
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="cuda:2",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Print the dtype of every registered parameter
for name, param in model.named_parameters():
    print(f"Layer: {name}, Data type: {param.dtype}")

Description

Loading the Qwen2.5-7B-Instruct-GPTQ-Int4 model locally:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/data2/model/Qwen2.5-7B-Instruct-GPTQ-Int4"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="cuda:2",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
for name, param in model.named_parameters():
    print(f"Layer: {name}, Data type: {param.dtype}")

Result: the model does not load with the expected precision or GPU memory usage.

The parameters are not loaded at the expected int4 precision but at float16, and GPU memory usage is 17 GB rather than the expected ~8 GB.

Question: is there a problem with how I am loading the model, or is this an issue with the dependency packages?

Any pointers would be much appreciated. Thanks a lot!
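
As a rough sanity check (a sketch, assuming Qwen2.5-7B has on the order of 7.6B parameters), the two observed footprints line up with fp16-sized versus packed int4-sized weights:

params = 7.6e9  # assumed parameter count for Qwen2.5-7B

# fp16: 2 bytes per weight -> ~14 GiB, consistent with the observed ~17 GB
print(f"fp16 weights: {params * 2 / 2**30:.1f} GiB")

# packed int4: 0.5 bytes per weight -> ~3.5 GiB, plus fp16 scales/zero-points
# and unquantized modules, consistent with a single-digit-GB footprint
print(f"int4 weights: {params * 0.5 / 2**30:.1f} GiB")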

yanli789 commented 2 hours ago

The model I am loading locally was downloaded from https://modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4/ . Could the model itself be the problem?

jklj077 commented 2 hours ago

try removing torch_dtype="auto"
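
Applied to the snippet from the report, the suggestion amounts to something like:

from transformers import AutoModelForCausalLM

# Same load call as in the report, but without torch_dtype="auto"
model = AutoModelForCausalLM.from_pretrained(
    "/data2/liyan/model/Qwen2.5-7B-Instruct-GPTQ-Int4",
    device_map="cuda:2",
    trust_remote_code=True,
)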

yanli789 commented 2 hours ago

try removing torch_dtype="auto"

Thanks for the reply. After removing torch_dtype="auto", the model now occupies 7 GB of GPU memory after loading. I have one more question: I print the parameters of each layer with

for name, param in model.named_parameters():
    print(f"Layer: {name}, Data type: {param.dtype}")

and every layer still reports float16. Does that mean the model was loaded at float16 precision? Shouldn't the precision be int4 after quantization?

Layer: model.layers.21.post_attention_layernorm.weight, Data type: torch.float16
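
One likely explanation, assuming the checkpoint uses the usual auto-gptq QuantLinear layout: the int4 weights are packed into int32 tensors registered as module buffers (qweight, qzeros), not as nn.Parameters, so named_parameters() only yields the tensors that genuinely remain in fp16 (layernorm weights, embeddings, the quantization scales). Seeing torch.float16 there does not mean the linear weights were dequantized. A sketch for inspecting the packed tensors on the model loaded above:

# Sketch, assuming the auto-gptq QuantLinear layout: packed int4 weights live
# in int32 buffers named qweight/qzeros, with fp16 scales alongside them.
for name, buf in model.named_buffers():
    if name.endswith(("qweight", "qzeros", "scales")):
        print(f"Buffer: {name}, dtype: {buf.dtype}, shape: {tuple(buf.shape)}")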