yanli789 opened 5 hours ago
The model I loaded locally was downloaded from https://modelscope.cn/models/Qwen/Qwen2.5-7B-Instruct-GPTQ-Int4/. Could the model itself be the problem?
Try removing `torch_dtype="auto"`.
Thanks for the reply. After removing `torch_dtype="auto"` and reloading, the model now occupies 7 GB of VRAM. I have a follow-up question: I print the dtype of each parameter with

```python
for name, param in model.named_parameters():
    print(f"Layer: {name}, Data type: {param.dtype}")
```

and every layer still reports float16. Does this mean the model was loaded in float16? Shouldn't the precision be int4 after quantization?

```
Layer: model.layers.21.post_attention_layernorm.weight, Data type: torch.float16
```
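This is expected rather than a loading problem. PyTorch has no native int4 tensor dtype, so GPTQ packs the 4-bit weights of each quantized linear layer into int32 buffers (named `qweight` in AutoGPTQ's `QuantLinear` modules), alongside float16 scales and zero points. `named_parameters()` only lists the remaining float16 parameters (layer norms, embeddings, scales), so seeing float16 there does not mean the weights were dequantized. A minimal sketch of the packing idea, in plain Python with hypothetical helper names:

```python
def pack_int4(values):
    """Pack eight 4-bit values (0..15) into one 32-bit integer,
    low nibble first -- the same idea GPTQ uses for its qweight buffers."""
    assert len(values) == 8 and all(0 <= v < 16 for v in values)
    packed = 0
    for i, v in enumerate(values):
        packed |= v << (4 * i)
    return packed

def unpack_int4(packed):
    """Recover the eight 4-bit values from a packed 32-bit integer."""
    return [(packed >> (4 * i)) & 0xF for i in range(8)]

nibbles = [3, 15, 0, 7, 1, 9, 12, 5]
packed = pack_int4(nibbles)
print(unpack_int4(packed) == nibbles)  # True: eight weights round-trip through one int32
```

To confirm the model really is quantized, inspect the module types or buffers instead of the parameters, e.g. check whether any module has a `qweight` attribute.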
Model Series
Qwen2.5
What are the models used?
Qwen2.5-7B-Instruct-GPTQ-Int4
What is the scenario where the problem happened?
After loading, the parameter precision and VRAM usage do not match expectations.
Is this a known issue?
Information about environment
torch 2.1.2
transformers 4.41.2
tokenizers 0.19.1
auto-gptq 0.5.1
optimum 1.23.1
peft 0.13.1
CUDA Version: 12.0
Log output
Description
Loading the Qwen2.5-7B-Instruct-GPTQ-Int4 model from a local path:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "/data2/model/Qwen2.5-7B-Instruct-GPTQ-Int4"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="cuda:2",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

for name, param in model.named_parameters():
    print(f"Layer: {name}, Data type: {param.dtype}")
```
Result: the model did not load with the expected precision or VRAM footprint.
The parameters report float16 rather than the expected int4, and VRAM usage is 17 GB instead of the expected ~8 GB.
Question: is there a problem with how I am loading the model, or with one of the dependency packages?
Any pointers would be greatly appreciated. Many thanks!