THUDM / ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs | 开源双语对话语言模型
Apache License 2.0

GPU memory 7 GB, HeaderTooLarge error #1154

Closed yanli789 closed 6 months ago

yanli789 commented 6 months ago

System Info

Environment: Linux, GPU memory 7 GB, CUDA Version: 11.7, Python 3.10, transformers 4.30.2, torch 2.0.1

Who can help?

No response

Information

Reproduction

Code run:

model = AutoModel.from_pretrained("/mnt/chatglm3-6b", trust_remote_code=True).half().quantize(4).cuda()

Error message: SafetensorError: Error while deserializing header: HeaderTooLarge

Expected behavior / 期待表现

Could someone please take a look? Many thanks.
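For context on this error: a .safetensors file begins with a little-endian uint64 giving the length of a JSON header, and HeaderTooLarge means that value is implausibly large, which typically happens when the file is truncated or is a git-lfs pointer stub rather than the real weights. A minimal diagnostic sketch (not part of the original thread):

```python
import json
import struct
from pathlib import Path

def check_safetensors(path):
    """Return a rough diagnosis string for a .safetensors file."""
    data = Path(path).read_bytes()
    if len(data) < 8:
        return "too small to be a safetensors file"
    # First 8 bytes: little-endian uint64 length of the JSON header.
    (header_len,) = struct.unpack("<Q", data[:8])
    if header_len > len(data) - 8:
        return (f"header length {header_len} exceeds file size {len(data)} "
                "-- truncated download or git-lfs pointer stub?")
    header = json.loads(data[8:8 + header_len])
    return f"ok: {len(header)} entries in header"
```

Running this over each shard in /mnt/chatglm3-6b would show whether the files on disk are intact before blaming the loading code.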

yanli789 commented 6 months ago

Using model = AutoModel.from_pretrained("/mnt/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda() gives the same error.

qingdengyue commented 6 months ago
  1. Does it work if you replace /mnt/chatglm3-6b with THUDM/chatglm3-6b?
  2. Is /mnt/chatglm3-6b a fine-tuned model, or downloaded as-is?
yanli789 commented 6 months ago

/mnt/chatglm3-6b was downloaded directly. Does this model just use a lot of memory by default when loading? I don't know how to reduce the default memory use; I only want to start a simple chat demo to learn with.

zRzRzRzRzRzRzR commented 6 months ago

Yes, the default load is large, but quantization should still work. How much memory do you have?

yanli789 commented 6 months ago

My Linux environment has 128 GB of RAM and 7 GB of VRAM.

zRzRzRzRzRzRzR commented 6 months ago

Hmm, what card has 7 GB? I haven't run into anything like this and can't reproduce it for now.

zRzRzRzRzRzRzR commented 6 months ago

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda()

yanli789 commented 6 months ago

Sorry, what I wrote earlier was inaccurate. Mine is: NVIDIA-SMI 515.43.04, Driver Version: 515.105.01, CUDA Version: 11.7, Memory: 8192 MiB.

yanli789 commented 6 months ago

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda()

I tried this and it still fails with: SafetensorError: Error while deserializing header: HeaderTooLarge

yanli789 commented 6 months ago

[Error screenshot]

yanli789 commented 6 months ago

chatglm3-6b was downloaded as-is, with no modifications. I have tried each of the following, and all fail with SafetensorError: Error while deserializing header: HeaderTooLarge

model = AutoModel.from_pretrained("/mnt/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda()
model = AutoModel.from_pretrained("/mnt/chatglm3-6b", trust_remote_code=True).half().quantize(bits=4, device="cuda").eval()
model = AutoModel.from_pretrained("/mnt/chatglm3-6b", trust_remote_code=True).half().cuda().quantize(4).eval()
model = AutoModel.from_pretrained("/mnt/chatglm3-6b", max_length=128, num_labels=64, trust_remote_code=True).half().cuda().quantize(4).eval()
model = AutoModel.from_pretrained("/mnt/chatglm3-6b", max_length=128, num_labels=64, trust_remote_code=True).cuda()

qingdengyue commented 6 months ago

Maybe 8 GB of VRAM isn't enough? ❓
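As a back-of-the-envelope check (assuming roughly 6.2B parameters for chatglm3-6b, which is an approximation, not an official figure), the weights alone come to about:

```python
# Rough VRAM needed for the weights alone of a ~6.2B-parameter model at
# different precisions. Activations and the KV cache add more on top,
# and the ~6.2e9 parameter count is an assumption for illustration.
params = 6.2e9
for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{name}: ~{gib:.1f} GiB")
```

By this estimate int4 weights are around 3 GiB, so an 8 GB card should hold a 4-bit-quantized model with room for activations, which is why the quantized load was expected to work here.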

zRzRzRzRzRzRzR commented 6 months ago

Are you using the latest GitHub code and the latest HF files? model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda() should work, in theory.

yanli789 commented 6 months ago

Are you using the latest GitHub code and the latest HF files? model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda() should work, in theory.

Then I'll update the code and try again. Thanks for the reply.

yanli789 commented 6 months ago

Re-pulling the code fixed the problem. Thanks again for the help.
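For future readers: when weights are fetched by cloning a Hugging Face repo, the usual cause of this exact symptom is an incomplete git-lfs checkout, where each .safetensors file is a tiny ASCII pointer stub instead of a multi-GB binary. A hedged sketch of the check (the helper name is ours, not part of any library):

```python
from pathlib import Path

def looks_like_lfs_pointer(path):
    """Heuristic: git-lfs pointer stubs are tiny text files starting
    with 'version https://git-lfs'. Real .safetensors shards are large
    binaries whose first 8 bytes encode a header length."""
    head = Path(path).read_bytes()[:24]
    return head.startswith(b"version https://git-lfs")
```

If this returns True for any shard, running `git lfs pull` inside the model directory (assuming it was cloned with git) replaces the stubs with the real weight files.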