THUDM / ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs | 开源双语对话语言模型
Apache License 2.0

GPU memory 7 GB, HeaderTooLarge error #1154

Closed yanli789 closed 6 months ago

yanli789 commented 6 months ago

System Info

Environment: Linux, GPU memory 7 GB, CUDA Version: 11.7, Python 3.10, transformers 4.30.2, torch 2.0.1

Who can help?

No response

Information

Reproduction

Code run:

model = AutoModel.from_pretrained("/mnt/chatglm3-6b", trust_remote_code=True).half().quantize(4).cuda()

Error message: SafetensorError: Error while deserializing header: HeaderTooLarge

Expected behavior / 期待表现

Could someone please take a look? Many thanks.
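For context on this error: a .safetensors file begins with a little-endian uint64 giving the length of a JSON header, and HeaderTooLarge means that value is implausibly large, which typically happens when the file is truncated or is a git-lfs pointer stub rather than the real weights. A minimal diagnostic sketch (not part of the original thread):

```python
import json
import struct
from pathlib import Path

def check_safetensors(path):
    """Return a rough diagnosis string for a .safetensors file."""
    data = Path(path).read_bytes()
    if len(data) < 8:
        return "too small to be a safetensors file"
    # First 8 bytes: little-endian uint64 length of the JSON header.
    (header_len,) = struct.unpack("<Q", data[:8])
    if header_len > len(data) - 8:
        return (f"header length {header_len} exceeds file size {len(data)} "
                "-- truncated download or git-lfs pointer stub?")
    header = json.loads(data[8:8 + header_len])
    return f"ok: {len(header)} entries in header"
```

Running this over each shard in /mnt/chatglm3-6b would show whether the files on disk are intact before blaming the loading code.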

yanli789 commented 6 months ago

Using model = AutoModel.from_pretrained("/mnt/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda() gives the same error.

qingdengyue commented 6 months ago
  1. Does it work if you replace /mnt/chatglm3-6b with THUDM/chatglm3-6b?
  2. Is /mnt/chatglm3-6b a fine-tuned model, or downloaded as-is?
yanli789 commented 6 months ago

/mnt/chatglm3-6b was downloaded directly. Does this model just use a lot of memory by default when loading? I don't know how to reduce the default memory use; I only want to start a simple chat demo to learn with.

zRzRzRzRzRzRzR commented 6 months ago

Yes, the default load is large, but quantization should still work. How much memory do you have?

yanli789 commented 6 months ago

My Linux environment has 128 GB of RAM and 7 GB of VRAM.

zRzRzRzRzRzRzR commented 6 months ago

Hmm, what card has 7 GB? I haven't run into anything like this and can't reproduce it for now.

zRzRzRzRzRzRzR commented 6 months ago

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda()

yanli789 commented 6 months ago

Sorry, what I wrote earlier was inaccurate. Mine is: NVIDIA-SMI 515.43.04, Driver Version: 515.105.01, CUDA Version: 11.7, Memory: 8192 MiB.

yanli789 commented 6 months ago

model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda()

I tried this and it still fails with: SafetensorError: Error while deserializing header: HeaderTooLarge

yanli789 commented 6 months ago

[Error screenshot]

yanli789 commented 6 months ago

chatglm3-6b was downloaded as-is, with no modifications. I have tried each of the following, and all fail with SafetensorError: Error while deserializing header: HeaderTooLarge

model = AutoModel.from_pretrained("/mnt/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda()
model = AutoModel.from_pretrained("/mnt/chatglm3-6b", trust_remote_code=True).half().quantize(bits=4, device="cuda").eval()
model = AutoModel.from_pretrained("/mnt/chatglm3-6b", trust_remote_code=True).half().cuda().quantize(4).eval()
model = AutoModel.from_pretrained("/mnt/chatglm3-6b", max_length=128, num_labels=64, trust_remote_code=True).half().cuda().quantize(4).eval()
model = AutoModel.from_pretrained("/mnt/chatglm3-6b", max_length=128, num_labels=64, trust_remote_code=True).cuda()

qingdengyue commented 6 months ago

Maybe 8 GB of VRAM isn't enough? ❓
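As a back-of-the-envelope check (assuming roughly 6.2B parameters for chatglm3-6b, which is an approximation, not an official figure), the weights alone come to about:

```python
# Rough VRAM needed for the weights alone of a ~6.2B-parameter model at
# different precisions. Activations and the KV cache add more on top,
# and the ~6.2e9 parameter count is an assumption for illustration.
params = 6.2e9
for name, bytes_per_param in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{name}: ~{gib:.1f} GiB")
```

By this estimate int4 weights are around 3 GiB, so an 8 GB card should hold a 4-bit-quantized model with room for activations, which is why the quantized load was expected to work here.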

zRzRzRzRzRzRzR commented 6 months ago

Are you using the latest GitHub code and the latest HF files? model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda() should work, in theory.

yanli789 commented 6 months ago

Are you using the latest GitHub code and the latest HF files? model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(bits=4, device="cuda").cuda() should work, in theory.

Then I'll update the code and try again. Thanks for the reply.

yanli789 commented 6 months ago

Re-pulling the code fixed the problem. Thanks again for the help.
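For future readers: when weights are fetched by cloning a Hugging Face repo, the usual cause of this exact symptom is an incomplete git-lfs checkout, where each .safetensors file is a tiny ASCII pointer stub instead of a multi-GB binary. A hedged sketch of the check (the helper name is ours, not part of any library):

```python
from pathlib import Path

def looks_like_lfs_pointer(path):
    """Heuristic: git-lfs pointer stubs are tiny text files starting
    with 'version https://git-lfs'. Real .safetensors shards are large
    binaries whose first 8 bytes encode a header length."""
    head = Path(path).read_bytes()[:24]
    return head.startswith(b"version https://git-lfs")
```

If this returns True for any shard, running `git lfs pull` inside the model directory (assuming it was cloned with git) replaces the stubs with the real weight files.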