THUDM / ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs | 开源双语对话语言模型
Apache License 2.0
13.39k stars · 1.55k forks

quantization failed #1197

Closed — qslia closed this issue 5 months ago

qslia commented 5 months ago

System Info / 系統信息

```
File "C:\Users\q\.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\103caa40027ebfd8450289ca2f278eac4ff26405\quantization.py", line 126, in __init__
    assert str(weight.device).startswith(
AssertionError: The weights that need to be quantified should be on the CUDA device
```

- OS: Windows-10-10.0.19045-SP0
- Python: 3.11.5 (installed via pip)
- torch: 2.1.2+cu121 ✅ (>=1.8.0)
- torchvision: 0.16.2+cu121 ✅ (>=0.9.0)
- transformers: 4.40.1
- CUDA: 12.1, device 0 (NVIDIA GeForce RTX 3060, 12287 MiB)
- CPU: AMD Ryzen 9 5900X 12-Core Processor (24 CPUs)
- RAM: 15.9 GB
- Disk: 369.6/465.8 GB

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

Reproduction / 复现过程

```python
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
```
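The assertion fires because `.quantize(4)` runs while the freshly loaded weights are still on the CPU; `.cuda()` is only chained afterwards. A minimal sketch of the device check in the cached `quantization.py` (the `FakeTensor` class and `require_cuda_weight` helper are hypothetical stand-ins, not the repo's actual API; only the assert condition and message mirror the traceback):

```python
class FakeTensor:
    """Stand-in for a torch tensor: only carries a device string."""
    def __init__(self, device):
        self.device = device


def require_cuda_weight(weight):
    # Same condition and message as the failing assert in quantization.py.
    assert str(weight.device).startswith("cuda"), (
        "The weights that need to be quantified should be on the CUDA device"
    )
    return True


# A CPU weight trips the assertion; a CUDA weight passes.
cpu_weight = FakeTensor("cpu")
cuda_weight = FakeTensor("cuda:0")

try:
    require_cuda_weight(cpu_weight)
    cpu_ok = True
except AssertionError:
    cpu_ok = False

print(cpu_ok)                            # False: CPU weights are rejected
print(require_cuda_weight(cuda_weight))  # True: CUDA weights pass
```

Under this reading, any code path that quantizes before the model reaches the GPU will hit the same assertion.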

Expected behavior / 期待表现

I hope the quantized version can run on an RTX 3060 with 12 GB of VRAM.
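The expectation looks reasonable on weight size alone. A back-of-the-envelope estimate, assuming roughly 6.2B parameters for chatglm3-6b and ignoring activation and KV-cache overhead (both the parameter count and the helper are assumptions for illustration):

```python
# Rough VRAM needed just for the weights at a given precision.
PARAMS = 6.2e9  # assumed parameter count for chatglm3-6b


def weight_gib(bits_per_param):
    """Bytes for all weights at `bits_per_param`, expressed in GiB."""
    return PARAMS * bits_per_param / 8 / 2**30


fp16 = weight_gib(16)  # ~11.5 GiB: marginal on a 12 GB card
int4 = weight_gib(4)   # ~2.9 GiB: fits comfortably
print(f"fp16: {fp16:.1f} GiB, int4: {int4:.1f} GiB")
```

So 4-bit quantization should leave ample headroom on a 12 GB RTX 3060, which is why the quantized path is worth fixing rather than falling back to fp16.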

zRzRzRzRzRzRzR commented 5 months ago

Please update to the latest code.