THUDM / ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs | 开源双语对话语言模型
Apache License 2.0
13.39k stars · 1.55k forks

quantization failed #1197

Closed — qslia closed this issue 5 months ago

qslia commented 5 months ago

System Info / 系統信息

```
File "C:\Users\q\.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\103caa40027ebfd8450289ca2f278eac4ff26405\quantization.py", line 126, in __init__
    assert str(weight.device).startswith(
AssertionError: The weights that need to be quantified should be on the CUDA device
```

- OS: Windows-10-10.0.19045-SP0
- Python: 3.11.5 (installed via pip)
- torch: 2.1.2+cu121 ✅ (>=1.8.0)
- torchvision: 0.16.2+cu121 ✅ (>=0.9.0)
- transformers: 4.40.1
- CUDA: 12.1, device 0 (NVIDIA GeForce RTX 3060, 12287 MiB)
- CPU: AMD Ryzen 9 5900X 12-Core Processor (24 CPUs)
- RAM: 15.9 GB
- Disk: 369.6/465.8 GB

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

Reproduction / 复现过程

```python
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
```
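The assertion fires because `.quantize(4)` runs while the freshly loaded weights are still on the CPU; `.cuda()` is only chained afterwards. A minimal sketch of the device check in the cached `quantization.py` (the `FakeTensor` class and `require_cuda_weight` helper are hypothetical stand-ins, not the repo's actual API; only the assert condition and message mirror the traceback):

```python
class FakeTensor:
    """Stand-in for a torch tensor: only carries a device string."""
    def __init__(self, device):
        self.device = device


def require_cuda_weight(weight):
    # Same condition and message as the failing assert in quantization.py.
    assert str(weight.device).startswith("cuda"), (
        "The weights that need to be quantified should be on the CUDA device"
    )
    return True


# A CPU weight trips the assertion; a CUDA weight passes.
cpu_weight = FakeTensor("cpu")
cuda_weight = FakeTensor("cuda:0")

try:
    require_cuda_weight(cpu_weight)
    cpu_ok = True
except AssertionError:
    cpu_ok = False

print(cpu_ok)                            # False: CPU weights are rejected
print(require_cuda_weight(cuda_weight))  # True: CUDA weights pass
```

Under this reading, any code path that quantizes before the model reaches the GPU will hit the same assertion.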

Expected behavior / 期待表现

I hope the quantized version can run on an RTX 3060 with 12 GB of VRAM.
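The expectation looks reasonable on weight size alone. A back-of-the-envelope estimate, assuming roughly 6.2B parameters for chatglm3-6b and ignoring activation and KV-cache overhead (both the parameter count and the helper are assumptions for illustration):

```python
# Rough VRAM needed just for the weights at a given precision.
PARAMS = 6.2e9  # assumed parameter count for chatglm3-6b


def weight_gib(bits_per_param):
    """Bytes for all weights at `bits_per_param`, expressed in GiB."""
    return PARAMS * bits_per_param / 8 / 2**30


fp16 = weight_gib(16)  # ~11.5 GiB: marginal on a 12 GB card
int4 = weight_gib(4)   # ~2.9 GiB: fits comfortably
print(f"fp16: {fp16:.1f} GiB, int4: {int4:.1f} GiB")
```

So 4-bit quantization should leave ample headroom on a 12 GB RTX 3060, which is why the quantized path is worth fixing rather than falling back to fp16.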

zRzRzRzRzRzRzR commented 5 months ago

Please update to the latest code.