File "C:\Users\q\.cache\huggingface\modules\transformers_modules\THUDM\chatglm3-6b\103caa40027ebfd8450289ca2f278eac4ff26405\quantization.py", line 126, in __init__
    assert str(weight.device).startswith(
AssertionError: The weights that need to be quantified should be on the CUDA device
System Info
OS: Windows-10-10.0.19045-SP0
Python: 3.11.5 (installed via pip)
CPU: AMD Ryzen 9 5900X 12-Core Processor (24 CPUs)
RAM: 15.89 GB
Disk: 369.6/465.8 GB
GPU: CUDA:0 — NVIDIA GeForce RTX 3060 (12287 MiB)
CUDA: 12.1
torch: 2.1.2+cu121 ✅ (>=1.8.0)
torchvision: 0.16.2+cu121 ✅ (>=0.9.0)
transformers: 4.40.1
Setup complete ✅
Who can help?
No response
Information
Reproduction
model = AutoModel.from_pretrained("THUDM/chatglm3-6b", trust_remote_code=True).quantize(4).cuda()
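The assertion fires because `.quantize(4)` runs while the freshly loaded weights are still on the CPU, and the check at `quantization.py` line 126 requires each weight tensor to already be on a CUDA device. A minimal sketch of that device check, plus the commonly suggested reordering (calling `.cuda()` before `.quantize(4)` — an assumption about this model revision, not a verified fix):

```python
import torch

# Freshly loaded Hugging Face weights live on the CPU by default.
weight = torch.zeros(4, 4)

# This mirrors the check in quantization.py that raises the AssertionError:
#     assert str(weight.device).startswith("cuda")
# On a CPU tensor, str(weight.device) is "cpu", so the assert fails.
assert not str(weight.device).startswith("cuda")

# Moving the tensor to the GPU first makes the same check pass
# (only possible on a CUDA-enabled build with a visible GPU):
if torch.cuda.is_available():
    weight = weight.cuda()
    assert str(weight.device).startswith("cuda")

# Hypothetical reordering of the reproduction line (assumption: the fp16
# model fits in VRAM briefly before quantization shrinks it):
# model = AutoModel.from_pretrained(
#     "THUDM/chatglm3-6b", trust_remote_code=True
# ).cuda().quantize(4)
```

Whether the reordering is viable on a 12 GB card depends on how much VRAM the unquantized fp16 weights need before quantization runs; that is the open question of this report.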
Expected behavior
I hope the quantized version can run on an RTX 3060 with 12 GB of VRAM.