THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[BUG/Help] Help needed to quantize the model #161

Closed: oddwatcher closed this issue 1 year ago

oddwatcher commented 1 year ago

Is there an existing issue for this?

Current Behavior

I want to run this model on an R7-6800H laptop using only the CPU, but I don't have an NVIDIA card to run the quantization Python code. Could someone provide an int8 or int16 version of the quantized model, or give me instructions on how to do the quantization on the CPU? P.S. Should the model work in llama.cpp?

Expected Behavior

No response

Steps To Reproduce

Use a laptop without an NVIDIA GPU and run the quantization Python code.

Environment

- OS: Windows 11
- Python: 3.10.9
- Transformers: according to requirements.txt
- PyTorch: according to requirements.txt
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): False

Anything else?

No response

songxxzp commented 1 year ago

This might be what you are looking for: https://huggingface.co/THUDM/chatglm-6b-int4

To load it on the CPU:

`model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()`

About 8 GB of RAM is required.
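
For completeness, here is a minimal end-to-end sketch of that CPU path. The `model.chat` call follows the usage pattern shown in this repo's README; roughly 8 GB of free RAM is assumed:

```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True is required because ChatGLM ships its own modeling code.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True)

# .float() keeps the weights in fp32 on the CPU instead of calling .half().cuda(),
# so no NVIDIA GPU or CUDA install is needed.
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4", trust_remote_code=True).float()
model = model.eval()

# Single-turn chat, as in the repo README.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```

Expect CPU inference to be noticeably slower than on a GPU, but it should run within the stated memory budget.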