THUDM / ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM | open-source bilingual dialogue language model

[BUG/Help] Serving the INT4-quantized model with vLLM fails with a shape mismatch: self_attention.dense.weight int4 shape [4096, 2048] does not match fp16 shape [4096, 4096] #680

Open yjjiang11 opened 1 month ago

yjjiang11 commented 1 month ago

Is there an existing issue for this?

Current Behavior

The checkpoint's self_attention.dense.weight has int4 shape [4096, 2048], which does not match the fp16 shape [4096, 4096] that vLLM expects, so the vLLM server fails to start.
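For context, a short illustration (my own sketch, not from the report) of where the [4096, 2048] shape comes from: the INT4 checkpoint packs two 4-bit values into each int8 byte, so the last dimension of the stored weight is half of the fp16 layout that vLLM's weight loader checks against.

```python
# Illustration only (not vLLM code): why the two shapes differ.
fp16_shape = (4096, 4096)   # dense weight shape vLLM expects when loading as fp16
int4_shape = (4096, 2048)   # shape actually stored in the chatglm2-6b-int4 checkpoint

# ChatGLM's INT4 quantization packs two 4-bit values per int8 byte,
# halving the last dimension, so the packed tensor no longer matches
# the unpacked fp16 layout and the weight-loading check fails.
assert int4_shape[1] * 2 == fp16_shape[1]
```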

Expected Behavior

chatglm2-6b-int4 can be deployed with vLLM.

Steps To Reproduce

None provided (see the sketch below).
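A minimal sketch of how the error is typically hit would be pointing vLLM at the INT4 checkpoint; the model id and arguments below are assumptions based on the issue title, not steps given in the report.

```python
# Hypothetical reproduction: load the INT4 checkpoint with vLLM.
# vLLM treats ChatGLM weights as fp16 and then reports
# "self_attention.dense.weight ... mismatch" because the checkpoint
# stores packed int4 tensors instead.
from vllm import LLM

llm = LLM(model="THUDM/chatglm2-6b-int4", trust_remote_code=True)
```

The same failure would be expected when launching the vLLM API server with `--model THUDM/chatglm2-6b-int4 --trust-remote-code`.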

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?
