ChatGLM2-6B: An Open Bilingual Chat LLM | 开源双语对话语言模型
[BUG/Help] Serving the INT4-quantized model with vLLM fails with a shape mismatch: self_attention.dense.weight int4 shape [4096, 2048] mismatch fp16 shape [4096, 4096] #680
Open
yjjiang11 opened 1 month ago
Is there an existing issue for this?
Current Behavior
self_attention.dense.weight has int4 shape [4096, 2048], which does not match the expected fp16 shape [4096, 4096]; this mismatch causes the vLLM server startup to fail.
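The halved last dimension is consistent with 4-bit packing: an int4 checkpoint typically stores two 4-bit values per byte along the last axis, so a [4096, 4096] fp16 weight appears as a [4096, 2048] packed tensor, while a loader expecting the unquantized layout rejects it. A minimal sketch of that packing arithmetic (hypothetical illustration, not ChatGLM's or vLLM's actual quantization code):

```python
import numpy as np

# Hypothetical sketch: pack a [4096, 4096] weight into int4,
# two 4-bit values per uint8 byte along the last axis.
w = np.zeros((4096, 4096), dtype=np.float16)

# Fake 4-bit quantization: clamp to the signed 4-bit range [-8, 7].
q = np.clip(np.round(w), -8, 7).astype(np.int8)

# Pack adjacent column pairs into one byte (low nibble, high nibble).
lo = (q[:, 0::2] & 0x0F).astype(np.uint8)
hi = (q[:, 1::2] & 0x0F).astype(np.uint8)
packed = (hi << 4) | lo

print(packed.shape)  # (4096, 2048) -- matches the int4 shape in the error
```

This is why the error reports [4096, 2048] against an expected [4096, 4096]: the checkpoint's packed storage shape is being compared directly to the model definition's logical fp16 shape.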
Expected Behavior
chatglm2-6b-int4 should be deployable with vLLM.
Steps To Reproduce
none
Environment
Anything else?