Failed to load cpm_kernels:name 'CPUKernel' is not defined
Welcome to the ChatGLM2-6B model. Type your input to chat, clear to clear the conversation history, stop to exit the program
User: a
ChatGLM:Traceback (most recent call last):
File "E:\pycharm\ChatGLM2-6B\cli_demo.py", line 62, in <module>
main()
File "E:\pycharm\ChatGLM2-6B\cli_demo.py", line 49, in main
for response, history, past_key_values in model.stream_chat(tokenizer, query, history=history,
File "E:\pycharm\ChatGLM2-6B\jieshiqi\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "C:\Users\14363/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\modeling_chatglm.py", line 1058, in stream_chat
for outputs in self.stream_generate(**inputs, past_key_values=past_key_values,
File "E:\pycharm\ChatGLM2-6B\jieshiqi\lib\site-packages\torch\utils\_contextlib.py", line 35, in generator_context
response = gen.send(None)
File "C:\Users\14363/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\modeling_chatglm.py", line 1143, in stream_generate
outputs = self(
File "E:\pycharm\ChatGLM2-6B\jieshiqi\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\14363/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\modeling_chatglm.py", line 932, in forward
transformer_outputs = self.transformer(
File "E:\pycharm\ChatGLM2-6B\jieshiqi\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\14363/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\modeling_chatglm.py", line 828, in forward
hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
File "E:\pycharm\ChatGLM2-6B\jieshiqi\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\14363/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\modeling_chatglm.py", line 638, in forward
layer_ret = layer(
File "E:\pycharm\ChatGLM2-6B\jieshiqi\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\14363/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\modeling_chatglm.py", line 542, in forward
attention_output, kv_cache = self.self_attention(
File "E:\pycharm\ChatGLM2-6B\jieshiqi\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\14363/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\modeling_chatglm.py", line 374, in forward
mixed_x_layer = self.query_key_value(hidden_states)
File "E:\pycharm\ChatGLM2-6B\jieshiqi\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\14363/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\quantization.py", line 502, in forward
output = W8A16Linear.apply(input, self.weight, self.weight_scale, self.weight_bit_width)
File "E:\pycharm\ChatGLM2-6B\jieshiqi\lib\site-packages\torch\autograd\function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "C:\Users\14363/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\quantization.py", line 75, in forward
weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
File "C:\Users\14363/.cache\huggingface\modules\transformers_modules\chatglm2-6b-int4\quantization.py", line 287, in extract_weight_to_half
func = kernels.int4WeightExtractionHalf
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'
Process finished with exit code 1
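Reading the log from top to bottom: the first line reports that cpm_kernels failed to load, and the final AttributeError shows that `kernels` was still `None` when `extract_weight_to_half` tried to use it. The sketch below reproduces that failure pattern in isolation. It is a simplified illustration only, not the actual contents of quantization.py; `load_kernels` and `broken_loader` are hypothetical names introduced for the example.

```python
# Simplified illustration of the failure pattern seen in the traceback.
# NOT the real quantization.py; load_kernels and broken_loader are hypothetical.

kernels = None  # module-level handle, filled in only if loading succeeds


def load_kernels(loader):
    """Try to load the kernels; on failure, log and leave `kernels` as None."""
    global kernels
    try:
        kernels = loader()
    except Exception as exc:
        # The error is swallowed here, so the program keeps running.
        print("Failed to load cpm_kernels:" + str(exc))


def extract_weight_to_half():
    # Fails later, far from the real cause, if loading was unsuccessful.
    return kernels.int4WeightExtractionHalf


def broken_loader():
    # Stand-in for whatever went wrong while loading the kernels.
    raise NameError("name 'CPUKernel' is not defined")


load_kernels(broken_loader)
try:
    extract_weight_to_half()
except AttributeError as exc:
    message = str(exc)
    print(message)
```

If this reading is right, the AttributeError is only a symptom, and the message on the first log line is the one to investigate.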
Is there an existing issue for this?
Current Behavior
An error is raised when running web_demo.py. Parameters:
tokenizer = AutoTokenizer.from_pretrained("E:\pycharm\ChatGLM2-6B\model\chatglm2-6b-int4", trust_remote_code=True)
model = AutoModel.from_pretrained("E:\pycharm\ChatGLM2-6B\model\chatglm2-6b-int4", trust_remote_code=True).half().cuda()
model = model.quantize(bits=4, kernel_file="E:\pycharm\ChatGLM2-6B\model\chatglm2-6b-int4\quantization_kernels.so")
quantization_kernels.so was compiled manually, following https://github.com/THUDM/ChatGLM-6B/issues/166
The previous-generation chatglm-6b-int4 seems to hit the same error during quantization, so I also consulted:
https://github.com/THUDM/ChatGLM-6B/issues/214
https://github.com/THUDM/ChatGLM-6B/issues/162
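Before loading the model, a few basic prerequisites can be checked with the standard library alone. The following is a minimal sketch, not an official diagnostic: `check_quantization_prereqs` is a hypothetical helper, and the idea that a `gcc` on PATH matters is an assumption drawn from the manual compilation described above. It only inspects the environment and does not load or compile anything.

```python
import importlib.util
import os
import shutil


def check_quantization_prereqs(kernel_file):
    """Collect basic facts about the environment; nothing is loaded or compiled."""
    return {
        # Is the cpm_kernels package installed in this interpreter?
        "cpm_kernels_installed": importlib.util.find_spec("cpm_kernels") is not None,
        # Is a gcc executable on PATH? (Assumed relevant for compiling kernels.)
        "gcc_on_path": shutil.which("gcc") is not None,
        # Does the manually compiled kernel file exist at the given path?
        "kernel_file_exists": os.path.isfile(kernel_file),
    }


if __name__ == "__main__":
    # Path taken from the report; a raw string avoids backslash-escape surprises.
    path = r"E:\pycharm\ChatGLM2-6B\model\chatglm2-6b-int4\quantization_kernels.so"
    for name, ok in check_quantization_prereqs(path).items():
        print(f"{name}: {ok}")
```

A `False` in any row does not prove that it is the cause, but it narrows down where to look.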
Expected Behavior
The quantized model runs normally. (The unquantized model is barely usable, but generation is painfully slow.)
Steps To Reproduce
The full traceback is shown above. I suspect the error is caused by my changes to the quantization code. The change is similar to this one: https://github.com/THUDM/ChatGLM-6B/issues/166#issuecomment-1484705952
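One thing that may be worth ruling out: the traceback shows quantization.py being executed from the Hugging Face modules cache under C:\Users\14363, not from the model directory on E:. The sketch below compares two copies byte by byte to see whether the edited file is the one that actually runs. `same_file_contents` is a hypothetical helper, and the location of the edited file is an assumption (the model directory named in the report).

```python
import filecmp
import os


def same_file_contents(edited_path, executed_path):
    """Return True/False for a byte-by-byte comparison, or None if a file is missing."""
    if not (os.path.isfile(edited_path) and os.path.isfile(executed_path)):
        return None
    # shallow=False forces a content comparison instead of comparing os.stat data.
    return filecmp.cmp(edited_path, executed_path, shallow=False)


if __name__ == "__main__":
    # Assumed location of the edited file (model directory from the report).
    edited = r"E:\pycharm\ChatGLM2-6B\model\chatglm2-6b-int4\quantization.py"
    # Location shown in the traceback.
    executed = (
        r"C:\Users\14363\.cache\huggingface\modules"
        r"\transformers_modules\chatglm2-6b-int4\quantization.py"
    )
    print(same_file_contents(edited, executed))
```

A result of `False` would suggest that the edits never reached the copy that Python imports; `None` means one of the two paths does not exist on this machine.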
Environment
Anything else?
No response