Is there an existing issue for this?
Current Behavior
Windows, conda virtual environment; running the demo fails with the error below:
```
(ChatGLM2) PS F:\ChatGLM2-6B> python F:\ChatGLM2-6B\web_demo2.py
Failed to load cpm_kernels:[WinError 267] The directory name is invalid.: 'C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\bin\Hostx86\x64\cl.exe'
Load parallel cpu kernel failed C:\Users\Msi-Baifa\.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\quantization_kernels_parallel.so: Traceback (most recent call last):
  File "C:\Users\Msi-Baifa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\quantization.py", line 148, in __init__
    kernels = ctypes.cdll.LoadLibrary(kernel_file)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\ctypes\__init__.py", line 454, in LoadLibrary
    return self._dlltype(name)
           ^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\ctypes\__init__.py", line 376, in __init__
    self._handle = _dlopen(self._name, mode)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
FileNotFoundError: Could not find module 'C:\Users\Msi-Baifa\.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\quantization_kernels_parallel.so' (or one of its dependencies). Try using the full path with constructor syntax.
Traceback (most recent call last):
  File "F:\ChatGLM2-6B\test.py", line 7, in <module>
    response, history = model.chat(tokenizer, "你好", history=[])
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Msi-Baifa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\modeling_chatglm.py", line 1042, in chat
    outputs = self.generate(**inputs, **gen_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\site-packages\transformers\generation\utils.py", line 1572, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\site-packages\transformers\generation\utils.py", line 2619, in sample
    outputs = self(
              ^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Msi-Baifa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\modeling_chatglm.py", line 946, in forward
    transformer_outputs = self.transformer(
                          ^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Msi-Baifa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\modeling_chatglm.py", line 836, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
                                                                      ^^^^^^^^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Msi-Baifa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\modeling_chatglm.py", line 638, in forward
    layer_ret = layer(
                ^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Msi-Baifa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\modeling_chatglm.py", line 546, in forward
    attention_output, kv_cache = self.self_attention(
                                 ^^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Msi-Baifa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\modeling_chatglm.py", line 375, in forward
    mixed_x_layer = self.query_key_value(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Msi-Baifa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\quantization.py", line 502, in forward
    output = W8A16Linear.apply(input, self.weight, self.weight_scale, self.weight_bit_width)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\miniconda3\envs\ChatGLM2\Lib\site-packages\torch\autograd\function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Msi-Baifa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\quantization.py", line 75, in forward
    weight = extract_weight_to_half(quant_w, scale_w, weight_bit_width)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Msi-Baifa/.cache\huggingface\modules\transformers_modules\chatglm2-6b-32k-int4\quantization.py", line 287, in extract_weight_to_half
    func = kernels.int4WeightExtractionHalf
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'int4WeightExtractionHalf'
(ChatGLM2) PS F:\ChatGLM2-6B>
```
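Reading the log, this looks like a chain of failures rather than one bug: cpm_kernels aborts because the MSVC path it probes for cl.exe is invalid (WinError 267), the precompiled CPU fallback quantization_kernels_parallel.so cannot be loaded either, so quantization.py is left with kernels = None, and the first chat call then dereferences it in extract_weight_to_half. A minimal diagnostic sketch, assuming only the standard library and that the cache path matches the traceback:

```python
# Diagnostic sketch (assumption: a C/C++ compiler on PATH is what
# cpm_kernels / quantization.py need in order to build the int4 kernels).
import os
import shutil

# Does the shell resolve a compiler at all?
for compiler in ("cl", "gcc"):
    print(f"{compiler}: {shutil.which(compiler) or 'not found on PATH'}")

# Does the precompiled fallback kernel exist in the model cache?
kernel = os.path.expanduser(
    r"~\.cache\huggingface\modules\transformers_modules"
    r"\chatglm2-6b-32k-int4\quantization_kernels_parallel.so"
)
print(kernel, "exists" if os.path.exists(kernel) else "missing")
```

If neither compiler resolves, getting one onto PATH (on Windows the ChatGLM README suggests TDM-GCC with OpenMP enabled) seems like the natural first step, since both the compile path and the preloaded-kernel path fail here.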
Expected Behavior
None.
Steps To Reproduce
The model is the 32k-int4 checkpoint downloaded from Hugging Face; a sketch of the failing script is below.
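A hypothetical reconstruction of F:\ChatGLM2-6B\test.py: only the chat call on line 7 is confirmed by the traceback; the model path and the loading flags are assumptions based on the standard ChatGLM2 usage pattern.

```python
# Sketch of test.py; MODEL_PATH is a placeholder for the local copy of the
# chatglm2-6b-32k-int4 checkpoint named in the traceback's cache paths.
from transformers import AutoModel, AutoTokenizer

MODEL_PATH = "chatglm2-6b-32k-int4"  # assumption: local model directory

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).cuda()
model = model.eval()

# This is the call on line 7 of the traceback; it is the first point that
# touches the int4 weight-extraction kernels, hence the crash here.
response, history = model.chat(tokenizer, "你好", history=[])
print(response)
```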
Environment
Anything else?
None.