Current Behavior

When loading the model with quantization, the following message appears:
Failed to load cpm_kernels:[WinError 267] 目录名称无效。: 'C:\\Users\\Hengj\\AppData\\Local\\Programs\\Python\\Python310\\python.exe'
(WinError 267 translates to "The directory name is invalid.")

Full output when launching with Gradio:
Loading checkpoint shards: 0%| | 0/7 [00:00<?, ?it/s]C:\Users\Hengj\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 7/7 [00:08<00:00, 1.17s/it]
Failed to load cpm_kernels:[WinError 267] 目录名称无效。: 'C:\\Users\\Hengj\\AppData\\Local\\Programs\\Python\\Python310\\python.exe'
Traceback (most recent call last):
File "E:\ChatGLM3\basic_demo\web_demo_gradio.py", line 29, in <module>
model = AutoModel.from_pretrained("E:\ChatGLM3", trust_remote_code=True, device_map="auto").quantize(4).cuda()
File "C:\Users\Hengj\.cache\huggingface\modules\transformers_modules\ChatGLM3\modeling_chatglm.py", line 1208, in quantize
self.transformer.encoder = quantize(self.transformer.encoder, bits, empty_init=empty_init, device=device,
File "C:\Users\Hengj\.cache\huggingface\modules\transformers_modules\ChatGLM3\quantization.py", line 155, in quantize
layer.self_attention.query_key_value = QuantizedLinear(
File "C:\Users\Hengj\.cache\huggingface\modules\transformers_modules\ChatGLM3\quantization.py", line 139, in __init__
self.weight = compress_int4_weight(self.weight)
File "C:\Users\Hengj\.cache\huggingface\modules\transformers_modules\ChatGLM3\quantization.py", line 76, in compress_int4_weight
blockDim = (min(round_up(m, 32), 1024), 1, 1)
NameError: name 'round_up' is not defined
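For context, WinError 267 is Windows' way of reporting NotADirectoryError: a directory operation was handed a file path (here, python.exe itself). A minimal sketch of the same failure class, independent of cpm_kernels:

```python
import os
import tempfile

# Passing a file path where a directory path is expected raises
# NotADirectoryError -- surfaced on Windows as
# "[WinError 267] The directory name is invalid."
with tempfile.NamedTemporaryFile(delete=False) as f:
    file_path = f.name

try:
    os.listdir(file_path)  # file_path is a file, not a directory
except NotADirectoryError as e:
    print("caught:", type(e).__name__)
finally:
    os.unlink(file_path)
```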
Full output when launching with Streamlit:
Loading checkpoint shards: 0%| | 0/7 [00:00<?, ?it/s]C:\Users\Hengj\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
return self.fget.__get__(instance, owner)()
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 7/7 [00:04<00:00, 1.55it/s]
Failed to load cpm_kernels:[WinError 267] 目录名称无效。: 'C:\\Users\\Hengj\\AppData\\Local\\Programs\\Python\\Python310\\python.exe'
2024-02-02 00:41:57.327 Uncaught app exception
Traceback (most recent call last):
File "C:\Users\Hengj\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 264, in _get_or_create_cached_value
cached_result = cache.read_result(value_key)
File "C:\Users\Hengj\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\caching\cache_resource_api.py", line 498, in read_result
raise CacheKeyNotFoundError()
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Hengj\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 312, in _handle_cache_miss
cached_result = cache.read_result(value_key)
File "C:\Users\Hengj\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\caching\cache_resource_api.py", line 498, in read_result
raise CacheKeyNotFoundError()
streamlit.runtime.caching.cache_errors.CacheKeyNotFoundError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\Hengj\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 535, in _run_script
exec(code, module.__dict__)
File "E:\ChatGLM3\basic_demo\web_demo_streamlit.py", line 37, in <module>
tokenizer, model = get_model()
File "C:\Users\Hengj\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 212, in wrapper
return cached_func(*args, **kwargs)
File "C:\Users\Hengj\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 241, in __call__
return self._get_or_create_cached_value(args, kwargs)
File "C:\Users\Hengj\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 267, in _get_or_create_cached_value
return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
File "C:\Users\Hengj\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\caching\cache_utils.py", line 321, in _handle_cache_miss
computed_value = self._info.func(*func_args, **func_kwargs)
File "E:\ChatGLM3\basic_demo\web_demo_streamlit.py", line 32, in get_model
model = AutoModel.from_pretrained("E:\ChatGLM3", trust_remote_code=True, device_map="cuda").quantize(4).cuda()
File "C:\Users\Hengj\.cache\huggingface\modules\transformers_modules\ChatGLM3\modeling_chatglm.py", line 1208, in quantize
self.transformer.encoder = quantize(self.transformer.encoder, bits, empty_init=empty_init, device=device,
File "C:\Users\Hengj\.cache\huggingface\modules\transformers_modules\ChatGLM3\quantization.py", line 155, in quantize
layer.self_attention.query_key_value = QuantizedLinear(
File "C:\Users\Hengj\.cache\huggingface\modules\transformers_modules\ChatGLM3\quantization.py", line 139, in __init__
self.weight = compress_int4_weight(self.weight)
File "C:\Users\Hengj\.cache\huggingface\modules\transformers_modules\ChatGLM3\quantization.py", line 76, in compress_int4_weight
blockDim = (min(round_up(m, 32), 1024), 1, 1)
NameError: name 'round_up' is not defined
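The NameError follows directly from the first warning: quantization.py imports round_up (along with the CUDA kernel helpers) from cpm_kernels, so when cpm_kernels fails to load on Windows the name is never defined. As a sketch only, assuming round_up has the "round x up to the next multiple of d" semantics implied by the blockDim computation, a pure-Python equivalent would be:

```python
def round_up(x: int, d: int) -> int:
    """Round x up to the nearest multiple of d (assumed semantics)."""
    return (x + d - 1) // d * d

# Matches the usage in compress_int4_weight:
#   blockDim = (min(round_up(m, 32), 1024), 1, 1)
print(round_up(1, 32), round_up(32, 32), round_up(33, 32))  # → 32 32 64
```

Note that defining this fallback alone would only silence the NameError; the CUDA quantization kernels provided by cpm_kernels would still be unavailable.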
Expected Behavior

The path should refer to a directory, not to a specific file.
Steps To Reproduce

1. Install all dependencies and download the model. Without quantization, everything runs normally.
2. Change line 29 of web_demo_gradio.py to
model = AutoModel.from_pretrained("E:\ChatGLM3", trust_remote_code=True, device_map="auto").quantize(4).cuda()
or change line 32 of web_demo_streamlit.py to
model = AutoModel.from_pretrained("E:\ChatGLM3", trust_remote_code=True, device_map="cuda").quantize(4).cuda()
3. In the basic_demo directory, run python web_demo_gradio.py or streamlit run web_demo_streamlit.py
4. After "Loading checkpoint shards" completes, the error described above appears in the console.
Environment
Anything else?
No response