Is there an existing issue for this?
Current Behavior
D:\anaconda\envs\ChatGLM-6B\python.exe D:\ChatGLM-6B-main\web_demo.py
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels_parallel.c
Compiling gcc -O3 -pthread -fopenmp -std=c99 C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels_parallel.c -shared -o C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels_parallel.so
'gcc' is not recognized as an internal or external command, operable program or batch file.
Compile failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels.c -shared -o C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels.so
Kernels compiled : C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
No compiled kernel found.
Compiling kernels : C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels_parallel.c
Compiling gcc -O3 -pthread -fopenmp -std=c99 C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels_parallel.c -shared -o C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels_parallel.so
'gcc' is not recognized as an internal or external command, operable program or batch file.
Compile failed, using default cpu kernel code.
Compiling gcc -O3 -fPIC -std=c99 C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels.c -shared -o C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels.so
Kernels compiled : C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels.so
Traceback (most recent call last):
  File "D:\ChatGLM-6B-main\web_demo.py", line 6, in <module>
    model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(4).cuda()
  File "C:\Users\73488/.cache\huggingface\modules\transformers_modules\model\modeling_chatglm.py", line 1267, in quantize
    load_cpu_kernel(**kwargs)
  File "C:\Users\73488/.cache\huggingface\modules\transformers_modules\model\quantization.py", line 386, in load_cpu_kernel
    cpu_kernels = CPUKernel(**kwargs)
  File "C:\Users\73488/.cache\huggingface\modules\transformers_modules\model\quantization.py", line 137, in __init__
    kernels = ctypes.cdll.LoadLibrary(kernel_file)
  File "D:\anaconda\envs\ChatGLM-6B\lib\ctypes\__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "D:\anaconda\envs\ChatGLM-6B\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\73488\.cache\huggingface\modules\transformers_modules\model\quantization_kernels.so' (or one of its dependencies). Try using the full path with constructor syntax.
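Reading the log and traceback together: gcc is not on PATH, so neither kernel compile can actually produce a .so, and `ctypes.cdll.LoadLibrary` then fails on a file that was never built. A minimal diagnostic sketch (standard library only; the kernel path is copied from the error above and is specific to this machine):

```python
# Hedged diagnostic sketch, not part of web_demo.py: verify the toolchain
# and the kernel artifact that the traceback above failed to load.
import ctypes
import os
import shutil

kernel_file = (r"C:\Users\73488\.cache\huggingface\modules"
               r"\transformers_modules\model\quantization_kernels.so")

# quantization.py shells out to gcc, so gcc must be on PATH (e.g. MinGW-w64).
print("gcc found:", shutil.which("gcc"))

# If compilation really succeeded, the file exists and ctypes can load it.
print("kernel exists:", os.path.exists(kernel_file))
if os.path.exists(kernel_file):
    ctypes.cdll.LoadLibrary(kernel_file)  # raises OSError if unloadable
```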
Expected Behavior
Could someone take a look at what the problem is and suggest a solution?
Steps To Reproduce
None
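(The steps are left blank, but the traceback pins down the trigger; a minimal repro sketch, assuming "model" is a local directory holding the ChatGLM-6B checkpoint and a CUDA GPU is present, as the Environment below states:)

```python
# Repro sketch reconstructed from web_demo.py line 6 in the traceback;
# "model" is assumed to be a local ChatGLM-6B checkpoint directory.
from transformers import AutoModel

# .quantize(4) is what enters the CPU-kernel compile/load path that fails.
model = AutoModel.from_pretrained("model", trust_remote_code=True).half().quantize(4).cuda()
```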
Environment
- OS: Windows
- Python: 3.9
- Transformers: 4.27.1
- PyTorch: 1.12.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`): True
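(The CUDA one-liner above generalizes to a short version dump; a sketch using standard attributes of both libraries, matching the numbers reported here:)

```python
# Hedged sketch: print the versions listed in this Environment section.
import torch
import transformers

print("Transformers:", transformers.__version__)      # 4.27.1 in this report
print("PyTorch:", torch.__version__)                  # 1.12.1
print("CUDA available:", torch.cuda.is_available())   # True
print("CUDA build:", torch.version.cuda)
```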
Anything else?
No response