C:\Users\linh\Desktop\chatglm-voice>python chat.py
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : C:\Users\linh\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\linh\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c -shared -o C:\Users\linh\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so
Kernels compiled : C:\Users\linh\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so
Cannot load cpu kernel, don't use quantized model on cpu.
Using quantization cache
Applying quantization to glm layers
INFO:transformers_modules.local.modeling_chatglm:Already quantized, reloading cpu kernel.
No compiled kernel found.
Compiling kernels : C:\Users\linh\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\linh\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.c -shared -o C:\Users\linh\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so
Kernels compiled : C:\Users\linh\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so
Traceback (most recent call last):
File "C:\Users\linh\Desktop\chatglm-voice\chat.py", line 93, in
model = AutoModel.from_pretrained(args.ChatGLM, trust_remote_code=True).half().quantize(4).cuda()
File "C:\Users\linh/.cache\huggingface\modules\transformers_modules\local\modeling_chatglm.py", line 1397, in quantize load_cpu_kernel(kwargs)
File "C:\Users\linh/.cache\huggingface\modules\transformers_modules\local\quantization.py", line 394, in load_cpu_kernel
cpu_kernels = CPUKernel(kwargs)
File "C:\Users\linh/.cache\huggingface\modules\transformers_modules\local\quantization.py", line 161, in init
kernels = ctypes.cdll.LoadLibrary(kernel_file)
File "C:\Users\linh\AppData\Local\conda\conda\envs\voice\lib\ctypes__init__.py", line 452, in LoadLibrary
return self._dlltype(name)
File "C:\Users\linh\AppData\Local\conda\conda\envs\voice\lib\ctypes__init.py", line 374, in init__
self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\linh\.cache\huggingface\modules\transformers_modules\local\quantization_kernels_parallel.so' (or one of its dependencies). Try using the full path with constructor syntax.
The error output is shown above.
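The log prints "Already quantized, reloading cpu kernel." before the crash, which suggests the checkpoint already contains the int4 weights; the extra `.quantize(4)` call on line 93 of chat.py is what routes into `load_cpu_kernel()` and then into the failing ctypes load. Below is a minimal sketch of the load without that call, assuming the checkpoint is already quantized (`MODEL_PATH` is a placeholder standing in for `args.ChatGLM`; pinning `revision=` when loading from the Hub would also silence the warnings at the top of the log):

```python
# A minimal sketch, not the project's chat.py: if the checkpoint already
# contains int4-quantized weights, dropping the extra .quantize(4) avoids
# the load_cpu_kernel() path that raises the FileNotFoundError above.
from transformers import AutoModel

MODEL_PATH = "THUDM/chatglm-6b-int4"  # assumption: stands in for args.ChatGLM

model = AutoModel.from_pretrained(MODEL_PATH, trust_remote_code=True).half().cuda()
model = model.eval()
```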
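Separately, the ctypes message itself hints at a workaround: pass the compiled kernel's full path to the loader. A hedged sketch of that hint follows; the kernel path is illustrative, and on Windows the "(or one of its dependencies)" clause often means the gcc runtime DLLs (libgomp, winpthread) are not on the DLL search path, so the sketch also registers the compiler's bin directory:

```python
# Sketch of the "full path with constructor syntax" hint from the error.
# Paths below are illustrative assumptions, not taken from quantization.py.
import ctypes
import os

kernel_file = os.path.expanduser(
    r"~\.cache\huggingface\modules\transformers_modules\local"
    r"\quantization_kernels_parallel.so"
)
# Python 3.8+ on Windows: make the gcc runtime DLLs the kernel depends on
# (libgomp, winpthread) resolvable before loading the shared library.
os.add_dll_directory(r"C:\mingw64\bin")  # assumption: your gcc install dir
kernels = ctypes.CDLL(os.path.abspath(kernel_file))
```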