THUDM / ChatGLM-6B

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型
Apache License 2.0

[BUG/Help] Running api.py from the second video never finishes loading; it reports the error shown below #1461

Open zx0406 opened 7 months ago

zx0406 commented 7 months ago

Is there an existing issue for this?

Current Behavior

Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a configuration with custom code to ensure no malicious code has been contributed in a newer revision.
Explicitly passing a revision is encouraged when loading a model with custom code to ensure no malicious code has been contributed in a newer revision.
No compiled kernel found.
Compiling kernels : C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels_parallel.c
Compiling gcc -O3 -fPIC -pthread -fopenmp -std=c99 C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels_parallel.c -shared -o C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels_parallel.so
Load parallel cpu kernel failed, using default cpu kernel code:
Traceback (most recent call last):
  File "C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization.py", line 156, in __init__
    kernels = ctypes.cdll.LoadLibrary(kernel_file)
  File "D:\ProgramData\Anaconda3\envs\yolov5\lib\ctypes\__init__.py", line 452, in LoadLibrary
    return self._dlltype(name)
  File "D:\ProgramData\Anaconda3\envs\yolov5\lib\ctypes\__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels_parallel.so' (or one of its dependencies). Try using the full path with constructor syntax.
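The `FileNotFoundError` above is raised by `ctypes.cdll.LoadLibrary` inside `quantization.py`. On Windows this usually means either that gcc never actually produced the `.so` file, or that one of the library's dependent DLLs is not on the search path. A minimal sketch of a more defensive loader (the helper name `load_quant_kernel` is hypothetical, not part of the repo) that resolves the path to an absolute one, as the error message itself suggests, and fails early with a clearer diagnosis:

```python
import ctypes
import os


def load_quant_kernel(kernel_file: str) -> ctypes.CDLL:
    """Hypothetical helper: load a compiled quantization kernel,
    failing early with a clear message if the file is missing."""
    kernel_file = os.path.abspath(kernel_file)
    if not os.path.isfile(kernel_file):
        # If the .c source exists but the .so does not, the gcc
        # compile step failed even though no error was printed.
        raise FileNotFoundError(
            f"Compiled kernel not found: {kernel_file}; "
            "check that gcc ran successfully and produced the shared library."
        )
    # Passing an absolute path to the CDLL constructor is the
    # "full path with constructor syntax" the error message recommends.
    return ctypes.CDLL(kernel_file)
```

This separates the two failure modes: a missing file (compile failure) versus a file that exists but cannot be loaded (missing dependency such as the MinGW runtime).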

Compiling gcc -O3 -fPIC -std=c99 C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels.c -shared -o C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels.so
Load kernel : C:\Users\User\.cache\huggingface\modules\transformers_modules\models--THUDM--chatglm-6b-int4\quantization_kernels.so
Using quantization cache
Applying quantization to glm layers

Expected Behavior

No response

Steps To Reproduce

Loading the INT4 quantized model on Windows fails.

Environment

- OS: Windows 10
- Python: 3.10
- Transformers: 4.27.1
- PyTorch: 2.0.1
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :

Anything else?

No response