/AutoAWQ/awq/modules/linear/gemv_fast.py:10: UserWarning: AutoAWQ could not load GEMVFast kernels extension. Details: No module named 'awq_v2_ext'
warnings.warn(f"AutoAWQ could not load GEMVFast kernels extension. Details: {ex}")
Loading checkpoint shards: 0%| | 0/4 [00:00<?, ?it/s]
Loading checkpoint shards: 25%|██▌ | 1/4 [00:00<00:02, 1.00it/s]
Loading checkpoint shards: 50%|█████ | 2/4 [00:01<00:01, 1.02it/s]
Loading checkpoint shards: 75%|███████▌ | 3/4 [00:02<00:00, 1.03it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.10it/s]
Loading checkpoint shards: 100%|██████████| 4/4 [00:03<00:00, 1.07it/s]
Using the latest cached version of the dataset since mit-han-lab/pile-val-backup couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /root/.cache/huggingface/datasets/mit-han-lab___pile-val-backup/default/0.0.0/2f5e46ae6a69cf0dce4b12f78241c408936ca0e4 (last modified on Wed Jul 31 09:21:55 2024).
Token indices sequence length is longer than the specified maximum sequence length for this model (57053 > 32768). Running this sequence through the model will result in indexing errors
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
AWQ: 0%| | 0/28 [00:00<?, ?it/s]
AWQ: 4%|▎ | 1/28 [01:28<39:52, 88.60s/it]
AWQ: 7%|▋ | 2/28 [03:00<39:13, 90.51s/it]
AWQ: 11%|█ | 3/28 [04:34<38:24, 92.18s/it]
AWQ: 14%|█▍ | 4/28 [06:09<37:21, 93.39s/it]
AWQ: 18%|█▊ | 5/28 [07:44<36:00, 93.96s/it]
AWQ: 21%|██▏ | 6/28 [09:19<34:31, 94.14s/it]
AWQ: 25%|██▌ | 7/28 [10:54<33:05, 94.56s/it]
AWQ: 29%|██▊ | 8/28 [12:29<31:35, 94.78s/it]
AWQ: 32%|███▏ | 9/28 [14:04<30:01, 94.80s/it]
AWQ: 36%|███▌ | 10/28 [15:39<28:27, 94.87s/it]
AWQ: 39%|███▉ | 11/28 [17:15<26:54, 94.99s/it]
AWQ: 43%|████▎ | 12/28 [18:50<25:21, 95.08s/it]
AWQ: 46%|████▋ | 13/28 [20:25<23:48, 95.20s/it]
AWQ: 50%|█████ | 14/28 [22:00<22:11, 95.13s/it]
AWQ: 54%|█████▎ | 15/28 [23:36<20:37, 95.20s/it]
AWQ: 57%|█████▋ | 16/28 [25:11<19:02, 95.19s/it]
AWQ: 61%|██████ | 17/28 [26:46<17:28, 95.28s/it]
AWQ: 64%|██████▍ | 18/28 [28:22<15:53, 95.36s/it]
AWQ: 68%|██████▊ | 19/28 [29:58<14:19, 95.55s/it]
AWQ: 71%|███████▏ | 20/28 [31:33<12:43, 95.46s/it]
AWQ: 75%|███████▌ | 21/28 [33:09<11:08, 95.44s/it]
AWQ: 79%|███████▊ | 22/28 [34:44<09:33, 95.55s/it]
AWQ: 82%|████████▏ | 23/28 [36:20<07:57, 95.58s/it]
AWQ: 86%|████████▌ | 24/28 [37:56<06:22, 95.67s/it]
AWQ: 89%|████████▉ | 25/28 [39:32<04:47, 95.73s/it]
AWQ: 93%|█████████▎| 26/28 [41:07<03:11, 95.60s/it]
AWQ: 96%|█████████▋| 27/28 [42:43<01:35, 95.57s/it]
AWQ: 96%|█████████▋| 27/28 [42:50<01:35, 95.22s/it]
Traceback (most recent call last):
File "/user/AutoAWQ/Quantalize4bit.py", line 15, in <module>
# Does not include the calibration process
File "/root/anaconda3/envs/qwen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/user/AutoAWQ/awq/models/base.py", line 231, in quantize
self.quantizer.quantize()
File "/user/AutoAWQ/awq/quantize/quantizer.py", line 166, in quantize
scales_list = [
File "/user/AutoAWQ/awq/quantize/quantizer.py", line 167, in <listcomp>
self._search_best_scale(self.modules[i], **layer)
File "/root/anaconda3/envs/qwen/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/user/AutoAWQ/awq/quantize/quantizer.py", line 330, in _search_best_scale
best_scales = self._compute_best_scale(
File "/user/AutoAWQ/awq/quantize/quantizer.py", line 409, in _compute_best_scale
raise Exception
Exception
I searched the existing issues and suspected a version mismatch, so I downloaded and installed the wheel built for CUDA 11.8 (autoawq-0.2.6-cp310-cp310-linux_x86_64.whl). My driver supports CUDA 11.7 and my Python version is 3.10. When I ran the quantization code again, progress stayed at 0%:
/AutoAWQ/awq/modules/linear/exllama.py:12: UserWarning: AutoAWQ could not load ExLlama kernels extension. Details: libcudart.so.11.0: cannot open shared object file: No such file or directory
warnings.warn(f"AutoAWQ could not load ExLlama kernels extension. Details: {ex}")
/AutoAWQ/awq/modules/linear/exllamav2.py:13: UserWarning: AutoAWQ could not load ExLlamaV2 kernels extension. Details: libcudart.so.11.0: cannot open shared object file: No such file or directory
warnings.warn(f"AutoAWQ could not load ExLlamaV2 kernels extension. Details: {ex}")
/AutoAWQ/awq/modules/linear/gemm.py:14: UserWarning: AutoAWQ could not load GEMM kernels extension. Details: libcudart.so.11.0: cannot open shared object file: No such file or directory
warnings.warn(f"AutoAWQ could not load GEMM kernels extension. Details: {ex}")
/AutoAWQ/awq/modules/linear/gemv.py:11: UserWarning: AutoAWQ could not load GEMV kernels extension. Details: libcudart.so.11.0: cannot open shared object file: No such file or directory
warnings.warn(f"AutoAWQ could not load GEMV kernels extension. Details: {ex}")
/AutoAWQ/awq/modules/linear/gemv_fast.py:10: UserWarning: AutoAWQ could not load GEMVFast kernels extension. Details: No module named 'awq_v2_ext'
warnings.warn(f"AutoAWQ could not load GEMVFast kernels extension. Details: {ex}")
/root/anaconda3/envs/qwen/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: '/root/anaconda3/envs/qwen/lib/python3.10/site-packages/torchvision/image.so: undefined symbol: _ZN3c1017RegisterOperatorsD1Ev'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
warn(
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:03<00:00, 1.02it/s]
Using the latest cached version of the dataset since mit-han-lab/pile-val-backup couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /root/.cache/huggingface/datasets/mit-han-lab___pile-val-backup/default/0.0.0/2f5e46ae6a69cf0dce4b12f78241c408936ca0e4 (last modified on Wed Jul 31 09:21:55 2024).
Token indices sequence length is longer than the specified maximum sequence length for this model (57053 > 32768). Running this sequence through the model will result in indexing errors
/root/anaconda3/envs/qwen/lib/python3.10/site-packages/torch/cuda/__init__.py:118: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
We detected that you are passing `past_key_values` as a tuple and this is deprecated and will be removed in v4.43. Please use an appropriate `Cache` class (https://huggingface.co/docs/transformers/v4.41.3/en/internal/generation_utils#transformers.Cache)
AWQ: 0%| | 0/28 [00:00<?, ?it/s]
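To make the suspected mismatch concrete, here is a minimal sketch of two checks that use only the Python standard library. It assumes the `11070` in the PyTorch warning follows CUDA's usual integer encoding (1000 * major + 10 * minor) and that the wheel targets CUDA 11.8, as stated above. The function names are made up for illustration and are not part of AutoAWQ or PyTorch. The comparison is deliberately conservative and ignores CUDA's minor-version compatibility rules.

```python
import ctypes
from typing import Tuple


def decode_cuda_version(encoded: int) -> Tuple[int, int]:
    """Decode CUDA's integer version (1000 * major + 10 * minor)."""
    return encoded // 1000, (encoded % 1000) // 10


def driver_supports(driver_encoded: int, runtime: Tuple[int, int]) -> bool:
    """Conservative check: the driver's CUDA version must be >= the runtime's.

    Assumption: this ignores minor-version compatibility, so it can report
    False in setups that would still work.
    """
    return decode_cuda_version(driver_encoded) >= runtime


def can_load(library_name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # 11070 is the driver version reported in the PyTorch warning.
    print(decode_cuda_version(11070))
    # (11, 8) is the CUDA version the installed wheel is assumed to target.
    print(driver_supports(11070, (11, 8)))
    # The library name is taken from the AutoAWQ kernel warnings.
    print(can_load("libcudart.so.11.0"))
```

If the second check prints `False`, the driver is older than the CUDA runtime that the wheel expects. If the third prints `False`, the loader cannot find `libcudart.so.11.0`, which matches the kernel-extension warnings in the log.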
Could I request your assistance? I would be very grateful.
For context: I am quantizing the Qwen2-7B model (not fine-tuned) with AutoAWQ using my script Quantalize4bit.py. The first log above, which ends in the bare `Exception` raised in `_compute_best_scale`, is the error from my original run.