TheBloke opened 1 year ago
I also tried adding this:
RUN /bin/bash -o pipefail -c 'cd /root/AutoGPTQ && PATH=/usr/local/cuda/bin:"$PATH" TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" BUILD_CUDA_EXT=1 python setup.py install'
But it's still not building the kernel:
logs:
WARNING:CUDA extension not installed.
WARNING:The safetensors archive passed at models/TheBloke_Samantha-7B-GPTQ/Samantha-7B-GPTQ-4bit-128g.no-act-order.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
no kernel:
root@5a593abbb4cd:/# find /usr/local/lib/python3.10 -name "*autogptq_cuda*"
root@5a593abbb4cd:/#
I worked around it by building a wheel while logged into the Docker container, then doing this in the Dockerfile:
RUN pip install https://github.com/TheBlokeAI/wheel/raw/master/auto_gptq-0.2.0%2Bcu1162-cp310-cp310-linux_x86_64.whl
But I would love to understand how to do this properly, building from the Dockerfile, and what's causing it not to work right now.
I've not tried building wheels in a Docker environment; I might try it this weekend. I guess it's because CUDA can't be detected properly in Docker. Maybe you can try torch.cuda.is_available() first?
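For a quick check from inside the container, something like:
python -c "import torch; print(torch.cuda.is_available())"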
> I've not tried building wheels in a Docker environment; I might try it this weekend. I guess it's because CUDA can't be detected properly in Docker. Maybe you can try torch.cuda.is_available() first?
I have the same issue and I have confirmed that command returns True.
nvidia/cuda:11.8.0-devel-ubuntu22.04 is a great base image to test this. I'd assume it's one of the most-used base images for ML containers.
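For example, a minimal sketch to test with (the exact apt/pip lines are just my assumption):

FROM nvidia/cuda:11.8.0-devel-ubuntu22.04
RUN apt-get update && apt-get install -y python3 python3-pip
RUN pip3 install torch
# During `docker build` there is normally no GPU attached, so this prints False;
# at runtime with `docker run --gpus all` it should print True.
RUN python3 -c "import torch; print(torch.cuda.is_available())"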
I'm stuck on the same. I'm trying to run one of your models, Tom, using the modal.com service. They use Docker images as well.
I'm using this snippet to build the image and keep getting the huge red warning "CUDA extension not installed".
IMAGE_MODEL_DIR = "/model"
MODEL_BASE_FILE = "Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order"

def download_model():
    from huggingface_hub import snapshot_download

    MODEL_NAME = "TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ"
    snapshot_download(MODEL_NAME, local_dir=IMAGE_MODEL_DIR)

stub.image = (
    Image.from_dockerhub(
        "nvidia/cuda:11.7.0-devel-ubuntu20.04",
        setup_dockerfile_commands=[
            "RUN apt-get update",
            "RUN apt-get install -y python3 python3-pip python-is-python3",
        ],
    )
    .apt_install("git", "gcc", "build-essential")
    .pip_install(
        "huggingface_hub",
        "transformers",
        "torch",
        "auto-gptq",
        "einops",
    )
    .run_function(download_model)
)
I find myself even more confused that the README mentions we may want to disable the CUDA extensions. Getting them to build in the first place seems to be the challenge, so I wonder why someone would want to disable them...
Yeah this has been quite a pain. I was stuck with this problem yesterday on a Lambda Labs box.
I eventually solved it by installing miniconda, creating a new environment with torch2 etc, and then building from source in there.
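Roughly, the steps were (from memory, so treat this as a sketch rather than exact commands):

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p ~/miniconda3
source ~/miniconda3/etc/profile.d/conda.sh
conda create -n pytorch2 python=3.10 -y
conda activate pytorch2
pip install torch
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ && pip install .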
I just can't understand what it's checking for and why it fails.
I did notice something interesting: during the execution of setup.py, sys.path is set to something that doesn't include the normal Python site-packages directory. I hacked setup.py to have this code:
try:
    print("BEFORE BEFORE BEFORE")
    print(sys.path)
    import torch
    print("AFTER AFTER AFTER")
    TORCH_AVAILABLE = True
except ImportError:
    TORCH_AVAILABLE = False
And I confirmed that import torch failed there, and that sys.path was different to what it should be. Here's some logs I recorded last night:
On the command line, all is fine:
[pytorch2] tomj@a10:/workspace/git/AutoGPTQ git:(main*) $ python
Python 3.10.11 (main, Apr 5 2023, 14:15:10) [GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.path
['', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/workspace/venv/pytorch2/lib/python3.10/site-packages', '/workspace/venv/pytorch2/lib/python3.10/site-packages/quant_cuda-0.0.0-py3.10-linux-x86_64.egg']
>>> import torch
>>>
But in setup.py, I could not import torch and sys.path is different:
[pytorch2] tomj@a10:/workspace/git/AutoGPTQ git:(main*) $ pip install -v .
Using pip 23.1.2 from /workspace/venv/pytorch2/lib/python3.10/site-packages/pip (python 3.10)
Processing /workspace/git/AutoGPTQ
Running command pip subprocess to install build dependencies
Collecting setuptools>=40.8.0
Using cached setuptools-67.8.0-py3-none-any.whl (1.1 MB)
Collecting wheel
Using cached wheel-0.40.0-py3-none-any.whl (64 kB)
Installing collected packages: wheel, setuptools
Successfully installed setuptools-67.8.0 wheel-0.40.0
Installing build dependencies ... done
Running command Getting requirements to build wheel
BEFORE BEFORE BEFORE
['/workspace/git/AutoGPTQ', '/workspace/venv/pytorch2/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process', '/tmp/pip-build-env-f0hzsznx/site', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/tmp/pip-build-env-f0hzsznx/overlay/lib/python3.10/site-packages', '/tmp/pip-build-env-f0hzsznx/normal/lib/python3.10/site-packages']
running egg_info
writing auto_gptq.egg-info/PKG-INFO
writing dependency_links to auto_gptq.egg-info/dependency_links.txt
writing requirements to auto_gptq.egg-info/requires.txt
writing top-level names to auto_gptq.egg-info/top_level.txt
reading manifest file 'auto_gptq.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'auto_gptq.egg-info/SOURCES.txt'
Getting requirements to build wheel ... done
Running command Preparing metadata (pyproject.toml)
BEFORE BEFORE BEFORE
['/workspace/git/AutoGPTQ', '/workspace/venv/pytorch2/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process', '/tmp/pip-build-env-f0hzsznx/site', '/usr/lib/python310.zip', '/usr/lib/python3.10', '/usr/lib/python3.10/lib-dynload', '/tmp/pip-build-env-f0hzsznx/overlay/lib/python3.10/site-packages', '/tmp/pip-build-env-f0hzsznx/normal/lib/python3.10/site-packages']
running dist_info
So this is sys.path on the command line:
['', '/usr/lib/python310.zip',
'/usr/lib/python3.10',
'/usr/lib/python3.10/lib-dynload',
'/workspace/venv/pytorch2/lib/python3.10/site-packages',
'/workspace/venv/pytorch2/lib/python3.10/site-packages/quant_cuda-0.0.0-py3.10-linux-x86_64.egg']
And this is sys.path during setup.py:
['/workspace/git/AutoGPTQ',
'/workspace/venv/pytorch2/lib/python3.10/site-packages/pip/_vendor/pyproject_hooks/_in_process',
'/tmp/pip-build-env-f0hzsznx/site',
'/usr/lib/python310.zip',
'/usr/lib/python3.10',
'/usr/lib/python3.10/lib-dynload',
'/tmp/pip-build-env-f0hzsznx/overlay/lib/python3.10/site-packages',
'/tmp/pip-build-env-f0hzsznx/normal/lib/python3.10/site-packages']
It is missing /workspace/venv/pytorch2/lib/python3.10/site-packages from my venv?
Could this be part of the problem? But why? I don't really understand the setup.py process at the moment.
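One guess: this looks like pip's build isolation, where pip installs the build dependencies (setuptools, wheel) into a temporary environment and runs setup.py in there, which would explain the /tmp/pip-build-env-* entries and the missing venv site-packages. If that's what it is, this should let setup.py see the torch from my venv (untested guess):

pip install -v --no-build-isolation .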
In the end I got it working in conda with a pip install . source-code install. But of course my method may not work for you on Modal, Luke :(
@PanQiWei we'd really appreciate it if you could investigate this, as it's causing problems for quite a few people. Let me know if you can't re-create it and I can provide you with an SSH login to a system that demonstrates the problem.
Thank you @TheBloke. I used your suggestion to clone the source code and pip install ., effectively editing my code to:
IMAGE_MODEL_DIR = "/model"
MODEL_BASE_FILE = "Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order"

def download_model():
    from huggingface_hub import snapshot_download

    MODEL_NAME = "TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ"
    snapshot_download(MODEL_NAME, local_dir=IMAGE_MODEL_DIR)

stub.image = (
    Image.from_dockerhub(
        "nvidia/cuda:11.7.0-devel-ubuntu20.04",
        setup_dockerfile_commands=[
            "RUN apt-get update",
            "RUN apt-get install -y python3 python3-pip python-is-python3",
        ],
    )
    # it also reaches the same output with the debian slim image if we swap out the base image:
    # Image.debian_slim(python_version="3.10")
    .apt_install("git", "gcc", "build-essential")
    .run_commands(
        "git clone https://github.com/PanQiWei/AutoGPTQ.git",
        "cd AutoGPTQ && pip install -e .",
    )
    .pip_install(
        "huggingface_hub",
        "transformers",
        "torch",
        "einops",
    )
    .run_function(download_model)
)
But now I get:
AttributeError: module 'autogptq_cuda' has no attribute 'vecquant4matmul_faster_old'
which keeps me wondering whether that was a false sense of progress and I'm actually still missing "CUDA-something" underneath. I definitely hope William can help us out and shed some light on the challenge.
Although, you know what, Tom, I've been looking at this Falcon example from the Modal guys: https://github.com/modal-labs/modal-examples/blob/main/06_gpu_and_ml/falcon_gptq.py
After you explained that you used the latest AutoGPTQ to quantize it, it makes sense why I had "model not found in path" issues when I tried to swap in your previously produced quantizations. But what I don't understand is how that code doesn't get the "CUDA extension missing" error. Perhaps you can notice something.
So I wanted to post an update, since I managed to get it working last night. I still don't understand the magic down to its roots, but there were two catches:
1. git clone and pip install . this repository to get it to compile the CUDA extensions. pip installing it IS VERY IMPORTANT.
2. Making sure Docker sees the GPU. It seems that if we build images on a platform without GPUs (I'm using a MacBook; Modal.com uses bare CPUxRAM servers) it will cause weird errors such as the one above with vecquant4matmul_faster_old.
As such I'm now using the following snippet to build the image:
IMAGE_MODEL_DIR = "/model"
MODEL_BASE_FILE = "Wizard-Vicuna-13B-Uncensored-GPTQ-4bit-128g.compat.no-act-order"

def download_model():
    from huggingface_hub import snapshot_download

    MODEL_NAME = "TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ"
    snapshot_download(MODEL_NAME, local_dir=IMAGE_MODEL_DIR)

stub.image = (
    Image.from_dockerhub(
        "nvidia/cuda:11.7.0-devel-ubuntu20.04",
        setup_dockerfile_commands=[
            "RUN apt-get update",
            "RUN apt-get install -y python3 python3-pip python-is-python3",
        ],
    )
    .apt_install("git", "gcc", "build-essential")
    .run_commands(
        "git clone https://github.com/PanQiWei/AutoGPTQ.git",
        "cd AutoGPTQ && pip install -e .",
        gpu="A10G",
    )
    .pip_install(
        "huggingface_hub",
        "transformers",
        "torch",
        "einops",
    )
    .run_function(download_model)
)
Notice the gpu parameter I put on the run_commands call that runs pip.
Down the road I will need to build images for other services, so I will need to figure out how to fake or force the build to happen the right way, and that is a huge blank spot in my brain. A great opportunity to learn something new.
@TheBloke I'd kindly suggest you look into option 3, which helped me out, and into the Docker flag --gpus all, as well as this: https://stackoverflow.com/questions/59691207/docker-build-with-nvidia-runtime.
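From that Stack Overflow thread, the trick (if I've understood it correctly) is to make nvidia the default Docker runtime, so the GPU is visible during docker build and not only docker run; something like this in /etc/docker/daemon.json, followed by restarting the Docker daemon:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "default-runtime": "nvidia"
}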
One more thing I'm lost on is why I cannot use any newer CUDA image: when I try one, I get screamed at with "pytorch has been compiled with CUDA 11.7.0 version", and I can't understand at which step exactly that happens...
If you pull the 0.2.1 source from GH and try to compile with CUDA, the issue is not Torch but https://github.com/PanQiWei/AutoGPTQ/blob/main/setup.py#L13, which is:
IN_GITHUB_ACTIONS = os.environ.get("GITHUB_ACTIONS", "false") == "true"
I guess that's set so that CUDA support is automatically compiled in the GH wheel releases, but the rest of us will never get it compiled from source, as we don't use GH Actions locally. :)
Replacing that line with IN_GITHUB_ACTIONS = True will get the CUDA extension compiled, as long as you have CUDA_VERSION=xxx set in your environment, e.g. export CUDA_VERSION=118
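In other words, something like this, as a rough sketch of the workaround:

git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
# force the GH Actions code path so setup.py builds the CUDA extension
sed -i 's/^IN_GITHUB_ACTIONS = .*/IN_GITHUB_ACTIONS = True/' setup.py
export CUDA_VERSION=118   # match your installed CUDA toolkit
pip install .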
Using CUDA 12.1 here in WSL2 and issuing pip install .[triton], I am able to compile the CUDA extension as well as use Triton, and I get ~2.5 t/s on my 3090 with the TheBloke/WizardLM-Uncensored-Falcon-40B model.
Otherwise I can barely scrape 1 t/s after a warm-up; it generally hovers around 0.7 t/s on a default install of AutoGPTQ without CUDA or Triton.
As an aside, pip install auto-gptq fails to compile the CUDA extension here as well, and returns an error:
running build_ext
/home/user/Envs/text-generation-webui_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py:399: UserWarning: There are no x86_64-linux-gnu-g++ version bounds defined for CUDA version 12.1
warnings.warn(f'There are no {compiler_name} version bounds defined for CUDA version {cuda_str_version}')
building 'autogptq_cuda' extension
creating /tmp/pip-install-p56cxy8z/auto-gptq_4e922a3ed443469cb66ca8aca66e0719/build/temp.linux-x86_64-cpython-310
creating /tmp/pip-install-p56cxy8z/auto-gptq_4e922a3ed443469cb66ca8aca66e0719/build/temp.linux-x86_64-cpython-310/autogptq_cuda
Emitting ninja build file /tmp/pip-install-p56cxy8z/auto-gptq_4e922a3ed443469cb66ca8aca66e0719/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: error: '/tmp/pip-install-p56cxy8z/auto-gptq_4e922a3ed443469cb66ca8aca66e0719/autogptq_cuda/autogptq_cuda.cpp', needed by '/tmp/pip-install-p56cxy8z/auto-gptq_4e922a3ed443469cb66ca8aca66e0719/build/temp.linux-x86_64-cpython-310/autogptq_cuda/autogptq_cuda.o', missing and no known rule to make it
Traceback (most recent call last):
File "/home/user/Envs/text-generation-webui_env/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1900, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.10/subprocess.py", line 524, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.
> If you pull the 0.2.1 source from GH and try to compile with CUDA, the issue is not Torch but https://github.com/PanQiWei/AutoGPTQ/blob/main/setup.py#L13, which is: IN_GITHUB_ACTIONS = os.environ.get("GITHUB_ACTIONS", "false") == "true"
Actually I'm confused since I did exactly that:
.run_commands(
    "git clone https://github.com/PanQiWei/AutoGPTQ.git",
    "cd AutoGPTQ && pip install -e .",
    gpu="A10G",
)
and it works. Reading that part of the code, it seems to me that it only affects the printed version, but it should still try compiling down the road. I'm confused, however.
I hope @PanQiWei can help us out!
> If you pull the 0.2.1 source from GH and try to compile with CUDA, the issue is not Torch but https://github.com/PanQiWei/AutoGPTQ/blob/main/setup.py#L13, which is:
> IN_GITHUB_ACTIONS = os.environ.get("GITHUB_ACTIONS", "false") == "true"
> I guess that's set so that CUDA support is automatically compiled in the GH wheel releases, but the rest of us will never get it compiled from source, as we don't use GH Actions locally. :)
> Replacing that line with IN_GITHUB_ACTIONS = True will get the CUDA extension compiled, as long as you have CUDA_VERSION=xxx set in your environment, e.g. export CUDA_VERSION=118

> and it works. Reading that part of the code, it seems to me that it only affects the printed version, but it should still try compiling down the road. I'm confused, however.
Yeah, I'm not sure that's correct, because on some systems I definitely can build the CUDA extension with pip install . - without editing setup.py, and without CUDA_VERSION set.
So I think it is Torch-related somehow.
For example, testing on Runpod using their runpod/pytorch:3.10-2.0.1-117-devel Docker image, which has torch 2.0.1+cu117 with CUDA toolkit 11.7:
root@005505d3451a:~# which nvcc
/usr/local/cuda/bin/nvcc
root@005505d3451a:~# nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0
root@005505d3451a:~# python
Python 3.10.6 (main, Mar 10 2023, 10:55:28) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>>
Installing from PyPI fails with a strange error, not one I've seen before:
root@005505d3451a:~# pip install -v --no-cache-dir auto-gptq
Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Collecting auto-gptq
Downloading auto_gptq-0.2.1.tar.gz (48 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 48.0/48.0 kB 1.6 MB/s eta 0:00:00
Running command python setup.py egg_info
running egg_info
creating /tmp/pip-pip-egg-info-ppi_pk6y/auto_gptq.egg-info
writing /tmp/pip-pip-egg-info-ppi_pk6y/auto_gptq.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-pip-egg-info-ppi_pk6y/auto_gptq.egg-info/dependency_links.txt
writing requirements to /tmp/pip-pip-egg-info-ppi_pk6y/auto_gptq.egg-info/requires.txt
writing top-level names to /tmp/pip-pip-egg-info-ppi_pk6y/auto_gptq.egg-info/top_level.txt
writing manifest file '/tmp/pip-pip-egg-info-ppi_pk6y/auto_gptq.egg-info/SOURCES.txt'
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
reading manifest file '/tmp/pip-pip-egg-info-ppi_pk6y/auto_gptq.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file '/tmp/pip-pip-egg-info-ppi_pk6y/auto_gptq.egg-info/SOURCES.txt'
Preparing metadata (setup.py) ... done
Requirement already satisfied: accelerate>=0.19.0 in /usr/local/lib/python3.10/dist-packages (from auto-gptq) (0.19.0)
Requirement already satisfied: datasets in /usr/local/lib/python3.10/dist-packages (from auto-gptq) (2.12.0)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from auto-gptq) (1.24.1)
Requirement already satisfied: rouge in /usr/local/lib/python3.10/dist-packages (from auto-gptq) (1.0.1)
Requirement already satisfied: torch>=1.13.0 in /usr/local/lib/python3.10/dist-packages (from auto-gptq) (2.0.1+cu117)
Requirement already satisfied: safetensors in /usr/local/lib/python3.10/dist-packages (from auto-gptq) (0.3.1)
Requirement already satisfied: transformers>=4.26.1 in /usr/local/lib/python3.10/dist-packages (from auto-gptq) (4.29.2)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.19.0->auto-gptq) (23.1)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.19.0->auto-gptq) (5.9.5)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate>=0.19.0->auto-gptq) (6.0)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->auto-gptq) (3.9.0)
Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->auto-gptq) (4.4.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->auto-gptq) (1.11.1)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->auto-gptq) (3.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->auto-gptq) (3.1.2)
Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->auto-gptq) (2.0.0)
Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.13.0->auto-gptq) (3.25.0)
Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.13.0->auto-gptq) (15.0.7)
Requirement already satisfied: huggingface-hub<1.0,>=0.14.1 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.26.1->auto-gptq) (0.15.1)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.26.1->auto-gptq) (2023.6.3)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers>=4.26.1->auto-gptq) (2.28.1)
Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.26.1->auto-gptq) (0.13.3)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers>=4.26.1->auto-gptq) (4.65.0)
Requirement already satisfied: pyarrow>=8.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets->auto-gptq) (12.0.0)
Requirement already satisfied: dill<0.3.7,>=0.3.0 in /usr/local/lib/python3.10/dist-packages (from datasets->auto-gptq) (0.3.6)
Requirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets->auto-gptq) (2.0.2)
Requirement already satisfied: xxhash in /usr/local/lib/python3.10/dist-packages (from datasets->auto-gptq) (3.2.0)
Requirement already satisfied: multiprocess in /usr/local/lib/python3.10/dist-packages (from datasets->auto-gptq) (0.70.14)
Requirement already satisfied: fsspec[http]>=2021.11.1 in /usr/local/lib/python3.10/dist-packages (from datasets->auto-gptq) (2023.5.0)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets->auto-gptq) (3.8.4)
Requirement already satisfied: responses<0.19 in /usr/local/lib/python3.10/dist-packages (from datasets->auto-gptq) (0.18.0)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from rouge->auto-gptq) (1.16.0)
Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->auto-gptq) (23.1.0)
Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->auto-gptq) (2.1.1)
Requirement already satisfied: multidict<7.0,>=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->auto-gptq) (6.0.4)
Requirement already satisfied: async-timeout<5.0,>=4.0.0a3 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->auto-gptq) (4.0.2)
Requirement already satisfied: yarl<2.0,>=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->auto-gptq) (1.9.2)
Requirement already satisfied: frozenlist>=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->auto-gptq) (1.3.3)
Requirement already satisfied: aiosignal>=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets->auto-gptq) (1.3.1)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.26.1->auto-gptq) (3.4)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.26.1->auto-gptq) (1.26.13)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers>=4.26.1->auto-gptq) (2022.12.7)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.13.0->auto-gptq) (2.1.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets->auto-gptq) (2.8.2)
Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets->auto-gptq) (2023.3)
Requirement already satisfied: tzdata>=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets->auto-gptq) (2023.3)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.13.0->auto-gptq) (1.2.1)
Building wheels for collected packages: auto-gptq
Running command python setup.py bdist_wheel
running bdist_wheel
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
warnings.warn(msg.format('we could not find ninja.'))
running build
running build_py
creating build
creating build/lib.linux-x86_64-cpython-310
creating build/lib.linux-x86_64-cpython-310/auto_gptq
copying auto_gptq/__init__.py -> build/lib.linux-x86_64-cpython-310/auto_gptq
creating build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks
copying auto_gptq/eval_tasks/__init__.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks
copying auto_gptq/eval_tasks/_base.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks
copying auto_gptq/eval_tasks/language_modeling_task.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks
copying auto_gptq/eval_tasks/sequence_classification_task.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks
copying auto_gptq/eval_tasks/text_summarization_task.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks
creating build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/__init__.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/_base.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/_const.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/_utils.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/auto.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/bloom.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/codegen.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/gpt2.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/gpt_bigcode.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/gpt_neox.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/gptj.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/llama.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/moss.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/opt.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
copying auto_gptq/modeling/rw.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/modeling
creating build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules
copying auto_gptq/nn_modules/__init__.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules
copying auto_gptq/nn_modules/_fused_base.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules
copying auto_gptq/nn_modules/fused_gptj_attn.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules
copying auto_gptq/nn_modules/fused_llama_attn.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules
copying auto_gptq/nn_modules/fused_llama_mlp.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules
copying auto_gptq/nn_modules/qlinear.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules
copying auto_gptq/nn_modules/qlinear_old.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules
copying auto_gptq/nn_modules/qlinear_triton.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules
creating build/lib.linux-x86_64-cpython-310/auto_gptq/quantization
copying auto_gptq/quantization/__init__.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/quantization
copying auto_gptq/quantization/gptq.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/quantization
copying auto_gptq/quantization/quantizer.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/quantization
creating build/lib.linux-x86_64-cpython-310/auto_gptq/utils
copying auto_gptq/utils/__init__.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/utils
copying auto_gptq/utils/data_utils.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/utils
copying auto_gptq/utils/import_utils.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/utils
creating build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/_utils
copying auto_gptq/eval_tasks/_utils/__init__.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/_utils
copying auto_gptq/eval_tasks/_utils/classification_utils.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/_utils
copying auto_gptq/eval_tasks/_utils/generation_utils.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/_utils
creating build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/triton_utils
copying auto_gptq/nn_modules/triton_utils/__init__.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/triton_utils
copying auto_gptq/nn_modules/triton_utils/custom_autotune.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/triton_utils
copying auto_gptq/nn_modules/triton_utils/kernels.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/triton_utils
copying auto_gptq/nn_modules/triton_utils/mixin.py -> build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/triton_utils
running build_ext
building 'autogptq_cuda' extension
creating build/temp.linux-x86_64-cpython-310
creating build/temp.linux-x86_64-cpython-310/autogptq_cuda
x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/usr/local/lib/python3.10/dist-packages/torch/include -I/usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch/include/TH -I/usr/local/lib/python3.10/dist-packages/torch/include/THC -I/usr/local/cuda/include -Iautogptq_cuda -I/usr/include/python3.10 -c autogptq_cuda/autogptq_cuda.cpp -o build/temp.linux-x86_64-cpython-310/autogptq_cuda/autogptq_cuda.o -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -DTORCH_EXTENSION_NAME=autogptq_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++17
cc1plus: fatal error: autogptq_cuda/autogptq_cuda.cpp: No such file or directory
compilation terminated.
error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /usr/bin/python -u -c '
exec(compile('"'"''"'"''"'"'
# This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
#
# - It imports setuptools before invoking setup.py, to enable projects that directly
# import from `distutils.core` to work with newer packaging standards.
# - It provides a clear error message when setuptools is not installed.
# - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
# setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
# manifest_maker: standard file '"'"'-c'"'"' not found".
# - It generates a shim setup.py, for handling setup.cfg-only projects.
import os, sys, tokenize
try:
import setuptools
except ImportError as error:
print(
"ERROR: Can not execute `setup.py` since setuptools is not available in "
"the build environment.",
file=sys.stderr,
)
sys.exit(1)
__file__ = %r
sys.argv[0] = __file__
if os.path.exists(__file__):
filename = __file__
with tokenize.open(__file__) as f:
setup_py_code = f.read()
else:
filename = "<auto-generated setuptools caller>"
setup_py_code = "from setuptools import setup; setup()"
exec(compile(setup_py_code, filename, "exec"))
'"'"''"'"''"'"' % ('"'"'/tmp/pip-install-cz_mho_b/auto-gptq_76eb05edd3f1497d8ba64859a8374a37/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' bdist_wheel -d /tmp/pip-wheel-i5uzbege
cwd: /tmp/pip-install-cz_mho_b/auto-gptq_76eb05edd3f1497d8ba64859a8374a37/
Building wheel for auto-gptq (setup.py) ... error
ERROR: Failed building wheel for auto-gptq
Running setup.py clean for auto-gptq
Running command python setup.py clean
running clean
removing 'build/temp.linux-x86_64-cpython-310' (and everything under it)
removing 'build/lib.linux-x86_64-cpython-310' (and everything under it)
'build/bdist.linux-x86_64' does not exist -- can't clean it
'build/scripts-3.10' does not exist -- can't clean it
removing 'build'
Failed to build auto-gptq
ERROR: Could not build wheels for auto-gptq, which is required to install pyproject.toml-based projects
Installing from source works great:
root@005505d3451a:~# git clone https://github.com/PanQiWei/AutoGPTQ
Cloning into 'AutoGPTQ'...
remote: Enumerating objects: 2159, done.
remote: Counting objects: 100% (490/490), done.
remote: Compressing objects: 100% (239/239), done.
remote: Total 2159 (delta 312), reused 316 (delta 240), pack-reused 1669
Receiving objects: 100% (2159/2159), 7.41 MiB | 3.18 MiB/s, done.
Resolving deltas: 100% (1442/1442), done.
root@005505d3451a:~# cd AutoGPTQ/
root@005505d3451a:~/AutoGPTQ# pip install -v .
Using pip 23.1.2 from /usr/local/lib/python3.10/dist-packages/pip (python 3.10)
Processing /root/AutoGPTQ
Running command python setup.py egg_info
running egg_info
creating /tmp/pip-pip-egg-info-9232g2ro/auto_gptq.egg-info
writing /tmp/pip-pip-egg-info-9232g2ro/auto_gptq.egg-info/PKG-INFO
writing dependency_links to /tmp/pip-pip-egg-info-9232g2ro/auto_gptq.egg-info/dependency_links.txt
writing requirements to /tmp/pip-pip-egg-info-9232g2ro/auto_gptq.egg-info/requires.txt
writing top-level names to /tmp/pip-pip-egg-info-9232g2ro/auto_gptq.egg-info/top_level.txt
writing manifest file '/tmp/pip-pip-egg-info-9232g2ro/auto_gptq.egg-info/SOURCES.txt'
/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py:476: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
....
running install
running install_lib
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/wheel
creating build/bdist.linux-x86_64/wheel/auto_gptq
copying build/lib.linux-x86_64-cpython-310/auto_gptq/__init__.py -> build/bdist.linux-x86_64/wheel/auto_gptq
creating build/bdist.linux-x86_64/wheel/auto_gptq/eval_tasks
copying build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/__init__.py -> build/bdist.linux-x86_64/wheel/auto_gptq/eval_tasks
copying build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/_base.py -> build/bdist.linux-x86_64/wheel/auto_gptq/eval_tasks
copying build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/language_modeling_task.py -> build/bdist.linux-x86_64/wheel/auto_gptq/eval_tasks
copying build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/sequence_classification_task.py -> build/bdist.linux-x86_64/wheel/auto_gptq/eval_tasks
copying build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/text_summarization_task.py -> build/bdist.linux-x86_64/wheel/auto_gptq/eval_tasks
creating build/bdist.linux-x86_64/wheel/auto_gptq/eval_tasks/_utils
copying build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/_utils/__init__.py -> build/bdist.linux-x86_64/wheel/auto_gptq/eval_tasks/_utils
copying build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/_utils/classification_utils.py -> build/bdist.linux-x86_64/wheel/auto_gptq/eval_tasks/_utils
copying build/lib.linux-x86_64-cpython-310/auto_gptq/eval_tasks/_utils/generation_utils.py -> build/bdist.linux-x86_64/wheel/auto_gptq/eval_tasks/_utils
creating build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/__init__.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/_base.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/_const.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/_utils.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/auto.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/bloom.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/codegen.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/gpt2.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/gpt_bigcode.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/gpt_neox.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/gptj.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/llama.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/moss.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/opt.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
copying build/lib.linux-x86_64-cpython-310/auto_gptq/modeling/rw.py -> build/bdist.linux-x86_64/wheel/auto_gptq/modeling
creating build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/__init__.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/_fused_base.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/fused_gptj_attn.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/fused_llama_attn.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/fused_llama_mlp.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/qlinear.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/qlinear_old.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/qlinear_triton.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules
creating build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules/triton_utils
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/triton_utils/__init__.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules/triton_utils
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/triton_utils/custom_autotune.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules/triton_utils
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/triton_utils/kernels.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules/triton_utils
copying build/lib.linux-x86_64-cpython-310/auto_gptq/nn_modules/triton_utils/mixin.py -> build/bdist.linux-x86_64/wheel/auto_gptq/nn_modules/triton_utils
creating build/bdist.linux-x86_64/wheel/auto_gptq/quantization
copying build/lib.linux-x86_64-cpython-310/auto_gptq/quantization/__init__.py -> build/bdist.linux-x86_64/wheel/auto_gptq/quantization
copying build/lib.linux-x86_64-cpython-310/auto_gptq/quantization/gptq.py -> build/bdist.linux-x86_64/wheel/auto_gptq/quantization
copying build/lib.linux-x86_64-cpython-310/auto_gptq/quantization/quantizer.py -> build/bdist.linux-x86_64/wheel/auto_gptq/quantization
creating build/bdist.linux-x86_64/wheel/auto_gptq/utils
copying build/lib.linux-x86_64-cpython-310/auto_gptq/utils/__init__.py -> build/bdist.linux-x86_64/wheel/auto_gptq/utils
copying build/lib.linux-x86_64-cpython-310/auto_gptq/utils/data_utils.py -> build/bdist.linux-x86_64/wheel/auto_gptq/utils
copying build/lib.linux-x86_64-cpython-310/auto_gptq/utils/import_utils.py -> build/bdist.linux-x86_64/wheel/auto_gptq/utils
copying build/lib.linux-x86_64-cpython-310/autogptq_cuda.cpython-310-x86_64-linux-gnu.so -> build/bdist.linux-x86_64/wheel
running install_egg_info
running egg_info
creating auto_gptq.egg-info
writing auto_gptq.egg-info/PKG-INFO
writing dependency_links to auto_gptq.egg-info/dependency_links.txt
writing requirements to auto_gptq.egg-info/requires.txt
writing top-level names to auto_gptq.egg-info/top_level.txt
writing manifest file 'auto_gptq.egg-info/SOURCES.txt'
reading manifest file 'auto_gptq.egg-info/SOURCES.txt'
adding license file 'LICENSE'
writing manifest file 'auto_gptq.egg-info/SOURCES.txt'
Copying auto_gptq.egg-info to build/bdist.linux-x86_64/wheel/auto_gptq-0.2.1-py3.10.egg-info
running install_scripts
creating build/bdist.linux-x86_64/wheel/auto_gptq-0.2.1.dist-info/WHEEL
creating '/tmp/pip-wheel-eq_lyc6t/auto_gptq-0.2.1-cp310-cp310-linux_x86_64.whl' and adding 'build/bdist.linux-x86_64/wheel' to it
adding 'autogptq_cuda.cpython-310-x86_64-linux-gnu.so'
adding 'auto_gptq/__init__.py'
adding 'auto_gptq/eval_tasks/__init__.py'
adding 'auto_gptq/eval_tasks/_base.py'
adding 'auto_gptq/eval_tasks/language_modeling_task.py'
adding 'auto_gptq/eval_tasks/sequence_classification_task.py'
adding 'auto_gptq/eval_tasks/text_summarization_task.py'
adding 'auto_gptq/eval_tasks/_utils/__init__.py'
adding 'auto_gptq/eval_tasks/_utils/classification_utils.py'
adding 'auto_gptq/eval_tasks/_utils/generation_utils.py'
adding 'auto_gptq/modeling/__init__.py'
adding 'auto_gptq/modeling/_base.py'
adding 'auto_gptq/modeling/_const.py'
adding 'auto_gptq/modeling/_utils.py'
adding 'auto_gptq/modeling/auto.py'
adding 'auto_gptq/modeling/bloom.py'
adding 'auto_gptq/modeling/codegen.py'
adding 'auto_gptq/modeling/gpt2.py'
adding 'auto_gptq/modeling/gpt_bigcode.py'
adding 'auto_gptq/modeling/gpt_neox.py'
adding 'auto_gptq/modeling/gptj.py'
adding 'auto_gptq/modeling/llama.py'
adding 'auto_gptq/modeling/moss.py'
adding 'auto_gptq/modeling/opt.py'
adding 'auto_gptq/modeling/rw.py'
adding 'auto_gptq/nn_modules/__init__.py'
adding 'auto_gptq/nn_modules/_fused_base.py'
adding 'auto_gptq/nn_modules/fused_gptj_attn.py'
adding 'auto_gptq/nn_modules/fused_llama_attn.py'
adding 'auto_gptq/nn_modules/fused_llama_mlp.py'
adding 'auto_gptq/nn_modules/qlinear.py'
adding 'auto_gptq/nn_modules/qlinear_old.py'
adding 'auto_gptq/nn_modules/qlinear_triton.py'
adding 'auto_gptq/nn_modules/triton_utils/__init__.py'
adding 'auto_gptq/nn_modules/triton_utils/custom_autotune.py'
adding 'auto_gptq/nn_modules/triton_utils/kernels.py'
adding 'auto_gptq/nn_modules/triton_utils/mixin.py'
adding 'auto_gptq/quantization/__init__.py'
adding 'auto_gptq/quantization/gptq.py'
adding 'auto_gptq/quantization/quantizer.py'
adding 'auto_gptq/utils/__init__.py'
adding 'auto_gptq/utils/data_utils.py'
adding 'auto_gptq/utils/import_utils.py'
adding 'auto_gptq-0.2.1.dist-info/LICENSE'
adding 'auto_gptq-0.2.1.dist-info/METADATA'
adding 'auto_gptq-0.2.1.dist-info/WHEEL'
adding 'auto_gptq-0.2.1.dist-info/top_level.txt'
adding 'auto_gptq-0.2.1.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
Building wheel for auto-gptq (setup.py) ... done
Created wheel for auto-gptq: filename=auto_gptq-0.2.1-cp310-cp310-linux_x86_64.whl size=2838539 sha256=f65da8f09d0f2b534c82f5e60f3a2fc00c746d4e25dfc59b5d8a2dae60b05c27
Stored in directory: /tmp/pip-ephem-wheel-cache-pnoayywt/wheels/24/88/75/0af9bf8f82c28467ed0e61e1ded8572458d43b390028b42ccb
Successfully built auto-gptq
Installing collected packages: tokenizers, safetensors, pytz, xxhash, tzdata, tqdm, rouge, regex, pyarrow, multidict, fsspec, frozenlist, dill, async-timeout, yarl, responses, pandas, multiprocess, huggingface-hub, aiosignal, transformers, aiohttp, datasets, accelerate, auto-gptq
changing mode of /usr/local/bin/tqdm to 755
changing mode of /usr/local/bin/rouge to 755
changing mode of /usr/local/bin/huggingface-cli to 755
changing mode of /usr/local/bin/transformers-cli to 755
changing mode of /usr/local/bin/datasets-cli to 755
changing mode of /usr/local/bin/accelerate to 755
changing mode of /usr/local/bin/accelerate-config to 755
changing mode of /usr/local/bin/accelerate-launch to 755
Successfully installed accelerate-0.19.0 aiohttp-3.8.4 aiosignal-1.3.1 async-timeout-4.0.2 auto-gptq-0.2.1 datasets-2.12.0 dill-0.3.6 frozenlist-1.3.3 fsspec-2023.5.0 huggingface-hub-0.15.1 multidict-6.0.4 multiprocess-0.70.14 pandas-2.0.2 pyarrow-12.0.0 pytz-2023.3 regex-2023.6.3 responses-0.18.0 rouge-1.0.1 safetensors-0.3.1 tokenizers-0.13.3 tqdm-4.65.0 transformers-4.29.2 tzdata-2023.3 xxhash-3.2.0 yarl-1.9.2
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
And now it works fine:
root@005505d3451a:~# python -c 'import torch ; import autogptq_cuda'
root@005505d3451a:~#
So it feels like there are multiple different failure possibilities at the moment:
1. pip install auto-gptq won't try to build the extension
2. pip install auto-gptq tries to build the extension but fails, like in the above example
3. pip install . won't try to build the extension

@TheBloke it appears from your output that it does compile the CUDA extension. What does not work is the correct versioning of the compiled wheel.
When it's installed and you do pip list, you should see 0.2.1+cuXXX for the version and not just 0.2.1, because the module can still be compiled without CUDA, so this creates ambiguity.
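A quick way to check what you actually got (the second command raises ImportError if the extension wasn't compiled):

pip show auto-gptq | grep -i version
python -c "import torch; import autogptq_cuda"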
Don't use pip. Install it from source with python setup.py install and everything will be fine.
This still needs fixing: https://github.com/PanQiWei/AutoGPTQ/blob/main/setup.py#L23. It will never mark the module VER+cuXXX unless it's compiled through a GH Action, so from the versioning all local users will assume there's no CUDA support.
I'm guessing this is related to https://github.com/PanQiWei/AutoGPTQ/issues/115#issuecomment-1581121864, where the autogptq_cuda directory isn't being uploaded to PyPI.
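That would also explain the cc1plus: fatal error: autogptq_cuda/autogptq_cuda.cpp: No such file or directory above. One way to check is to download the sdist and list its contents (assuming the 0.2.1 tarball name):

pip download --no-binary :all: --no-deps auto-gptq==0.2.1
tar tzf auto_gptq-0.2.1.tar.gz | grep autogptq_cuda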
There is some problem in v0.2.1; I will look into it and release a new patch this weekend.
Thank you @PanQiWei !
v0.2.2 is working a lot better. On Lambda Labs with CUDA 11.8, pip install auto-gptq worked immediately.
I will test in Docker later today.
torch nightly now supports CUDA 12.1; I am testing it out with AutoGPTQ 0.3:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
> torch nightly now supports CUDA 12.1; I am testing it out with AutoGPTQ 0.3:
> pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
torch nightly with CUDA 12.1 and AutoGPTQ 0.3 worked well!
When installing auto-gptq==0.2.2 with pip at Docker image build time, I still keep getting CUDA extension not installed.
> When installing auto-gptq==0.2.2 with pip at Docker image build time, I still keep getting CUDA extension not installed.
You need to export the build CUDA extension macro when you pip install?
So I finally got it all working in Docker. Like @3dluvr said, it all depends on GITHUB_ACTIONS.
I found two options that work in Docker:
ARG AUTOGPTQ="0.2.1"
# Install AutoGPTQ from source
RUN pip3 uninstall -qy auto-gptq && \
git clone https://github.com/PanQiWei/AutoGPTQ && \
cd AutoGPTQ && \
git checkout v$AUTOGPTQ && \
GITHUB_ACTIONS=true PATH=/usr/local/cuda/bin:"$PATH" TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX;8.9;9.0" pip3 install .
The second option installs the release from PyPI:
ARG AUTOGPTQ="0.2.2"
RUN pip3 uninstall -y auto-gptq && \
CUDA_VERSION="" GITHUB_ACTIONS=true PATH=/usr/local/cuda/bin:"$PATH" TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX;8.9;9.0" pip3 install auto-gptq==$AUTOGPTQ --no-cache-dir
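Either way, the end of the Dockerfile can verify that the extension actually got built, using the same import check as earlier in this thread:

RUN python3 -c "import torch; import autogptq_cuda"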
As discussed earlier, if using the PyPI version one has to unset CUDA_VERSION, because setup.py appends it to the package version (0.2.2 becomes 0.2.2+cu1180), which then doesn't match the version pip asked for and causes this problem:
#10 4.157 Discarding https://files.pythonhosted.org/packages/94/07/3f3f6905a9bd334c6ee8025df42e4789379612703b935be328caaaa41c23/auto_gptq-0.2.2.tar.gz (from https://pypi.org/simple/auto-gptq/) (requires-python:>=3.8.0): Requested auto-gptq==0.2.2 from https://files.pythonhosted.org/packages/94/07/3f3f6905a9bd334c6ee8025df42e4789379612703b935be328caaaa41c23/auto_gptq-0.2.2.tar.gz has inconsistent version: expected '0.2.2', but metadata has '0.2.2+cu1180'
#10 4.158 ERROR: Could not find a version that satisfies the requirement auto-gptq==0.2.2 (from versions: 0.0.4, 0.0.5, 0.1.0, 0.2.0, 0.2.1, 0.2.2)
And the other issue, again as already discussed, is that setup.py specifically checks whether CUDA is currently available:
if TORCH_AVAILABLE:
    BUILD_CUDA_EXT = int(os.environ.get('BUILD_CUDA_EXT', '1')) == 1

    additional_setup_kwargs = dict()
    if BUILD_CUDA_EXT and (torch.cuda.is_available() or IN_GITHUB_ACTIONS):
        from torch.utils import cpp_extension
        from distutils.sysconfig import get_python_lib
        conda_cuda_include_dir = os.path.join(get_python_lib(), "nvidia/cuda_runtime/include")
We can override this with GITHUB_ACTIONS=true, but that is not at all obvious unless you read the code.
In my opinion, a simpler and more intuitive solution would be to use BUILD_CUDA_EXT, like so (sketched below):
- If BUILD_CUDA_EXT=1, the extension is always built. No other checks.
- If BUILD_CUDA_EXT=0, the extension is never built.
- If BUILD_CUDA_EXT is undefined, it performs the same checks it does now.
It could then also check GITHUB_ACTIONS if that's needed as an additional override. But the user shouldn't be required to set GITHUB_ACTIONS=true to build the extension when using it outside of GitHub Actions.
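A rough sketch of the setup.py logic I'm proposing (hypothetical code, not what's in the repo today):

import os

try:
    import torch
    TORCH_AVAILABLE = True
except ImportError:
    TORCH_AVAILABLE = False

IN_GITHUB_ACTIONS = os.environ.get("GITHUB_ACTIONS", "false") == "true"

env_value = os.environ.get("BUILD_CUDA_EXT")  # None when the variable is undefined
if env_value == "1":
    BUILD_CUDA_EXT = True   # always build the extension, no other checks
elif env_value == "0":
    BUILD_CUDA_EXT = False  # never build it
else:
    # undefined: fall back to the checks setup.py performs today
    BUILD_CUDA_EXT = TORCH_AVAILABLE and (torch.cuda.is_available() or IN_GITHUB_ACTIONS)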
I think the question is whether AutoGPTQ would ever be used with non-CUDA-capable cards.
If it's CUDA-only, the extension should always be built regardless of any setting, because what is the point of using it without CUDA - you might as well use CPUs only through llama.cpp. :)
Docker would then not care one way or the other, so a minimum check should be whether CUDA is installed, failing if it isn't.
Unless I'm not seeing other use cases here?
Great job as always @TheBloke
I added pip uninstall -y auto-gptq && GITHUB_ACTIONS=true pip install auto-gptq --no-cache-dir at the top of my entrypoint.sh, and now the extension is built with CUDA. Only GITHUB_ACTIONS was needed.
I faced similar issues a long time ago; since CUDA is not detectable during docker build, the solution was always to either disable the checks or force the build.
Hi,
I'm having a lot of problems getting AutoGPTQ to compile when using Docker.
I've tried:
and
The second example worked before, and it doesn't work now, and I can't understand why.
The Docker template in question has CUDA 11.6 installed:
If I boot into the Docker container and compile from the command line, it works fine:
In general I've found AutoGPTQ seems to be very particular about whether or not it will build the CUDA kernel.
Is there some command I can give to force it to build? It would be really helpful.
Thanks very much