AutoGPTQ in this image fails inside docker container

Hello, I was testing Docker images to try out RunPod's serverless offering. I copied the Dockerfiles included in this repo and got the following error:

2023-08-16T07:12:28.447362752Z ==========
2023-08-16T07:12:28.447374292Z == CUDA ==
2023-08-16T07:12:28.447573083Z ==========
2023-08-16T07:12:28.451864455Z 
2023-08-16T07:12:28.453468220Z 
2023-08-16T07:12:28.453485180Z Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
2023-08-16T07:12:28.455098515Z 
2023-08-16T07:12:28.455111075Z This container image and its contents are governed by the NVIDIA Deep Learning Container License.
2023-08-16T07:12:28.455115174Z By pulling and using the container, you accept the terms and conditions of this license:
2023-08-16T07:12:28.455146764Z https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
2023-08-16T07:12:28.455150515Z 
2023-08-16T07:12:28.455153164Z A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
2023-08-16T07:12:28.468523023Z 
2023-08-16T07:12:29.809130792Z 11.8
2023-08-16T07:12:29.828168608Z cuda is available
2023-08-16T07:12:29.839421920Z number of devices: 1
2023-08-16T07:12:29.964785713Z INFO   | RUNPOD_AI_API_KEY: S**************************************O
2023-08-16T07:12:29.964818754Z INFO   | RUNPOD_WEBHOOK_GET_JOB: h**********************************************************************************0
2023-08-16T07:12:29.964823124Z INFO   | RUNPOD_WEBHOOK_POST_OUTPUT: h**************************************************************************************0
2023-08-16T07:12:32.198523144Z instantiator.py     :21   2023-08-16 07:12:32,198 Created a temporary directory at /tmp/tmp6pdpaztr
2023-08-16T07:12:32.198730885Z instantiator.py     :76   2023-08-16 07:12:32,198 Writing /tmp/tmp6pdpaztr/_remote_module_non_scriptable.py
2023-08-16T07:12:37.330701623Z _base.py            :746  2023-08-16 07:12:37,330 lm_head not been quantized, will be ignored when make_quant.
2023-08-16T07:12:42.316552297Z modeling.py         :1093 2023-08-16 07:12:42,316 The safetensors archive passed at /root/.cache/huggingface/hub/models--TheBloke--Llama-2-7B-GPTQ/snapshots/98ffa0d89723ce1e3214f477469b2db67c6c4586/gptq_model-4bit-128g.safetensors does not contain metadata. Make sure to save your model with the `save_pretrained` method. Defaulting to 'pt' metadata.
2023-08-16T07:12:42.663930515Z Traceback (most recent call last):
2023-08-16T07:12:42.663962315Z   File "/app/./handler.py", line 24, in <module>
2023-08-16T07:12:42.664019434Z     model = AutoGPTQForCausalLM.from_quantized(quantized_model_dir, device="cuda", use_safetensors=True, use_triton=False, model_basename=model_basename)
2023-08-16T07:12:42.664041064Z   File "/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/auto.py", line 94, in from_quantized
2023-08-16T07:12:42.664088935Z     return quant_func(
2023-08-16T07:12:42.664096455Z   File "/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py", line 793, in from_quantized
2023-08-16T07:12:42.664405756Z     accelerate.utils.modeling.load_checkpoint_in_model(
2023-08-16T07:12:42.664419257Z   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 1336, in load_checkpoint_in_model
2023-08-16T07:12:42.664706737Z     set_module_tensor_to_device(
2023-08-16T07:12:42.664718026Z   File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 286, in set_module_tensor_to_device
2023-08-16T07:12:42.664796557Z     and torch.device(device).type == "cuda"
2023-08-16T07:12:42.664875247Z TypeError: Device() received an invalid combination of arguments - got (NoneType), but expected one of:
2023-08-16T07:12:42.664880427Z  * (torch.device device)

The error above occurs in this image with AutoGPTQ. (I will update with more details later; I was interrupted while typing.) This is my Dockerfile:

ARG CUDA_VERSION="11.8.0"
ARG CUDNN_VERSION="8"
ARG UBUNTU_VERSION="22.04"
ARG DOCKER_FROM=thebloke/cuda$CUDA_VERSION-ubuntu$UBUNTU_VERSION-pytorch:latest 

# Base pytorch image
FROM $DOCKER_FROM as base
WORKDIR /app
COPY . /app
# install auto_gptq
ARG AUTOGPTQ="0.3.0"
ENV CUDA_VERSION=""
ENV GITHUB_ACTIONS=true
ENV TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX;8.9;9.0" 
RUN pip3 install --no-cache-dir auto-gptq==$AUTOGPTQ
# requirements.txt installs the runpod library as well as transformers==4.31.0
RUN pip3 install -r requirements.txt

# Test handler Python file
CMD [ "python3", "-u", "./handler.py" ]
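To help narrow this down, here is a minimal sketch that reports the installed versions of the packages named in the Dockerfile and traceback above. It could be run inside the container before `handler.py`. The names `PACKAGES` and `installed_versions` are hypothetical helpers, not part of any of these libraries, and the script only collects version information; it does not assume any particular cause for the `TypeError`.

```python
from importlib import metadata

# Distribution names taken from the Dockerfile and the traceback above.
# This list is an assumption about what is relevant; extend it as needed.
PACKAGES = ["auto-gptq", "accelerate", "transformers", "torch", "runpod"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting the output alongside the traceback would show which `accelerate` and `auto-gptq` versions the base image and `requirements.txt` actually resolved to.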

I have also posted in the Discord channel and in the AutoGPTQ repo. It would be great to find out whether I'm doing something wrong. Thank you very much in advance!

