casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/

DLL load failed while importing awq_inference_engine: The specified procedure could not be found. #124

Closed OriginalGoku closed 1 year ago

OriginalGoku commented 1 year ago

I followed the instructions to install AutoAWQ.

Here is my code:

```python
from transformers import AutoTokenizer
from awq import AutoAWQForCausalLM


# Load model and tokenizer
def load_model_tokenizer():
    model_name_or_path = "TheBloke/Mistral-7B-OpenOrca-AWQ"

    # Load model
    model = AutoAWQForCausalLM.from_quantized(model_name_or_path, fuse_layers=True,
                                              trust_remote_code=False, safetensors=True)
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False)
    return model, tokenizer


if __name__ == '__main__':
    model, tokenizer = load_model_tokenizer()

    system_message = 'You are an expert in English Language. Rewrite the following text in short sentences: '
    prompt = 'There used to be a good LLM that 384u8 listened to instructions very asduh2 $#@ well'
    prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

'''
    print("\n\n*** Generate:")

    tokens = tokenizer(
        prompt_template,
        return_tensors='pt'
    ).input_ids.cuda()

    prompt_tokens = tokenizer(
        prompt,
        return_tensors='pt'
    ).input_ids.cuda()
    len_token = len(prompt_tokens[0])
    print(f'Prompt has {len_token} tokens')

    # Generate output
    generation_output = model.generate(
        tokens,
        do_sample=True,
        temperature=0.001,
        top_p=0.80,
        top_k=10,
        max_new_tokens=len_token
    )

    print("Output: ", tokenizer.decode(generation_output[0]))
```

and here is the error message:

```
Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\Mistral7B\main.py", line 4, in <module>
    from awq import AutoAWQForCausalLM
  File "C:\Users\User\PycharmProjects\Mistral7B\venv\lib\site-packages\awq\__init__.py", line 2, in <module>
    from awq.models.auto import AutoAWQForCausalLM
  File "C:\Users\User\PycharmProjects\Mistral7B\venv\lib\site-packages\awq\models\__init__.py", line 1, in <module>
    from .mpt import MptAWQForCausalLM
  File "C:\Users\User\PycharmProjects\Mistral7B\venv\lib\site-packages\awq\models\mpt.py", line 1, in <module>
    from .base import BaseAWQForCausalLM
  File "C:\Users\User\PycharmProjects\Mistral7B\venv\lib\site-packages\awq\models\base.py", line 11, in <module>
    from awq.quantize.quantizer import AwqQuantizer
  File "C:\Users\User\PycharmProjects\Mistral7B\venv\lib\site-packages\awq\quantize\quantizer.py", line 11, in <module>
    from awq.modules.linear import WQLinear_GEMM, WQLinear_GEMV
  File "C:\Users\User\PycharmProjects\Mistral7B\venv\lib\site-packages\awq\modules\linear.py", line 4, in <module>
    import awq_inference_engine  # with CUDA kernels
ImportError: DLL load failed while importing awq_inference_engine: The specified procedure could not be found.
```

I am using torch version 2.1.0 and CUDA 11.8. I have an NVIDIA GeForce GTX 1070 with 32 GB of GPU RAM (8 GB dedicated and 24 GB shared GPU RAM).

casper-hansen commented 1 year ago

Your GPU is not supported, so the install probably failed.

OriginalGoku commented 1 year ago

@casper-hansen Thanks for your prompt reply. You are right. When I tried to build from source with git, it gave me the following error: `RuntimeError: GPUs with compute capability less than 7.5 are not supported.` But when I installed it with pip, I did not get any error messages, so initially I thought everything was installed properly.
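For anyone hitting the same error, a minimal sketch of how one might check the GPU before installing AutoAWQ. The 7.5 compute capability threshold comes from the RuntimeError quoted above; the script itself is an illustrative assumption using only standard PyTorch calls, not part of AutoAWQ.

```python
# Sketch: check whether the local GPU meets the compute capability >= 7.5
# requirement mentioned above, using standard PyTorch APIs only.
import torch

if not torch.cuda.is_available():
    print("CUDA is not available in this PyTorch build.")
else:
    major, minor = torch.cuda.get_device_capability(0)
    name = torch.cuda.get_device_name(0)
    print(f"torch {torch.__version__}, CUDA {torch.version.cuda}")
    print(f"GPU: {name}, compute capability {major}.{minor}")
    if (major, minor) < (7, 5):
        # A GTX 1070 (Pascal) reports 6.1, so the prebuilt AWQ CUDA kernels
        # are unsupported and awq_inference_engine fails to import.
        print("Compute capability below 7.5: AutoAWQ's CUDA kernels are not supported.")
```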