FreedomIntelligence / AceGPT


AceGpt #11

Open Mohamed-Maher5 opened 4 months ago

Mohamed-Maher5 commented 4 months ago

Hi. I have an issue loading the model, which causes this error:

WARNING:auto_gptq.nn_modules.fused_llama_mlp:skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.

RuntimeError                              Traceback (most recent call last)
in <cell line: 1>()
----> 1 model = AutoGPTQForCausalLM.from_quantized(model_id, use_safetensors=False)
      2 tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="right", use_fast=False)

2 frames
/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_utils.py in autogptq_post_init(model, use_act_order, max_input_length)
    256
    257     for device, buffers in model.device_to_buffers.items():
--> 258         prepare_buffers(device, buffers["temp_state"], buffers["temp_dq"])
    259
    260     # Using the default from exllama repo here.

RuntimeError: no device index
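
For readers hitting the same trace: the failure happens in autogptq_post_init, where the exllama kernel prepares buffers per CUDA device and needs an explicit device index. A minimal sketch of a common workaround, assuming a Colab GPU runtime; model_id is a placeholder for the GPTQ checkpoint named in the README:

from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "FreedomIntelligence/AceGPT-7B-chat-GPTQ"  # placeholder: take the exact ID from the README

# An explicit index ("cuda:0" rather than bare "cuda" or the CPU default)
# gives prepare_buffers a concrete device to allocate on.
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    use_safetensors=False,
    # disable_exllama=True,  # assumption: available as a fallback in recent auto-gptq releases
)
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="right", use_fast=False)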

hhwer commented 4 months ago

Hi, which code did you use? Have you looked at our README about the GPTQ model on HF?

Mohamed-Maher5 commented 4 months ago

I'm working on Colab, and I simply want code I can run to test the model with my prompt without errors, in AWQ. I also want to note that I use this code to avoid the GPU problem:

bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=bfloat16
)
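
Two caveats on this snippet: bfloat16 must come from torch, and BitsAndBytesConfig quantizes full-precision weights at load time, so it targets the unquantized HF checkpoint rather than the GPTQ files. A self-contained sketch under those assumptions (model_id is a placeholder):

import torch
import transformers

model_id = "FreedomIntelligence/AceGPT-7B-chat"  # placeholder: the unquantized checkpoint

bnb_config = transformers.BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # bfloat16 is torch.bfloat16
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # lets accelerate place layers on the Colab GPU
)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id)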

hhwer commented 4 months ago

Please refer to the previous issue.

Mohamed-Maher5 commented 4 months ago

Okay. Using the code from the previous issue, when I run this:

model = AutoGPTQForCausalLM.from_quantized(model_id, use_safetensors=False)
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="right", use_fast=False)

it results in this error:

WARNING:auto_gptq.nn_modules.fused_llama_mlp:skip module injection for FusedLlamaMLPForQuantizedModel not support integrate without triton yet.

RuntimeError                              Traceback (most recent call last)
in <cell line: 1>()
----> 1 model = AutoGPTQForCausalLM.from_quantized(model_id, use_safetensors=False)
      2 tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="right", use_fast=False)

2 frames
/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_utils.py in autogptq_post_init(model, use_act_order, max_input_length)
    256
    257     for device, buffers in model.device_to_buffers.items():
--> 258         prepare_buffers(device, buffers["temp_state"], buffers["temp_dq"])
    259
    260     # Using the default from exllama repo here.

RuntimeError: no device index
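
Since the goal is simply to run a prompt through the model, here is a hedged end-to-end sketch combining the loading workaround above with generation; model_id, the prompt, and the generation settings are illustrative placeholders rather than the repo's documented values:

import torch
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

model_id = "FreedomIntelligence/AceGPT-7B-chat-GPTQ"  # placeholder: use the ID from the README

model = AutoGPTQForCausalLM.from_quantized(model_id, device="cuda:0", use_safetensors=False)
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="right", use_fast=False)

prompt = "What is the capital of Egypt?"  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))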

hhwer commented 4 months ago

Are the versions of all the Python packages the same as in the given example?
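
A quick way to check, as a sketch (the package list assumes the stack used in this thread):

from importlib.metadata import version, PackageNotFoundError

# Print installed versions to compare against the repo's example environment.
for pkg in ("auto-gptq", "transformers", "torch", "accelerate"):
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")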