casper-hansen / AutoAWQ

AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
https://casper-hansen.github.io/AutoAWQ/
MIT License

AttributeError: 'LlavaForConditionalGeneration' object has no attribute 'quantize' #436

Open kzleong opened 5 months ago

kzleong commented 5 months ago

Hi @casper-hansen, I keep getting this error when trying to quantize my custom LLaVA model:

  File "/mainfs/lyceum/kzl1m20/LLaVA/quant.py", line 9, in <module>
    model = AutoAWQForCausalLM.from_pretrained(model_path, safetensors=True)
  File "/lyceum/kzl1m20/miniconda3/envs/llava/lib/python3.10/site-packages/awq/models/auto.py", line 60, in from_pretrained
    return AWQ_CAUSAL_LM_MODEL_MAP[model_type].from_pretrained(
  File "/lyceum/kzl1m20/miniconda3/envs/llava/lib/python3.10/site-packages/awq/models/base.py", line 311, in from_pretrained
    processor = AutoProcessor.from_pretrained(model_weights_path)
  File "/lyceum/kzl1m20/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/models/auto/processing_auto.py", line 312, in from_pretrained
    return processor_class.from_pretrained(
  File "/lyceum/kzl1m20/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/processing_utils.py", line 465, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "/lyceum/kzl1m20/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/processing_utils.py", line 511, in _get_arguments_from_pretrained
    args.append(attribute_class.from_pretrained(pretrained_model_name_or_path, **kwargs))
  File "/lyceum/kzl1m20/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
    return cls._from_pretrained(
  File "/lyceum/kzl1m20/miniconda3/envs/llava/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2232, in _from_pretrained
    value = AddedToken(**value, special=True)
TypeError: tokenizers.AddedToken() got multiple values for keyword argument 'special'

The script (quant.py) I'm using is below:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "/scratch/kzl1m20/llava-1.5-13b-posture"
quant_path = "../llava-1.5-13b-posture-awq"

# 4-bit weights, zero-point quantization, group size 128, GEMM kernel
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load model and tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path, safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model and tokenizer
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

I'm using the following packages: autoawq=0.2.4, transformers=4.38.2, tokenizers=0.15.2

suparious commented 5 months ago

What is the architecture of your custom model? Do you know if it is already supported by AWQ?

kzleong commented 5 months ago

I'm trying to quantize a LLaVA 1.5 13B model finetuned with LoRA, and I've already converted the .bin files to safetensors.
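
A minimal sketch of that conversion step, assuming a single pytorch_model.bin checkpoint (sharded checkpoints need one pass per shard; the file names are the Hugging Face defaults, not paths taken from this thread):

import torch
from safetensors.torch import save_file

# Load the pickle-based checkpoint on CPU
state_dict = torch.load("pytorch_model.bin", map_location="cpu")

# save_file requires contiguous tensors
state_dict = {k: v.contiguous() for k, v in state_dict.items()}

save_file(state_dict, "model.safetensors")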

kzleong commented 5 months ago

That error is solved now: it was an issue in my special_tokens_map.json, where for some reason the special keyword was already declared, so I just removed it.
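
For anyone hitting the same TypeError, a minimal sketch of that cleanup, assuming the entries in special_tokens_map.json look like the AddedToken dicts in the traceback above (transformers already passes special=True itself, hence the duplicate keyword):

import json

path = "special_tokens_map.json"  # inside the model directory

with open(path) as f:
    token_map = json.load(f)

# Drop the redundant "special" key from every AddedToken-style entry
for value in token_map.values():
    entries = value if isinstance(value, list) else [value]
    for entry in entries:
        if isinstance(entry, dict):
            entry.pop("special", None)

with open(path, "w") as f:
    json.dump(token_map, f, indent=2)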

However, I now get the following error from model.quantize(tokenizer, quant_config=quant_config):

Traceback (most recent call last):
  File "/mainfs/lyceum/kzl1m20/LLaVA/quant.py", line 13, in <module>
    model.quantize(tokenizer, quant_config=quant_config)
  File "/lyceum/kzl1m20/miniconda3/envs/llava/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'LlavaForConditionalGeneration' object has no attribute 'quantize'

Any ideas why? I was able to get quantized weights for this model using llm-awq, but I want a quantized model in the same format others have published using AutoAWQ.
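
A hedged diagnostic, not a confirmed fix: the traceback shows the attribute lookup falling through to torch.nn.Module.__getattr__ and naming LlavaForConditionalGeneration, which suggests model is the raw transformers module rather than an AutoAWQ wrapper exposing quantize(). A quick check:

from awq import AutoAWQForCausalLM

model_path = "/scratch/kzl1m20/llava-1.5-13b-posture"
model = AutoAWQForCausalLM.from_pretrained(model_path, safetensors=True)

# An awq.models wrapper class is expected here; a bare transformers class
# would explain why quantize() never resolves.
print(type(model))
print(hasattr(model, "quantize"))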

linjianshu commented 1 month ago

Hey, did you figure out this error? I'm hitting the same thing: AttributeError: 'LlamaForCausalLM' object has no attribute 'quantize'. My code is below:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
import torch

model_path = '/home/evgpu/LLM_quantize/AutoCoder_S_6.7B'
quant_path = '/home/evgpu/LLM_quantize/AutoCoder_S_6.7B-AWQ-4B'
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path, device_map='cuda').to(device)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)