I fixed this by manually setting `torch_dtype=torch.float16` in `utils_model.py`. I also fixed the attention-weights warning by wrapping each vision encoder layer's self-attention forward and forcing `output_attentions=True`:
```python
import torch
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

def forward_with_attentions(module):
    # Monkey-patch the module so every forward call requests attention weights.
    original_forward = module.forward
    def new_forward(*args, **kwargs):
        kwargs['output_attentions'] = True
        return original_forward(*args, **kwargs)
    module.forward = new_forward

def get_processor_model(args):
    processor = AutoProcessor.from_pretrained(args.model_name_or_path)

    if args.load_4bit:
        quant_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_use_double_quant=True,
            bnb_4bit_compute_dtype=torch.float16
        )
    elif args.load_8bit:
        quant_config = BitsAndBytesConfig(load_in_8bit=True)
    else:
        quant_config = None

    # Force float16 instead of the checkpoint's default dtype to avoid the
    # triu_tril_cuda_template / BFloat16 error described below.
    model = LlavaForConditionalGeneration.from_pretrained(
        args.model_name_or_path,
        torch_dtype=torch.float16,
        quantization_config=quant_config,
        low_cpu_mem_usage=True,
        device_map="auto"
    )

    # Wrap each vision encoder layer's self-attention so it always returns
    # attention weights.
    for layer in model.vision_tower.vision_model.encoder.layers:
        forward_with_attentions(layer.self_attn)
    .....
```
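To confirm the vision tower now returns attention weights, a quick sanity check like the one below can help. This is a minimal sketch, not part of the original script: the dummy 336x336 input (the standard CLIP resolution for LLaVA-style models) and the direct `vision_tower` call are my assumptions.

```python
# Sanity check (sketch): push a dummy image through the vision tower and
# verify that per-layer attention maps come back.
import torch

processor, model = get_processor_model(args)  # args as parsed by app.py
pixel_values = torch.zeros(1, 3, 336, 336, dtype=torch.float16, device=model.device)
with torch.no_grad():
    out = model.vision_tower(pixel_values, output_attentions=True)
# One (batch, num_heads, seq_len, seq_len) tensor per encoder layer.
assert out.attentions is not None
```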
Description
I encountered an issue while running the LLaVA-Gemma model on my local machine. The error message indicates that "triu_tril_cuda_template" is not implemented for the 'BFloat16' data type.
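For context, this failure can usually be reproduced in isolation. The sketch below is my assumption about the root cause: on affected PyTorch builds, the CUDA `triu`/`tril` kernels used for causal-mask construction lack a BFloat16 implementation (newer PyTorch releases have added it), and the checkpoint presumably defaults to bfloat16, which is why forcing `torch.float16` at load time sidesteps the error.

```python
# Hypothetical minimal reproduction; requires a CUDA device and an affected
# PyTorch version.
import torch

mask = torch.ones(4, 4, dtype=torch.bfloat16, device="cuda")
torch.triu(mask, diagonal=1)
# RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
```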
Error Message
Here is the complete error trace:
Environment
Steps to Reproduce
python app.py --model_name_or_path intel-llava-gemma-2b --load_8bit --port 8080
Additional Information
```
lvlm-interpret/venv/lib/python3.11/site-packages/bitsandbytes/autograd/_functions.py:316: UserWarning: MatMul8bitLt: inputs will be cast from torch.bfloat16 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
WARNING:utils_model:Attention weights were not returned for the vision model. Relevancy maps will not be calculated for the vision model. To enable, set output_attentions=True in the forward pass of vision_tower.
```
P.S. Do I need to worry about the "Attention weights were not returned for the vision model" warning? Since I am using the Intel model, I assumed everything would already be configured correctly by default?
Thank you so much for the excellent work! I am looking forward to your reply!