Example in Readme doesn't work

Blaizzy / mlx-vlm

MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.

MIT License

Example in Readme doesn't work #25

Closed: kechan closed this issue 5 months ago

kechan commented 5 months ago

Versions: mlx 0.13.1, mlx-lm 0.13.1, mlx-vlm 0.0.5

import mlx.core as mx
from mlx_vlm import load, generate

model_path = "mlx-community/llava-1.5-7b-4bit"
model, processor = load(model_path)

prompt = processor.apply_chat_template(
    [{"role": "user", "content": f"<image>\nWhat are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)

output = generate(model, processor, "http://images.cocodataset.org/val2017/000000039769.jpg", prompt, verbose=False)

Got error:

AttributeError                            Traceback (most recent call last)
Cell In[44], line 7
      4 model_path = "mlx-community/llava-1.5-7b-4bit"
      5 model, processor = load(model_path)
----> 7 prompt = processor.apply_chat_template(
      8     [{"role": "user", "content": f"<image>\nWhat are these?"}],
      9     tokenize=False,
     10     add_generation_prompt=True,
     11 )
     13 output = generate(model, processor, "http://images.cocodataset.org/val2017/000000039769.jpg", prompt, verbose=False)

AttributeError: 'LlavaProcessor' object has no attribute 'apply_chat_template'

Blaizzy commented 5 months ago

Hi @kechan

This error happens because, for Llava, the tokenizer is an attribute of the processor class, so apply_chat_template lives on the tokenizer rather than on the processor itself.

Therefore, you can call apply_chat_template this way:

prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": f"<image>\nWhat are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)
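For code that has to work with processors of both kinds, the lookup can be wrapped in a small helper. This is only a sketch: apply_chat_template_compat is a hypothetical name, not part of mlx-vlm, and the stub classes stand in for a real processor so the example runs without downloading a model.

```python
def apply_chat_template_compat(processor, messages, **kwargs):
    """Hypothetical helper (not part of mlx-vlm).

    Calls apply_chat_template on the processor if it has one,
    otherwise on processor.tokenizer.
    """
    candidates = (processor, getattr(processor, "tokenizer", None))
    for candidate in candidates:
        method = getattr(candidate, "apply_chat_template", None)
        if callable(method):
            return method(messages, **kwargs)
    raise AttributeError(
        f"{type(processor).__name__} exposes no apply_chat_template, "
        "directly or via .tokenizer"
    )


# Stand-ins for a real processor/tokenizer pair (assumption: for
# illustration only; a real tokenizer would also add the chat markers).
class StubTokenizer:
    def apply_chat_template(self, messages, tokenize=False, add_generation_prompt=True):
        return " ".join(m["content"] for m in messages)


class StubProcessor:
    def __init__(self):
        self.tokenizer = StubTokenizer()


messages = [{"role": "user", "content": "<image>\nWhat are these?"}]
prompt = apply_chat_template_compat(
    StubProcessor(), messages, tokenize=False, add_generation_prompt=True
)
```

With a real model, the processor returned by load(model_path) would be passed in place of StubProcessor().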

kechan commented 5 months ago

I thought I also tried something like that but got another error:

ValueError: Shapes (1,577,1024) and (577,128) cannot be broadcast.
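To see why these two shapes are rejected, here is a minimal pure-Python sketch of NumPy-style broadcasting, which I am assuming MLX follows. broadcast_shape is a hypothetical helper written for illustration; it is not an MLX function.

```python
from itertools import zip_longest


def broadcast_shape(a, b):
    """Return the broadcast result of two shapes, NumPy-style.

    Dimensions are compared from the trailing end; two sizes are
    compatible when they are equal or when one of them is 1.
    """
    result = []
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(
                f"Shapes {tuple(a)} and {tuple(b)} cannot be broadcast."
            )
    return tuple(reversed(result))


# Matching trailing dimensions broadcast fine.
ok = broadcast_shape((1, 577, 1024), (577, 1024))  # (1, 577, 1024)

# The shapes from the error fail on the last axis: 1024 vs 128.
try:
    broadcast_shape((1, 577, 1024), (577, 128))
    failed = False
except ValueError:
    failed = True
```

The mismatch is on the last axis only (1024 against 128), so one of the two arrays has an unexpected width there.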

Blaizzy commented 5 months ago

Could you install from source?

The fix for this issue is on this branch:

https://github.com/Blaizzy/mlx-vlm/tree/pc/quantise-irregular

Just clone it and:

pip install -e .
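After installing, it can help to confirm which package versions the Python environment actually picks up. A minimal sketch using the standard library; installed_versions is a hypothetical helper name, and the distribution names are taken from the versions listed in the first comment.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Distribution names as reported at the top of this issue.
report = installed_versions(["mlx", "mlx-lm", "mlx-vlm"])
for name, ver in report.items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```

Note that this only reports the version string of the installed distribution; it does not show which git branch an editable install points at.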

kechan commented 5 months ago

OK, I had forgotten to switch branches. I tried again and it worked. Now I get:

Image: http://images.cocodataset.org/val2017/000000039769.jpg

Prompt: [INST] What are these? [/INST]

These are two cats, one black and white and the other brown and white, lying on a couch.

kechan commented 5 months ago

While experimenting earlier to get it to work, I discovered another weird thing. I will log it separately.

Blaizzy commented 5 months ago

Awesome! Is it working now?

This issue happens because I changed the quantisation.

The release with the fix is blocked for now, as I'm still working on adding support for Google's PaliGemma 👌🏽

kechan commented 5 months ago

Yes, I got it to work. I am looking forward to trying PaliGemma next. Thanks for all the work.

Blaizzy commented 5 months ago

Great, most welcome!
