Blaizzy / mlx-vlm

MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.

LLaVA documentation? #47

Open jrp2014 opened 2 days ago

jrp2014 commented 2 days ago

Running the script on the front page, I get:

config.json: 100%|███████████████████████████████████| 1.13k/1.13k [00:00<00:00, 4.87MB/s]
added_tokens.json: 100%|████████████████████████████████| 41.0/41.0 [00:00<00:00, 131kB/s]
special_tokens_map.json: 100%|███████████████████████████| 552/552 [00:00<00:00, 5.37MB/s]
preprocessor_config.json: 100%|██████████████████████████| 819/819 [00:00<00:00, 9.76MB/s]
model.safetensors.index.json: 100%|████████████████████| 129k/129k [00:00<00:00, 1.53MB/s]
tokenizer_config.json: 100%|█████████████████████████| 1.31k/1.31k [00:00<00:00, 9.20MB/s]
tokenizer.model: 100%|█████████████████████████████████| 500k/500k [00:00<00:00, 6.96MB/s]
tokenizer.json: 100%|████████████████████████████████| 1.84M/1.84M [00:00<00:00, 4.65MB/s]
model.safetensors: 100%|█████████████████████████████| 3.98G/3.98G [06:43<00:00, 9.85MB/s]
Fetching 9 files: 100%|█████████████████████████████████████| 9/9 [06:44<00:00, 44.89s/it]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
No chat template is set for this tokenizer, falling back to a default class-level template. This is very error-prone, because models are often trained with templates different from the class default! Default chat templates are a legacy feature and will be removed in Transformers v4.43, at which point any code depending on them will stop working. We recommend setting a valid chat template before then to ensure that this model continues working without issues.

Is this OK??

I assume that to run the llava-v1.6 models I just pick from the versions available on Hugging Face. Are there any further details on how much memory is required to run the 7B vs the 34B variants, and how much better the 8-bit versions are than the 4-bit ones, please?

Blaizzy commented 2 days ago

Yes, you just need to pick any of the llava-v1.6 models from the mlx-community repo.

I don't know the exact memory requirements, because you also have to factor in image processing.
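
As a very rough back-of-envelope estimate only (an assumption, not a measurement; it covers the quantized weights alone, and the vision tower, KV cache and image buffers add several more GB on top):

def weight_gb(params_billions, bits_per_weight):
    # quantized weight size ~= parameter count * bits per weight / 8, expressed in GiB
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

for name, params, bits in [("7B 4-bit", 7, 4), ("7B 8-bit", 7, 8),
                           ("34B 4-bit", 34, 4), ("34B 8-bit", 34, 8)]:
    print(f"{name}: ~{weight_gb(params, bits):.0f} GB of weights")

That puts the 34B 8-bit weights alone at roughly 32 GB, so it is likely to be tight on machines with much less than 48 GB of unified memory.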

jrp2014 commented 2 days ago

The example, with model_path = "mlx-community/llava-v1.6-34b-8bit", results in:

The image you've provided appears to show a pair of shoes. However, the image is quite blurry and it's difficult to make out specific details. If you have any specific questions about shoes or need information related to them, feel free to ask!

rather than the expected two-cat result, which the v1.5 model produces, or

These are two cats lying on a pink blanket. The cat on the left appears to be a kitten with a striped coat, while the cat on the right is a larger cat with a tabby pattern. They seem to be resting or sleeping, and there are remote controls nearby, suggesting that they might be in a living room or a similar space where people relax and watch television. 

with "model_path = "mlx-community/llava-v1.6-mistral-7b-8bit""

Also, the Python example on https://github.com/Blaizzy/mlx-vlm/tree/main/mlx_vlm/models/llava seems not to work out of the box; you get an "ImportError: attempted relative import with no known parent package" diagnostic. There is no doubt a simple fix.
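
For context on that ImportError: this is general Python behaviour rather than anything mlx-vlm-specific. A file inside a package that uses relative imports (e.g. "from .language import ...") only resolves when loaded as part of its package, not when executed directly as a script. A minimal sketch, assuming mlx-vlm is pip-installed, is to import the module through its package path instead:

from mlx_vlm.models import llava  # loaded as a subpackage, so the relative imports inside it resolve
print(llava.__file__)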

Blaizzy commented 2 days ago

Hey @jrp2014

I ran some tests, and I can't replicate the issue you presented.

I downloaded fresh copies of llava-v1.6-34b-4bit and 8bit and both provided the correct answer.

llava-v1.6-34b-4bit

[screenshot: terminal output, 2024-06-29 2:32:58 AM]

llava-v1.6-34b-8bit

[screenshot: terminal output, 2024-06-29 10:04:25 AM]

> Also, the Python example on https://github.com/Blaizzy/mlx-vlm/tree/main/mlx_vlm/models/llava seems not to work out of the box; you get an "ImportError: attempted relative import with no known parent package" diagnostic. There is no doubt a simple fix.

What example in particular did you try that failed?

jrp2014 commented 2 days ago

Same example. On a 48 GB RAM M3 Max MacBook Pro. How do you clear the cache and get a new copy of the model?
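
One way to force a fresh copy, assuming the files were fetched into the default huggingface_hub cache (no HF_HOME override), is to delete the model's cached snapshot; the next load() will then re-download everything. A minimal sketch:

from pathlib import Path
import shutil

# standard huggingface_hub cache layout: ~/.cache/huggingface/hub/models--{org}--{repo}
cache_dir = Path.home() / ".cache" / "huggingface" / "hub"
model_dir = cache_dir / "models--mlx-community--llava-v1.6-34b-8bit"
if model_dir.exists():
    shutil.rmtree(model_dir)  # next load("mlx-community/llava-v1.6-34b-8bit") re-downloads the files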

Blaizzy commented 1 day ago

How are you running it? Via CLI or a script?

jrp2014 commented 1 day ago

The Python script from the front page. (The script on the LLaVA page doesn't work for me; see above.)

Blaizzy commented 1 day ago

Please run it again and share a screenshot of your terminal like the one I provided earlier :)

jrp2014 commented 1 day ago

The script that I am using is:

import mlx.core as mx
from mlx_vlm import load, generate

# model_path = "mlx-community/llava-1.5-7b-4bit"
#model_path = "mlx-community/llava-v1.6-mistral-7b-8bit"
model_path = "mlx-community/llava-v1.6-34b-8bit"
model, processor = load(model_path)

prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": f"<image>\nProvide a caption and keywords for this image"}],
    tokenize=False,
    add_generation_prompt=True,
)

image = "/Users/jrp/Pictures/Aiarty Output/20240622-154844_DSC00820_DxO_photo_x1_8640x5760.jpeg"

output = generate(model, processor, image, prompt, verbose=True)

print(output)

The results that I get with the 7B model are:

(mlx) ➜  mlx_vlm git:(main) ✗ python mytest.py

Fetching 10 files: 100%|███████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 151418.92it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
==========
Image: /Users/jrp/Pictures/Aiarty Output/20240622-154844_DSC00820_DxO_photo_x1_8640x5760.jpeg 

Prompt: <s>[INST] <image>
Provide a caption and keywords for this image [/INST]
Caption: "Exploring the ancient ruins of a castle, surrounded by nature and history."

Keywords: castle, ruins, history, architecture, nature, stone, brick, people, tourism, outdoor, green, grass, trees, path, walkway, medieval, heritage, visit, sightseeing, travel, landscape, stone wall, old, fortress, tourist attraction, historical site, group, visitation, leisure, outdoor activity, exploration, visitation

(which is pretty good). With the 34B model, I get:

Fetching 15 files: 100%|███████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 110376.42it/s]
==========
Image: /Users/jrp/Pictures/Aiarty Output/20240622-154844_DSC00820_DxO_photo_x1_8640x5760.jpeg 

Prompt: <|im_start|>user
<image>
Provide a caption and keywords for this image<|im_end|>
<|im_start|>assistant

Caption: A group of people standing on a beach.

Keywords: beach, people, group, standing, ocean, sand, shore, water, waves, horizon, sky, clouds, sun, weather, day, outdoor, leisure, vacation, travel, tourism, landscape, scenery, nature, environment, coastal, shoreline, seascape, seaside, coastal, shore, shoreline, seascape, seaside, coastal, shore, shoreline, seascape,

(which is nothing like the image.)

It seems likely to be some sort of memory overflow, but it'd be better to report "out of memory" or similar than to, err, hallucinate. The only other thing I can think of is that there were various diagnostics related to locks when I first downloaded the 34B model, but nothing since.
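
One way to test the memory-pressure hypothesis is to print MLX's peak memory counter after generation. A small sketch, assuming a recent MLX where mlx.core.metal exposes these counters; the image path is a placeholder:

import mlx.core as mx
from mlx_vlm import load, generate

model, processor = load("mlx-community/llava-v1.6-34b-8bit")
prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": "<image>\nProvide a caption and keywords for this image"}],
    tokenize=False,
    add_generation_prompt=True,
)
output = generate(model, processor, "path/to/image.jpeg", prompt, verbose=True)

# peak Metal memory (in bytes) used by this process so far
print(f"Peak memory: {mx.metal.get_peak_memory() / 1024**3:.1f} GiB")

If the peak approaches the machine's 48 GB of unified memory, swapping rather than a reported out-of-memory error could explain the degraded output.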

Blaizzy commented 1 day ago

Can you share the image you used? I will try and debug it.

jrp2014 commented 1 day ago

As described above, I get a similar result with the image from the front-page example (which, from the above, seems to run without problems on your machine).

Blaizzy commented 1 day ago

I'm talking about this image:

Image: /Users/jrp/Pictures/Aiarty Output/20240622-154844_DSC00820_DxO_photo_x1_8640x5760.jpeg