Blaizzy / mlx-vlm

MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.

Can't run deepseek-vl with script on M2 #46

Closed: WayneCui closed this issue 4 days ago

WayneCui commented 5 days ago
import mlx.core as mx
from mlx_vlm import load, generate

model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)

prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": f"<image>\nWhat are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)

output = generate(model, processor, "http://images.cocodataset.org/val2017/000000039769.jpg", prompt, verbose=False)
print(output)

Traceback (most recent call last):
  File "/Users/wayne/2-learning/Projects/gpt/DeepSeek-VL/inference2.py", line 8, in <module>
    prompt = processor.tokenizer.apply_chat_template(
AttributeError: 'LlamaTokenizerFast' object has no attribute 'tokenizer'

>>> processor
LlamaTokenizerFast(name_or_path='/Users/wayne/.cache/huggingface/hub/models--mlx-community--deepseek-vl-7b-chat-4bit/snapshots/79feff56645faf5f145c834118ca3d43c8c55984', vocab_size=100000, model_max_length=16384, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<|begin▁of▁sentence|>', 'eos_token': '<|end▁of▁sentence|>', 'additional_special_tokens': ['<image>']}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
        100000: AddedToken("<|begin▁of▁sentence|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
        100001: AddedToken("<|end▁of▁sentence|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
        100002: AddedToken("ø", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100003: AddedToken("ö", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100004: AddedToken("ú", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100005: AddedToken("ÿ", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100006: AddedToken("õ", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100007: AddedToken("÷", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100008: AddedToken("û", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100009: AddedToken("ý", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100010: AddedToken("À", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100011: AddedToken("ù", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100012: AddedToken("Á", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100013: AddedToken("þ", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100014: AddedToken("ü", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100015: AddedToken("<image_placeholder>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        100016: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}
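
So for this checkpoint, load() hands back the raw tokenizer rather than a processor that wraps one, which a quick check confirms (a minimal sketch using the same model path as above):

from mlx_vlm import load

model, processor = load("mlx-community/deepseek-vl-7b-chat-4bit")
print(type(processor))                   # transformers LlamaTokenizerFast for this model
print(hasattr(processor, "tokenizer"))   # False, hence the AttributeError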
Blaizzy commented 5 days ago

The processor doesn't have a tokenizer attribute, and this model doesn't use a newline in the prompt.

Try this:

import mlx.core as mx
from mlx_vlm import load, generate

model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)

prompt = processor.apply_chat_template(
    [{"role": "user", "content": f"<image>What are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)

output = generate(model, processor, "http://images.cocodataset.org/val2017/000000039769.jpg", prompt, verbose=False)
print(output)
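
If a script should work whether load() returns a wrapper processor or a bare tokenizer (as it does for this model), a defensive lookup is one option (a sketch, not part of the suggested fix above):

from mlx_vlm import load

model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)

# For deepseek-vl, load() returns the tokenizer itself; other models may
# return a processor that wraps one. Fall back gracefully either way.
tokenizer = getattr(processor, "tokenizer", processor)

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "<image>What are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)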
Blaizzy commented 5 days ago

Let me know how it goes

WayneCui commented 4 days ago

The processor doesn't have a tokenizer attribute, and this model doesn't use a newline in the prompt.

Try this:

import mlx.core as mx
from mlx_vlm import load, generate

model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)

prompt = processor.apply_chat_template(
    [{"role": "user", "content": f"<image>What are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)

output = generate(model, processor, "http://images.cocodataset.org/val2017/000000039769.jpg", prompt, verbose=False)
print(output)
(deepseek) ➜  DeepSeek-VL git:(main) ✗ python inference2.py 
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 107088.61it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/Users/wayne/2-learning/Projects/gpt/DeepSeek-VL/inference2.py", line 13, in <module>
    output = generate(model, processor, "http://images.cocodataset.org/val2017/000000039769.jpg", prompt, verbose=False)
  File "/Users/wayne/anaconda3/envs/deepseek/lib/python3.9/site-packages/mlx_vlm/utils.py", line 830, in generate
    prompt_tokens = mx.array(processor.tokenizer.encode(prompt))
AttributeError: 'LlamaTokenizerFast' object has no attribute 'tokenizer'

Thanks for your reply! It seems there is still a processor.tokenizer.encode call in mlx_vlm/utils.py.
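
For reference, the failing line the traceback points at inside generate() can be reproduced outside the library (a sketch of the same call, not the mlx_vlm source):

import mlx.core as mx
from mlx_vlm import load

model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)
prompt = "<image>What are these?"

# generate() assumes the processor wraps a tokenizer; for this model it is
# the tokenizer itself, so the attribute lookup fails.
# prompt_tokens = mx.array(processor.tokenizer.encode(prompt))  # AttributeError

# Encoding on the bare tokenizer works:
prompt_tokens = mx.array(processor.encode(prompt))
print(prompt_tokens.shape)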

Blaizzy commented 4 days ago

Hey @WayneCui

The image_processor object was missing; this should work fine:

import mlx.core as mx
from mlx_vlm.utils import load, generate, load_image_processor

model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)
image_processor = load_image_processor(model_path)

prompt = processor.apply_chat_template(
    [{"role": "user", "content": f"<image>What are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)

output = generate(
    model,
    processor,
    "http://images.cocodataset.org/val2017/000000039769.jpg",
    prompt,
    image_processor,
    verbose=False,
)
print(output)

The docs are coming soon, with examples for all models and how-to guides.

WayneCui commented 4 days ago

It works for me, thanks a lot!

Blaizzy commented 4 days ago

Most welcome;)