Blaizzy / mlx-vlm

MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
MIT License

Can't run deepseek-vl with script on M2 #46

Closed WayneCui closed 4 days ago

WayneCui commented 5 days ago
import mlx.core as mx
from mlx_vlm import load, generate

model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)

prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": f"<image>\nWhat are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)

output = generate(model, processor, "", prompt, verbose=False)

Traceback (most recent call last):
  File "/Users/wayne/2-learning/Projects/gpt/DeepSeek-VL/", line 8, in <module>
    prompt = processor.tokenizer.apply_chat_template(
AttributeError: 'LlamaTokenizerFast' object has no attribute 'tokenizer'

>>> processor
LlamaTokenizerFast(name_or_path='/Users/wayne/.cache/huggingface/hub/models--mlx-community--deepseek-vl-7b-chat-4bit/snapshots/79feff56645faf5f145c834118ca3d43c8c55984', vocab_size=100000, model_max_length=16384, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<|begin▁of▁sentence|>', 'eos_token': '<|end▁of▁sentence|>', 'additional_special_tokens': ['<image>']}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
        100000: AddedToken("<|begin▁of▁sentence|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
        100001: AddedToken("<|end▁of▁sentence|>", rstrip=False, lstrip=False, single_word=False, normalized=True, special=True),
        100002: AddedToken("ø", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100003: AddedToken("ö", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100004: AddedToken("ú", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100005: AddedToken("ÿ", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100006: AddedToken("õ", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100007: AddedToken("÷", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100008: AddedToken("û", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100009: AddedToken("ý", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100010: AddedToken("À", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100011: AddedToken("ù", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100012: AddedToken("Á", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100013: AddedToken("þ", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100014: AddedToken("ü", rstrip=False, lstrip=False, single_word=False, normalized=True, special=False),
        100015: AddedToken("<image_placeholder>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
        100016: AddedToken("<image>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
Blaizzy commented 5 days ago

For this model the processor is the tokenizer itself (a LlamaTokenizerFast), so it has no tokenizer attribute. The prompt also shouldn't contain a newline after <image>.

Try this:

import mlx.core as mx
from mlx_vlm import load, generate

model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)

prompt = processor.apply_chat_template(
    [{"role": "user", "content": "<image>What are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)

output = generate(model, processor, "", prompt, verbose=False)
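
For reference, a minimal sketch of the prompt change described above: the <image> placeholder goes directly in front of the question, with no newline in between. build_messages is a hypothetical helper written for this illustration, not part of mlx_vlm, and it only builds the message list that would be passed to apply_chat_template.

```python
from typing import Dict, List

# Placeholder token, as listed in the tokenizer's additional_special_tokens above.
IMAGE_TOKEN = "<image>"


def build_messages(question: str) -> List[Dict[str, str]]:
    """Return a single-turn message list with the image token prepended.

    Hypothetical helper (not an mlx_vlm API): any leading whitespace or
    newline between the image token and the question text is removed.
    """
    text = question.lstrip()
    # If the caller already included the token, drop it and re-add it cleanly.
    if text.startswith(IMAGE_TOKEN):
        text = text[len(IMAGE_TOKEN):].lstrip()
    return [{"role": "user", "content": f"{IMAGE_TOKEN}{text}"}]
```

Calling build_messages("<image>\nWhat are these?") gives the same message list as the snippet above, so the original question string can be reused unchanged.
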
Blaizzy commented 5 days ago

Let me know how it goes

WayneCui commented 4 days ago

(deepseek) ➜  DeepSeek-VL git:(main) ✗ python 
Fetching 6 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 107088.61it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "/Users/wayne/2-learning/Projects/gpt/DeepSeek-VL/", line 13, in <module>
    output = generate(model, processor, "", prompt, verbose=False)
  File "/Users/wayne/anaconda3/envs/deepseek/lib/python3.9/site-packages/mlx_vlm/", line 830, in generate
    prompt_tokens = mx.array(processor.tokenizer.encode(prompt))
AttributeError: 'LlamaTokenizerFast' object has no attribute 'tokenizer'

Thanks for your reply! It seems generate still calls processor.tokenizer.encode in mlx_vlm/
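
To illustrate the mismatch in the traceback: the call expects an object that wraps a tokenizer, but here load returned the tokenizer itself. Below is a minimal sketch of a generic defensive pattern for that situation. resolve_tokenizer and the two stand-in classes are hypothetical names made up for this example; this is not how mlx_vlm handles it.

```python
def resolve_tokenizer(processor):
    """Return processor.tokenizer if it exists, otherwise the processor itself.

    Hypothetical helper: covers both a wrapper that exposes `.tokenizer`
    and a bare tokenizer that was returned directly.
    """
    return getattr(processor, "tokenizer", processor)


class BareTokenizer:
    """Stand-in for a tokenizer returned directly (no `.tokenizer` attribute)."""

    def encode(self, text):
        # Toy encoding for the demo only: one integer per character.
        return [ord(ch) for ch in text]


class WrappedProcessor:
    """Stand-in for a processor object that wraps a tokenizer."""

    def __init__(self):
        self.tokenizer = BareTokenizer()


bare = BareTokenizer()
wrapped = WrappedProcessor()

# Both shapes resolve to an object with an encode() method.
bare_ids = resolve_tokenizer(bare).encode("hi")
wrapped_ids = resolve_tokenizer(wrapped).encode("hi")
```

With a lookup like this, the same encode call works whichever kind of object is passed in.
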

Blaizzy commented 4 days ago

Hey @WayneCui

The image_processor object was missing. This should work:

import mlx.core as mx
from mlx_vlm.utils import load, generate, load_image_processor

model_path = "mlx-community/deepseek-vl-7b-chat-4bit"
model, processor = load(model_path)
image_processor = load_image_processor(model_path)

prompt = processor.apply_chat_template(
    [{"role": "user", "content": "<image>What are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)

output = generate(

Docs are coming soon, with examples for all models and how-to guides.

WayneCui commented 4 days ago

It works for me, thanks a lot!

Blaizzy commented 4 days ago

Most welcome ;)