Blaizzy / mlx-vlm

MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.

AttributeError: 'Qwen2TokenizerFast' object has no attribute 'tokenizer'. Did you mean: '_tokenizer'? #135

Open jrp2014 opened 1 day ago

jrp2014 commented 1 day ago

Using the latest versions of mlx and mlx_vlm, with the following script:

import mlx.core as mx
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

from PIL import Image

import os
from pathlib import Path

# model_path = "mlx-community/llava-1.5-7b-4bit"
# model_path = "mlx-community/llava-v1.6-mistral-7b-8bit"
# model_path = "mlx-community/pixtral-12b-8bit" # To the point
# model_path = "Qwen/Qwen2-VL-7B-Instruct"  # libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 269535412224 bytes which is greater than the maximum allowed buffer size of 28991029248 bytes.###
# model_path = "mlx-community/llava-v1.6-34b-8bit" # Slower but more precise
# model_path = "mlx-community/Phi-3.5-vision-instruct-bf16" # OK, but doesn't provide keywords
# model_path = "mistral-community/pixtral-12b"
# model_path = "meta-llama/Llama-3.2-11B-Vision-Instruct"  # needs about 95Gb, but is slow
# model_path ="mlx-community/Qwen2-VL-72B-Instruct-8bit" # libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 135383101952 bytes which is greater than the maximum allowed buffer size of 77309411328 bytes.
model_path ="mlx-community/dolphin-vision-72b-4bit"

print("Model: ", model_path)

# Load the model
model, processor = load(model_path)
config = load_config(model_path)

prompt = "Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily"

picpath = "/Users/xxx/Pictures/Processed"
pics = sorted(Path(picpath).iterdir(), key=os.path.getmtime, reverse=True)
pic = str(pics[0])
print("Image: ", pic)

# Apply chat template
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=1)

# Generate output
output = generate(model, processor, pic, formatted_prompt, max_tokens=500, verbose=True)
print(output)

I get:

(mlx) ~/Documents/AI/mlx/scripts/vlm % python mytest.py
Model:  mlx-community/dolphin-vision-72b-4bit
Fetching 19 files: 100%|█████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 23570.48it/s]
The repository for /Users/xxx/.cache/huggingface/hub/models--mlx-community--dolphin-vision-72b-4bit/snapshots/82156979ae25603e5d1bbec346559fe27d279f22 contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co//Users/xxx/.cache/huggingface/hub/models--mlx-community--dolphin-vision-72b-4bit/snapshots/82156979ae25603e5d1bbec346559fe27d279f22.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y
Fetching 19 files: 100%|█████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 13077.09it/s]
Image:  /Users/xxx/Pictures/Processed/20241123-231118_DSC02850_DxO.jpg
==========
Image: /Users/xxx/Pictures/Processed/20241123-231118_DSC02850_DxO.jpg 

Prompt: <|im_start|>system
Answer the questions.<|im_end|><|im_start|>user
<image>
Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily<|im_end|><|im_start|>assistant

Traceback (most recent call last):
  File "/Users/xxx/Documents/AI/mlx/scripts/vlm/mytest.py", line 39, in <module>
    output = generate(model, processor, pic, formatted_prompt, max_tokens=500, verbose=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/mlx_vlm/utils.py", line 1181, in generate
    prompt_tokens = mx.array(processor.tokenizer.encode(prompt))
                             ^^^^^^^^^^^^^^^^^^^
AttributeError: 'Qwen2TokenizerFast' object has no attribute 'tokenizer'. Did you mean: '_tokenizer'?
Blaizzy commented 23 hours ago

Try updating your transformers to the latest version as well.
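
For example:

pip install -U transformers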

jrp2014 commented 21 hours ago

Yes, I think I have all the latest versions via `pip install -U -r requirements.txt` (transformers 4.46.3).

The full list is:

Package            Version
------------------ ----------------------------
accelerate         1.1.1
aiofiles           23.2.1
aiohappyeyeballs   2.4.3
aiohttp            3.11.2
aiosignal          1.3.1
annotated-types    0.7.0
anyio              4.6.2.post1
attrs              24.2.0
certifi            2024.8.30
charset-normalizer 3.4.0
click              8.1.7
cmake              3.31.1
datasets           3.1.0
dill               0.3.8
fastapi            0.115.5
ffmpy              0.4.0
filelock           3.16.1
frozenlist         1.5.0
fsspec             2024.9.0
gradio             5.7.1
gradio_client      1.5.0
h11                0.14.0
hf_transfer        0.1.8
httpcore           1.0.7
httpx              0.27.2
huggingface-hub    0.26.2
idna               3.10
inquirerpy         0.3.4
Jinja2             3.1.4
llvmlite           0.43.0
markdown-it-py     3.0.0
MarkupSafe         2.1.5
mdurl              0.1.2
mlx                0.21.0.dev20241128+974bb54ab
mlx-lm             0.20.1
mlx-vlm            0.1.3
mlx-whisper        0.4.1
more-itertools     10.5.0
mpmath             1.3.0
multidict          6.1.0
multiprocess       0.70.16
nanobind           2.2.0
networkx           3.4.2
numba              0.60.0
numpy              1.26.4
orjson             3.10.11
packaging          24.2
pandas             2.2.3
pfzy               0.3.4
pillow             11.0.0
pip                24.3.1
prompt_toolkit     3.0.48
propcache          0.2.0
protobuf           5.28.3
psutil             6.1.0
pyarrow            18.0.0
pydantic           2.9.2
pydantic_core      2.23.4
pydub              0.25.1
Pygments           2.18.0
python-dateutil    2.9.0.post0
python-multipart   0.0.12
pytz               2024.2
PyYAML             6.0.2
regex              2024.11.6
requests           2.32.3
rich               13.9.4
ruff               0.7.4
safehttpx          0.1.1
safetensors        0.4.5
scipy              1.13.1
semantic-version   2.10.0
sentencepiece      0.2.0
setuptools         75.6.0
shellingham        1.5.4
six                1.16.0
sniffio            1.3.1
starlette          0.41.2
sympy              1.13.1
tiktoken           0.8.0
tokenizers         0.20.3
tomlkit            0.12.0
torch              2.5.1
torchaudio         2.5.1
torchvision        0.20.1
tqdm               4.67.1
tqdn               0.2.1
transformers       4.46.3
typer              0.13.0
typing_extensions  4.12.2
tzdata             2024.2
urllib3            2.2.3
uvicorn            0.32.0
wcwidth            0.2.13
websockets         12.0
wheel              0.44.0
xxhash             3.5.0
yarl               1.17.1
Blaizzy commented 18 hours ago

I see,

Dolphin, like NanoLLaVA, uses an image_processor :)

Here is an example: https://github.com/Blaizzy/mlx-vlm/blob/595c1f0676066fe348b14b2fc8edfdb7607f812d/mlx_vlm/generate.py#L71
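
If I read that example right, the workaround on your side is something like this (a sketch, assuming your mlx_vlm version exports load_image_processor from mlx_vlm.utils and that generate accepts an image_processor argument, as in the linked generate.py):

from mlx_vlm.utils import load_image_processor

# Dolphin/NanoLLaVA-style models ship a plain tokenizer plus a separate
# image processor, so load the latter explicitly and pass it to generate.
image_processor = load_image_processor(model_path)
output = generate(model, processor, pic, formatted_prompt,
                  image_processor=image_processor, max_tokens=500, verbose=True)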

Blaizzy commented 18 hours ago

I will probably simplify this in a future release by adding it as an attribute on the processor at load time 👌🏽

That way we can avoid this type of issue.
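
Roughly this idea (a sketch of the intent, not the actual implementation):

# Hypothetical load-time patch: give tokenizer-only "processors" a
# .tokenizer attribute pointing back at themselves, so that
# processor.tokenizer.encode(...) works for both processor layouts.
model, processor = load(model_path)
if not hasattr(processor, "tokenizer"):
    processor.tokenizer = processor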

jrp2014 commented 17 hours ago

Thanks. I'm not sure how I was supposed to know that, and I'm still not sure what I was supposed to know. I don't use NanoLLaVA when I can use the full-fat version.

I was kind of hoping that whatever needed to be done was done under the hood... particularly in the absence of documentation to the contrary.

Blaizzy commented 17 hours ago

My bad!

I promise, I'm working on it :)