Blaizzy / mlx-vlm

MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
MIT License

Chat template error with MLX Community LLava models (moved from FastMLX) #51

Closed stewartugelow closed 1 month ago

stewartugelow commented 1 month ago

Continued from: https://github.com/Blaizzy/fastmlx/issues/6


When I run this at the command line: "python -m mlx_vlm.chat_ui --model mlx-community/llava-1.5-7b-4bit", I get the same chat template error with all of the following cached models:

models--mlx-community--llava-1.5-7b-4bit
models--mlx-community--llava-llama-3-8b-v1_1-8bit
models--mlx-community--llava-phi-3-mini-4bit
models--mlx-community--llava-v1.6-mistral-7b-8bit


Logs:

mlx-community/llava-1.5-7b-4bit

(rbuild) (base) Stewarts-MacBook-Pro:vmlx stewart$ python -m mlx_vlm.chat_ui --model mlx-community/llava-1.5-7b-4bit
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Fetching 9 files: 100%|████████████████████████| 9/9 [00:00<00:00, 88820.56it/s]
Fetching 9 files: 100%|████████████████████████| 9/9 [00:00<00:00, 30740.01it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Fetching 9 files: 100%|████████████████████████| 9/9 [00:00<00:00, 68759.08it/s]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/queueing.py", line 541, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1526, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 783, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/chat_interface.py", line 592, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 650, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 633, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/mlx_vlm/chat_ui.py", line 103, in chat
    messages = processor.apply_chat_template(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/transformers/processing_utils.py", line 926, in apply_chat_template
    raise ValueError(
ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.

mlx-community/llava-v1.6-mistral-7b-8bit

(rbuild) (base) Stewarts-MacBook-Pro:vmlx stewart$ python -m mlx_vlm.chat_ui --model mlx-community/llava-v1.6-mistral-7b-8bit
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Fetching 10 files: 100%|████████████████████| 10/10 [00:00<00:00, 110960.42it/s]
Fetching 10 files: 100%|█████████████████████| 10/10 [00:00<00:00, 34865.37it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Fetching 10 files: 100%|████████████████████| 10/10 [00:00<00:00, 108942.96it/s]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/queueing.py", line 541, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1526, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 783, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/chat_interface.py", line 592, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 650, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 633, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/mlx_vlm/chat_ui.py", line 103, in chat
    messages = processor.apply_chat_template(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/transformers/processing_utils.py", line 926, in apply_chat_template
    raise ValueError(
ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.
^CKeyboard interruption in main thread... closing server.

mlx-community/llava-llama-3-8b-v1_1-8bit

(rbuild) (base) Stewarts-MacBook-Pro:vmlx stewart$ python -m mlx_vlm.chat_ui --model mlx-community/llava-llama-3-8b-v1_1-8bit
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
Fetching 8 files: 100%|████████████████████████| 8/8 [00:00<00:00, 74731.47it/s]
Fetching 8 files: 100%|█████████████████████████| 8/8 [00:00<00:00, 9742.87it/s]
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Fetching 8 files: 100%|████████████████████████| 8/8 [00:00<00:00, 34344.35it/s]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/queueing.py", line 541, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1526, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 783, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/chat_interface.py", line 592, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 650, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 633, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/mlx_vlm/chat_ui.py", line 103, in chat
    messages = processor.apply_chat_template(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/transformers/processing_utils.py", line 926, in apply_chat_template
    raise ValueError(
ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.
^CKeyboard interruption in main thread... closing server.

mlx-community/llava-phi-3-mini-4bit

(rbuild) (base) Stewarts-MacBook-Pro:vmlx stewart$ python -m mlx_vlm.chat_ui --model mlx-community/llava-phi-3-mini-4bit
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
preprocessor_config.json: 100%|████████████████| 819/819 [00:00<00:00, 8.36MB/s]
added_tokens.json: 100%|███████████████████████| 978/978 [00:00<00:00, 19.6MB/s]
config.json: 100%|█████████████████████████| 1.33k/1.33k [00:00<00:00, 18.7MB/s]
special_tokens_map.json: 100%|█████████████████| 615/615 [00:00<00:00, 3.51MB/s]
model.safetensors.index.json: 100%|██████████| 129k/129k [00:00<00:00, 9.68MB/s]
tokenizer_config.json: 100%|███████████████| 8.45k/8.45k [00:00<00:00, 46.1MB/s]
tokenizer.model: 100%|███████████████████████| 500k/500k [00:00<00:00, 12.5MB/s]
tokenizer.json: 100%|██████████████████████| 1.85M/1.85M [00:00<00:00, 8.59MB/s]
model.safetensors: 100%|███████████████████| 2.47G/2.47G [00:57<00:00, 43.2MB/s]
Fetching 9 files: 100%|███████████████████████████| 9/9 [00:57<00:00,  6.41s/it]
Fetching 9 files: 100%|███████████████████████| 9/9 [00:00<00:00, 110054.62it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Fetching 9 files: 100%|████████████████████████| 9/9 [00:00<00:00, 27453.63it/s]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/queueing.py", line 541, in process_events
    response = await route_utils.call_process_api(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/route_utils.py", line 276, in call_process_api
    output = await app.get_blocks().process_api(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1928, in process_api
    result = await self.call_function(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/blocks.py", line 1526, in call_function
    prediction = await utils.async_iteration(iterator)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 783, in asyncgen_wrapper
    response = await iterator.__anext__()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/chat_interface.py", line 592, in _stream_fn
    first_response = await async_iteration(generator)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 657, in async_iteration
    return await iterator.__anext__()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 650, in __anext__
    return await anyio.to_thread.run_sync(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/gradio/utils.py", line 633, in run_sync_iterator_async
    return next(iterator)
           ^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/mlx_vlm/chat_ui.py", line 103, in chat
    messages = processor.apply_chat_template(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/stewart/Dropbox/dev/vmlx/rbuild/lib/python3.11/site-packages/transformers/processing_utils.py", line 926, in apply_chat_template
    raise ValueError(
ValueError: No chat template is set for this processor. Please either set the `chat_template` attribute, or provide a chat template as an argument. See https://huggingface.co/docs/transformers/main/en/chat_templating for more information.
^CKeyboard interruption in main thread... closing server.
Blaizzy commented 1 month ago

Hey @stewartugelow

I still haven't managed to replicate your issue.

Here is what I did:

  1. Reinstalled mlx-vlm v0.0.11
  2. Removed all cached models
  3. Downloaded the model weights and ran them

It still works as expected on two different machines.

mlx-community/llava-phi-3-mini-4bit

Screenshot 2024-07-15 at 2 10 01 PM

mlx-community/llava-llama-3-8b-v1_1-8bit

Screenshot 2024-07-15 at 1 35 12 PM

mlx-community/llava-1.5-7b-4bit

Screenshot 2024-07-15 at 2 10 01 PM

Blaizzy commented 1 month ago

pip list | grep mlx

fastmlx                                   0.1.0
mlx                                       0.15.2
mlx-lm                                    0.16.0            /Users/prince_canuma/Documents/Projects/LLMs/mlx-lm/llms
mlx-vlm                                   0.0.11
Blaizzy commented 1 month ago

@stewartugelow could you share the output of:

from mlx_vlm.utils import load

model_path = "mlx-community/llava-phi-3-mini-4bit"
model, processor = load(model_path)
print(processor.__dict__)

and

prompt = processor.tokenizer.apply_chat_template(
    [{"role": "user", "content": f"<image>What are these?"}],
    tokenize=False,
    add_generation_prompt=True,
)
print(prompt)
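In the meantime, a possible stop-gap (just a sketch, not part of mlx-vlm) is to fall back to the tokenizer's template whenever the processor has no chat_template of its own:

# Sketch only: use the processor's template if it is set, otherwise fall back to
# the tokenizer's template (which the snippet above already relies on).
messages = [{"role": "user", "content": "<image>What are these?"}]
if getattr(processor, "chat_template", None):
    prompt = processor.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
else:
    prompt = processor.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
print(prompt)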
BoltzmannEntropy commented 1 month ago

I have a similar issue:

python -m mlx_vlm.chat_ui --model mlx-community/Bunny-Llama-3-8B-V-8bit
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 89030.04it/s]
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 23801.22it/s]
Traceback (most recent call last):
  File "/path/to/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/path/to/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/path/to/python3.10/site-packages/mlx_vlm/chat_ui.py", line 35, in <module>
    model, processor = load(args.model, {"trust_remote_code": True})
  File "/path/to/python3.10/site-packages/mlx_vlm/utils.py", line 244, in load
    model = load_model(model_path, lazy)
  File "/path/to/python3.10/site-packages/mlx_vlm/utils.py", line 153, in load_model
    text_config = AutoConfig.from_pretrained(config["language_model"])
KeyError: 'language_model'
Blaizzy commented 1 month ago

@BoltzmannEntropy could you share the version of mlx-vlm you are running?

BoltzmannEntropy commented 1 month ago

Sure:

sol@mprox dev % pip freeze | grep mlx                                            
mlx==0.16.1
mlx-lm==0.16.1
mlx-vlm==0.0.11
Blaizzy commented 1 month ago

@BoltzmannEntropy the problem is fixed. It was a missing key in the config :)
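
For context, the KeyError came from text_config = AutoConfig.from_pretrained(config["language_model"]) assuming the key is always present. An illustrative guard (a sketch, not the actual patch) would look like:

from transformers import AutoConfig

# Sketch only: fail with a clearer message instead of a bare KeyError when a
# converted config has no "language_model" entry; "config" and "model_path" are
# the variables already in scope in mlx_vlm.utils.load_model.
if "language_model" in config:
    text_config = AutoConfig.from_pretrained(config["language_model"])
else:
    raise ValueError(
        f"Config for {model_path} has no 'language_model' key; "
        "update mlx-vlm or re-download the converted weights."
    )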

BoltzmannEntropy commented 1 month ago

While this works:

huggingface-cli download --local-dir Bunny-Llama-3-8B-V-8bit mlx-community/Bunny-Llama-3-8B-V-8bit

the command:

python -m mlx_vlm.chat_ui --model mlx-community/Bunny-Llama-3-8B-V-8bit

produces:

huggingface_hub.utils._errors.GatedRepoError: 401 Client Error. (Request ID: Root=1-66abb222-0a5ce8e526d64b151b4fab07;dbf7c596-05f5-4c69-8460-c66147d36261)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/config.json.
Access to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it.

I never had to authenticate before.

Blaizzy commented 1 month ago

It's not a bug.

The loader has to access the config of a gated model :) Just go to the meta-llama/Meta-Llama-3-8B repo on HF and request access.
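
Once access is granted, you also need to be authenticated locally so the gated config can be downloaded, for example with huggingface-cli login or from Python:

# Authenticate with a Hugging Face token
# (create one at https://huggingface.co/settings/tokens).
from huggingface_hub import login

login()  # prompts for the token; alternatively login(token="hf_...")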

BoltzmannEntropy commented 1 month ago
sol@mprox dev % python -m mlx_vlm.chat_ui --model mlx-community/llava-phi-3-mini-4bit  
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 53242.22it/s]
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 73298.52it/s]
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 190650.18it/s]
Traceback (most recent call last):
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/mlx_vlm/chat_ui.py", line 133, in <module>
    demo = gr.ChatInterface(
TypeError: ChatInterface.__init__() got an unexpected keyword argument 'additional_inputs_accordion'
Blaizzy commented 1 month ago

This is a gradio problem. They had many breaking changes recently.

I will fix it on the next release tomorrow and pin the version to avoid such cases.
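
In the meantime, you can check which gradio you are on (and pin or roll back to whatever version worked for you) with something like:

# Print the installed gradio version so it can be pinned in requirements.txt.
import importlib.metadata

print(importlib.metadata.version("gradio"))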

Blaizzy commented 1 month ago

@stewartugelow I managed to replicate your issue as well, and I will address it.

Blaizzy commented 1 month ago

@stewartugelow @BoltzmannEntropy Could you guys update to the latest gradio, install this PR #54 from source and give it a try to see if it fixes your issues?

Blaizzy commented 1 month ago

Update gradio:

pip install -U gradio

BoltzmannEntropy commented 1 month ago
dev % python -m mlx_vlm.chat_ui --model mlx-community/llava-phi-3-mini-4bit
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 136770.78it/s]
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 85598.04it/s]
Fetching 9 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 15051.33it/s]
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/queueing.py", line 536, in process_events
    response = await route_utils.call_process_api(
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/route_utils.py", line 285, in call_process_api
    output = await app.get_blocks().process_api(
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/blocks.py", line 1923, in process_api
    result = await self.call_function(
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/blocks.py", line 1520, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/utils.py", line 768, in asyncgen_wrapper
    response = await iterator.__anext__()
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/chat_interface.py", line 652, in _stream_fn
    first_response = await async_iteration(generator)
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/utils.py", line 663, in async_iteration
    return await iterator.__anext__()
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/utils.py", line 656, in __anext__
    return await anyio.to_thread.run_sync(
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 807, in run
    result = context.run(func, *args)
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/gradio/utils.py", line 639, in run_sync_iterator_async
    return next(iterator)
  File "/Users/sol/.pyenv/versions/3.10.8/lib/python3.10/site-packages/mlx_vlm/chat_ui.py", line 96, in chat
    if len(message["files"]) >= 1:
TypeError: 'MultimodalData' object is not subscriptable
Blaizzy commented 1 month ago

You didn't install from source.

To install from source, first clone the branch and then run:

pip install -e .
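
For reference, installing that PR from source typically looks something like this (the local branch name here is just an example):

git clone https://github.com/Blaizzy/mlx-vlm.git
cd mlx-vlm
git fetch origin pull/54/head:pr-54   # PR #54 from this thread
git checkout pr-54
pip install -e .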