mobile-appz closed this issue 1 month ago
Thanks!
The issue you are facing is because PaliGemma doesn't use a chat template or a manual image token. You can pass the text directly:
prompt = "what are these?"
I will add examples for all models soon. Or if you want you can make a PR :)
Thank you very much for your help and quick response. I can confirm that this works now perfectly.
Most welcome! 🤗
mlx-vlm version: 0.0.7
mlx version: 0.14.0
Great work with this. It's working well, except with PaliGemma in the supplied inference Python script: I get an error when running the script from the readme with the paligemma-3b-mix-448-8bit model, as in the code below.
The "CLI" and "Chat UI with Gradio" inference steps in the readme work correctly with the model set to "mlx-community/paligemma-3b-mix-448-8bit". I'm using Conda, and MLX and MLX-VLM were installed with pip.
The error is as follows:

NumPy boolean array indexing assignment cannot assign 2097152 input values to the 2099200 output values where the mask is true
  File "/opt/anaconda3/envs/mlx/lib/python3.11/site-packages/mlx_vlm/models/paligemma/paligemma.py", line 115, in _prepare_inputs_for_multimodal
    final_embedding[image_mask_expanded] = scaled_image_features.flatten()
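For reference, NumPy raises this error whenever the number of True positions in a boolean mask doesn't match the number of values being assigned through it. A minimal standalone reproduction, with shapes chosen purely to match the counts in the message (2050 x 1024 = 2099200 masked slots vs. 2048 x 1024 = 2097152 feature values; these are not the model's actual tensor shapes):

```python
import numpy as np

# Illustrative shapes only: the mask selects 2099200 positions,
# but the flattened features provide just 2097152 values.
final_embedding = np.zeros((2050, 1024))
image_mask = np.ones((2050, 1024), dtype=bool)    # True in 2050 * 1024 = 2099200 positions
scaled_image_features = np.zeros((2048, 1024))    # only 2048 * 1024 = 2097152 values

try:
    final_embedding[image_mask] = scaled_image_features.flatten()
except ValueError as e:
    print(e)  # reports the 2097152 vs. 2099200 mismatch, as in the traceback above
```

In other words, the mask built for image tokens covers more positions than the vision tower produced features for, which is consistent with the fix above: a chat template or manually inserted image token inflates the prompt's image-token count relative to the actual image features.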