Closed JosefAlbers closed 4 months ago

Hi, I've been exploring this repo for the past couple of days and I find your work here really amazing. I'm curious if there are any plans to add support for the Phi-3-vision-128k-instruct model to this library? I'd be happy to contribute in any way I can to help make this happen.
Hey @JosefAlbers
Thank you!
Awesome, that model is on the roadmap after Paligemma #24.
Please feel free to submit a PR to support it :)
@JosefAlbers
Paligemma is done, thanks!
Do you want to take on Phi-3-vision?
Yes, I'd love to! Just a heads-up, I'm new to mlx, so I might need a little guidance along the way.
No problem, I'm here to help :)
Is there a list of officially supported models?
@ChristianWeyer not yet.
But at the moment we support the following architectures (see the sketch below for the common pattern):

- Llava (Clip + Llama)
- Paligemma (Siglip + Gemma)
- Idefics2 (Siglip + Mistral)
- NanoLlava (Siglip + Qwen2)

There are still many more to add.
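They all share the same basic composition: a vision encoder, a projector that maps image features into the LLM's embedding space, and a language model. Here is a toy sketch of that pattern; the names and shapes are illustrative only, not mlx-vlm's actual API:

```python
import mlx.core as mx
import mlx.nn as nn

# Toy sketch of the "vision encoder + projector + language model" pattern
# shared by the architectures above. Everything here is illustrative:
# real vision towers are Clip/Siglip transformers, not a single Linear.
class ToyVLM(nn.Module):
    def __init__(self, vision_dim=32, text_dim=64, vocab_size=100):
        super().__init__()
        self.vision_tower = nn.Linear(3 * 16 * 16, vision_dim)  # stand-in encoder
        self.projector = nn.Linear(vision_dim, text_dim)  # aligns the two modalities
        self.embed = nn.Embedding(vocab_size, text_dim)
        self.lm_head = nn.Linear(text_dim, vocab_size)  # stand-in language model

    def __call__(self, input_ids, pixel_values):
        # Encode the image and project it into the LLM's embedding space.
        image_embeds = self.projector(self.vision_tower(pixel_values))
        # Simplification: prepend image embeddings to the text embeddings.
        # Real models splice them in at dedicated image-token positions.
        text_embeds = self.embed(input_ids)
        h = mx.concatenate([image_embeds, text_embeds], axis=1)
        return self.lm_head(h)

model = ToyVLM()
logits = model(mx.array([[1, 2, 3]]), mx.random.normal((1, 1, 3 * 16 * 16)))
print(logits.shape)  # (1, 4, 100): one image "token" plus three text tokens
```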
Which high-quality Llava model can we use? Any recommendations (from HF)?
Thx. These are not good enough for our use cases ;-).
Could you please open a new issue and explain your use case?
@Blaizzy, I have a working demo of Phi-3-vision support for MLX: https://github.com/JosefAlbers/Phi-3-Vision-MLX
It handles text and image inputs and generates the expected outputs. With the new Su-scaled RoPE, it seems to work reasonably well even with extremely long contexts.
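For context, su-scaled RoPE divides the rotary inverse frequencies by per-dimension factors from the model config (one set for short contexts, another past the original 4k window) and rescales cos/sin by a fixed factor. A rough sketch of the idea, with hypothetical names rather than the exact code in my repo:

```python
import math
import mlx.core as mx

# Rough sketch of su-scaled RoPE. Parameter names (short_factor,
# long_factor, etc.) mirror the model config, but this code is
# illustrative, not the exact implementation in my repo.
class SuScaledRoPE:
    def __init__(self, dims, base=10000.0,
                 max_position_embeddings=131072,
                 original_max_position_embeddings=4096,
                 short_factor=None, long_factor=None):
        half = dims // 2
        self.dims = dims
        self.original_max = original_max_position_embeddings
        # Per-dimension frequency scale factors from the config.
        self.short_factor = mx.array(short_factor or [1.0] * half)
        self.long_factor = mx.array(long_factor or [1.0] * half)
        self.inv_freq = 1.0 / mx.power(base, mx.arange(0, dims, 2) / dims)
        # Fixed scale on cos/sin that compensates for the longer context.
        scale = max_position_embeddings / original_max_position_embeddings
        self.attn_scale = math.sqrt(
            1 + math.log(scale) / math.log(original_max_position_embeddings))

    def __call__(self, x, offset=0):
        # x: (batch, n_heads, seq_len, dims)
        end = offset + x.shape[2]
        # Pick short or long factors depending on the context length.
        factor = self.long_factor if end > self.original_max else self.short_factor
        t = mx.arange(offset, end)
        freqs = t[:, None] * (self.inv_freq / factor)[None, :]
        cos = mx.cos(freqs) * self.attn_scale
        sin = mx.sin(freqs) * self.attn_scale
        x1, x2 = x[..., :self.dims // 2], x[..., self.dims // 2:]
        return mx.concatenate([x1 * cos - x2 * sin, x2 * cos + x1 * sin], axis=-1)

rope = SuScaledRoPE(dims=96)  # e.g. the head_dim of Phi-3
out = rope(mx.random.normal((1, 32, 8, 96)))
```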
Just a heads-up for now. I'll circle back when it's more polished and ready for feedback.
I love the speed!
Awesome, looking forward to the polished version :)
@Blaizzy Thanks so much, I've learned a ton about MLX and VLMs by studying the well-written and well-documented code in your repo. I'll keep you posted on my progress and will definitely reach out when I have a more polished version ready for your feedback!
Most welcome!
I'm happy I could be of help.
Let me know when you're ready.
You guys are heroes!
@Blaizzy, I'd really appreciate it! I'm just about to start working on a PR for adding su-RoPE support to `mlx-lm`. Once that is merged, I think I can craft a version of phi-3-vision that fits seamlessly into the `mlx-vlm` framework.

In the meantime, I've been experimenting with the model on various inputs and LLM/VLM techniques in my own repo, and I'm really amazed by how well it handles both text and image prompts. I'm excited to get your feedback!
@lin72h, thanks a lot!
Most welcome, it's my pleasure!
> I'm just about to start working on a PR for adding su-RoPE support to `mlx-lm`. Once that is merged, […]
@JosefAlbers Why do the round trip, when we can have it here?

Note: `mlx-lm` is only for language models, thus the `lm`. Unless there are other language models that use su-RoPE, it's not going to be merged.
@Blaizzy Right, I'll see if I can port `phi3_v` into `mlx_vlm` today.