Blaizzy opened 5 months ago
Next release of Llava-Next
TODO: update the `TextConfig` defaults to avoid errors with Llava-v1.6-vicuna:

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    intermediate_size: int = 11008
    num_attention_heads: int = 32
    rms_norm_eps: float = 1e-05
    vocab_size: int = 32064
    num_key_value_heads: int = 32
    rope_theta: float = 1000000
    rope_traditional: bool = False
    rope_scaling: Optional[Dict[str, Union[float, str]]] = None
```
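Defaults like these matter because the `text_config` section of a Hugging Face `config.json` can omit keys for some checkpoints. A common pattern for that (a minimal sketch, not necessarily mlx-vlm's exact helper; the trimmed field list is illustrative) is a `from_dict` constructor that drops unknown keys and lets the dataclass defaults fill the gaps:

```python
import inspect
from dataclasses import dataclass
from typing import Dict, Optional, Union

@dataclass
class TextConfig:
    model_type: str
    hidden_size: int = 4096
    num_hidden_layers: int = 32
    vocab_size: int = 32064
    rope_theta: float = 1000000
    rope_scaling: Optional[Dict[str, Union[float, str]]] = None

    @classmethod
    def from_dict(cls, params: dict) -> "TextConfig":
        # Keep only keys that match the dataclass fields; config.json
        # often carries extra keys the config class does not know about.
        known = inspect.signature(cls).parameters
        return cls(**{k: v for k, v in params.items() if k in known})

# A config that omits hidden_size etc. picks up the defaults instead
# of raising a TypeError.
cfg = TextConfig.from_dict(
    {"model_type": "llama", "vocab_size": 32064, "some_extra_key": 1}
)
```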
Thanks for the great repo. This should also be on the list: https://github.com/THUDM/CogVLM2 I am now just reading the code, and trying to free some time for the conversion routine.
Hey @BoltzmannEntropy and @jrp2014,
Thanks for the suggestions!
I have added them to the backlog
MiniCPM-V v2.6
Do you have a link to Florence-2?
Is the above list the ultimate and up-to-date list of supported models @Blaizzy? Thanks for your hard work!
Hey @ChristianWeyer, it's mostly up-to-date, just missing Qwen2-VL.
@s-smits here you go:
https://huggingface.co/microsoft/Florence-2-large/blob/main/modeling_florence2.py
[x] Phi-3-vision
Thanks! I guess Phi-3-vision includes 3.5?
Yes, they have the same arch so there are no changes needed :)
Hey @Blaizzy, thanks for this great framework. Is there any priority for InternVL? I can see it is present in your list; I just wanted to know if it is planned in the near term. I want to run the model on my MacBook, and mlx-vlm looks to be the best way to do that.
Qwen2-VL-72B would be amazing!
This recipe seems to work for Qwen2-VL-2B-Instruct:
```shell
python -m mlx_vlm.generate \
  --model Qwen/Qwen2-VL-2B-Instruct \
  --max-tokens 100 \
  --temp 0.0 \
  --image django-roadmap.png \
  --prompt "Describe image in detail, include all text"
```
My results here: https://gist.github.com/simonw/9e02d425cacb902260ec1307e0671e17
Yep, they just merged Qwen2-VL support this weekend.
Molmo please
Nvidia just dropped multimodal NVLM-D-72B. The benchmark looks pretty good.
Yep, that's a pretty awesome model! It's on my radar because we can run it in 4-bit quant.
Pixtral-12B now has Base model. https://huggingface.co/mistralai/Pixtral-12B-Base-2409
Hey @Blaizzy, could you add ColQwen support? Since Qwen2-VL is already supported and ColQwen is just an additional linear layer on top, this seems like low-hanging fruit, especially as Col* is a really hot topic right now.
I could really use this for my projects (e.g. local private document search + qa) 😊
Working on Idefics 3 here: https://github.com/Blaizzy/mlx-vlm/pull/124
@Benjoyo, ColQwen and ColPali are awesome models.
At the moment, I'm working on refactoring and some optimisations, so new model ports by me are on hold.
However, I appreciate any PRs. I'm here to review and help when needed.
Thank you very much, @pcuenca!
It means a lot 🚀
I left a few comments.
Instructions:
If the model you want is not listed, please suggest it and I will add it.