dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Apache License 2.0
3.18k stars, 279 forks

Congratulations on the best LLaVA-derived models! #104

Open deepbeepmeep opened 4 months ago

deepbeepmeep commented 4 months ago

I have been playing with most of the LLaVA-based multimodal models, and I can say that Mini-Gemini (the 13B version) is one of the best, if not the best, for its size.

Keep up the good work, and hopefully you can go even further using Llama 3 or Phi-3 as a base model.

yanwei-li commented 4 months ago

Hi, thanks for your feedback and suggestions! We have released the LLaMA3-based models. You are welcome to try MGM-8B and MGM-8B-HD.