dvlab-research / MGM

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Apache License 2.0
3.22k stars 280 forks

Feature Request: llama.cpp support #52

Closed deutschthomas closed 7 months ago

deutschthomas commented 7 months ago

I'm interested in this project, but my system has little VRAM, so I prefer the llama.cpp-based toolchain (Ollama, etc.) and GGUF quantization. However, the model's dual-encoder architecture would likely require changes to the existing LLaVA workflow, and multimodal Gemma support isn't implemented yet. Do you have any ideas? Support here would let devices with small (V)RAM run this model.
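For context, a rough sketch of the single-encoder LLaVA-to-GGUF workflow that llama.cpp currently ships, which MGM support would need to extend. Script names, flags, and paths below are assumptions that vary across llama.cpp versions, and none of these tools handle MGM's dual vision encoder today:

```shell
# Hedged sketch only — run from a llama.cpp checkout; the MGM-7B path and
# all flags are illustrative, not verified against any specific release.

# 1. Split the multimodal checkpoint into LLM weights and the vision
#    projector (llava_surgery.py assumes a single CLIP encoder, so it
#    would not cover MGM's second, high-resolution encoder as-is).
python examples/llava/llava_surgery.py -m /path/to/MGM-7B

# 2. Convert the vision encoder + projector into a GGUF "mmproj" file.
#    MGM support would need this step extended (or duplicated) to emit
#    weights for both encoders.
python examples/llava/convert_image_encoder_to_gguf.py \
    -m /path/to/vision_tower \
    --llava-projector /path/to/llava.projector \
    --output-dir /path/to/out

# 3. Convert and quantize the language model itself.
python convert.py /path/to/MGM-7B --outtype f16 --outfile mgm-7b-f16.gguf
./quantize mgm-7b-f16.gguf mgm-7b-q4_k_m.gguf q4_k_m
```

The main open question is step 2: llama.cpp's mmproj format encodes a single image-feature stream, so the patch-info-mining interaction between MGM's two encoders would need a new conversion path and runtime support.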

yanwei-li commented 7 months ago

Thanks for your suggestion! We do not have enough bandwidth right now, but we will try to implement it. Thanks.