I'm interested in this project, but my system has little VRAM, so I'd prefer to use the llama.cpp-based toolchain (Ollama etc.) with GGUF quantization. However, the model's dual-encoder architecture would likely require changes to the existing LLaVA workflow in llama.cpp, and support for multimodal Gemma isn't implemented there yet. Do you have any ideas on how to approach this? Support here would let devices with small (V)RAM run this model.
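For reference, the single-encoder LLaVA flow I have in mind looks roughly like this (file and model names are placeholders, and exact binary names vary by llama.cpp version; the conversion and projector steps are where a second encoder would presumably need new handling):

```bash
# Convert the HF checkpoint's language model to GGUF.
python convert_hf_to_gguf.py ./model-dir --outfile model-f16.gguf

# Quantize so it fits in limited (V)RAM.
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M

# Run inference; the LLaVA path assumes a single --mmproj projector file,
# which is exactly where a dual-encoder model wouldn't fit as-is.
./llama-llava-cli -m model-Q4_K_M.gguf --mmproj mmproj-f16.gguf \
    --image input.png -p "Describe this image."
```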