ggerganov / llama.cpp

LLM inference in C/C++
MIT License

llama : add multimodal support (LLaVA) #3332

Closed: aiaicode closed this 1 year ago

aiaicode commented 1 year ago

Now that OpenAI is adding voice and image input to ChatGPT, and this will probably become the new norm, would it be possible to add multimodal support to the llama.cpp roadmap as well?

jagtesh commented 1 year ago

It would depend on having access to high-quality multimodal models. I don't know whether one exists yet that is in the same league as LLaMA.

aiaicode commented 1 year ago

Hopefully Llama 3 will be that.

monatis commented 1 year ago

Yesterday LLaVA-RLHF was announced. It's the first open-source RLHF-trained multimodal model, and before that we already had IDEFICS from Hugging Face. Now that clip.cpp supports GGUF, it's possible to implement multimodal inference by combining it with llama.cpp. Architecturally, LLaVA is much simpler than IDEFICS, but if IDEFICS performs considerably better than LLaVA-RLHF, I could start with it instead. WDYT?

ggerganov commented 1 year ago

We should make a proof of concept (PoC) that implements LLaVA, either as a separate repo or as an example in this repo.

monatis commented 1 year ago

I started working on LLaVA in another repo, but it's extremely difficult to manage llama.cpp and clip.cpp together when they depend on two different versions of ggml. It would be much easier for me to implement it in this repo, if that's OK.

Green-Sky commented 1 year ago

PR: https://github.com/ggerganov/llama.cpp/pull/3436

aiaicode commented 1 year ago

Thank you, @monatis! You legend.

ChrisW-priv commented 9 months ago

Hi, do I understand correctly that multimodal support has now been added? How do I run such a model from the CLI? Say I have a photo to analyze and have downloaded the zhiqings/LLaVA-RLHF-7b-v1.5-224 model from Hugging Face.

I am really new to the field. I recently compiled llama.cpp locally and played around with it. Can you point me to some materials or tutorials?

PS: When I first saw the project, I was quickly overwhelmed. I could work on documentation on how to use it, but I am so new. Do contributors meet to discuss development, or something like that?

svenstaro commented 9 months ago

@ChrisW-priv Not sure whether this is still relevant to you, but it is actually documented in the original PR:

./bin/llava -m ggml-model-q5_k.gguf --mmproj mmproj-model-f16.gguf --image path/to/an/image.jpg