ggerganov / llama.cpp

LLM inference in C/C++
MIT License

llava-cli: improve llava-cli and the API for using LLaVA #6027

Open phymbert opened 4 months ago

phymbert commented 4 months ago

From:

  1. cleaning up the clip/llava libs and improving the API
  2. in the old implementation, there were many internal objects exposed to the server, and the memory management was dubious
  3. there was no obvious path for supporting parallel multimodal slots (see the API sketch after this list)
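
For illustration only, here is a minimal sketch of the direction these points suggest: an opaque context plus explicitly owned embedding handles, so no internal objects leak to the server and each slot can encode independently. Every name below is hypothetical; nothing like this exists in clip.h/llava.h today.

```c
// Hypothetical API sketch -- none of these symbols exist in clip.h/llava.h.
// The point is the shape: opaque handles, explicit ownership, per-slot encode.

#include <stddef.h>
#include <stdint.h>

struct mm_context;                        // opaque: no internal objects leak out

struct mm_context * mm_init(const char * mmproj_path, int n_threads);
void                mm_free(struct mm_context * ctx);

// Each slot encodes independently, which gives a path to the parallel
// multimodal slots that the old implementation lacked.
struct mm_embed;                          // library-owned image embedding
struct mm_embed * mm_encode(struct mm_context * ctx,
                            int32_t         slot_id,
                            const uint8_t * image_bytes,
                            size_t          n_bytes);

const float * mm_embed_data (const struct mm_embed * e);
int32_t       mm_embed_n_pos(const struct mm_embed * e);
void          mm_embed_free (struct mm_embed * e);  // unambiguous ownership
```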
phymbert commented 4 months ago

@ggerganov please tell me how I can help on this

phymbert commented 4 months ago

ping @damian0815 as you originally started llava.h

JoanFM commented 1 month ago

Hello,

Is there any progress here? I wonder if I could be of any help.

I think it would be nice to make multimodality much more of a first-class citizen in llama.cpp. I would be interested in supporting the jina-clip-v1 model after the refactoring.

ngxson commented 4 weeks ago

I've recently been playing around with the current llava implementation.

Currently, a clip model has its own clip_model_load, which does not use mmap. And while clip_image_batch_encode exists and could be used to process parallel slots, it is not used by llava.cpp. One idea I have in mind is to somehow reuse llama_load_model_from_file to load the model and llama_decode to decode a batch of patches/images (rough sketch below).
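
For reference, llava.cpp already does something close to the second half of this: llava_eval_image_embed builds a llama_batch whose embd pointer carries the image embeddings and passes it to llama_decode. Below is a hedged sketch of that path using the real llama_batch_init / llama_decode / llama_batch_free API; the helper name decode_image_embeddings is made up for illustration.

```c
// Rough sketch (not existing llama.cpp code) of feeding pre-computed image
// patch embeddings through the regular llama_decode path, so images and text
// share one batching mechanism.

#include "llama.h"
#include <string.h>

static bool decode_image_embeddings(
        struct llama_context * lctx,
        const float * embd,    // n_pos * n_embd floats from the image encoder
        int32_t       n_pos,   // number of image patch positions
        int32_t       n_embd,  // llama_n_embd(model); must match encoder output
        llama_seq_id  seq_id,  // the slot/sequence this image belongs to
        llama_pos   * n_past)  // running position counter for this sequence
{
    // With embd != 0, llama_batch_init allocates batch.embd instead of batch.token
    struct llama_batch batch = llama_batch_init(n_pos, n_embd, 1);
    batch.n_tokens = n_pos;
    memcpy(batch.embd, embd, (size_t) n_pos * n_embd * sizeof(float));
    for (int32_t i = 0; i < n_pos; i++) {
        batch.pos[i]       = *n_past + i;
        batch.n_seq_id[i]  = 1;
        batch.seq_id[i][0] = seq_id;
        batch.logits[i]    = 0; // no logits needed for image positions
    }
    const bool ok = llama_decode(lctx, batch) == 0;
    if (ok) {
        *n_past += n_pos;
    }
    llama_batch_free(batch);
    return ok;
}
```

Once image embeddings go through the same llama_batch path as tokens, parallel multimodal slots would fall out of the existing seq_id mechanism more or less for free.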

But that's only a very rough idea, probably too complicated to implement atm. @ggerganov what do you think about this?