Open phymbert opened 4 months ago
@ggerganov please tell me how I can help on this
ping @damian0815 as you originally started llava.h
Hello,
Is there any progress here? I wonder if I could be of any help.
I think it would be nice to make multimodality much more of a first-class citizen in llama.cpp. I would be interested in supporting the jina-clip-v1 model after the refactoring.
I've recently been playing around with the current llava implementation. At the moment, a CLIP model has its own loader, clip_model_load, which does not use mmap. And while clip_image_batch_encode exists and could be used to serve parallel slots, it is not used by llava.cpp. One idea I have in mind is to somehow reuse llama_load_model_from_file to load the model and llama_decode to decode a batch of patches/images.
But that's only a very rough draft of an idea, probably too complicated to implement atm. @ggerganov what do you think about this?