-
I've read the docs, and rigging doesn't seem to support multimodal LLMs at the moment. Do you have any plans for that? It'd be interesting to test llava-phi3 or llava-llama3 with text + image input (I'm thi…
-
Thank you for the great model.
I wonder how I can get the multimodal embedding of different inputs, like an image and its caption, using ImageBind?
If I can get that, then how can it be compared to CL…
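For the embedding part of the question, a minimal sketch using the reference code from the facebookresearch/ImageBind repo (file paths and caption text are placeholders; assumes the `imagebind` package and its pretrained weights are available):

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the pretrained ImageBind model (downloads weights on first use).
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

# Preprocess an image and its caption into the model's input format.
inputs = {
    ModalityType.VISION: data.load_and_transform_vision_data(["dog.jpg"], device),
    ModalityType.TEXT: data.load_and_transform_text(["A dog playing in the park."], device),
}

with torch.no_grad():
    embeddings = model(inputs)

# Both modalities land in the same joint embedding space, so a simple
# cosine similarity compares the image against its caption.
similarity = torch.nn.functional.cosine_similarity(
    embeddings[ModalityType.VISION], embeddings[ModalityType.TEXT]
)
print(similarity)
```

Since ImageBind aligns all of its modalities into a single joint space, the same cosine or dot-product comparison used for CLIP-style retrieval applies to any pair of modalities.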
-
Hi, I was trying this model here:
https://huggingface.co/MoMonir/llava-llama-3-8b-v1_1-GGUF
It also comes with some instructions on how to use it for images. Is this also possible somehow with Jla…
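As a point of reference, one way to run a LLaVA-style GGUF with image input from Python is llama-cpp-python's LLaVA chat handler. A rough sketch (the model and mmproj file names are placeholders, and the `Llava15ChatHandler` chat template may need adjusting for the Llama-3 variant):

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# The mmproj file holds the vision projector that ships alongside
# the language-model GGUF in LLaVA-style repos.
chat_handler = Llava15ChatHandler(clip_model_path="llava-llama-3-8b-v1_1-mmproj-f16.gguf")

llm = Llama(
    model_path="llava-llama-3-8b-v1_1-int4.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,  # larger context leaves room for the image embeddings
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///path/to/photo.jpg"}},
                {"type": "text", "text": "What's in this image?"},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```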
-
I tried to run multimodal inference following this demo code, but the model keeps responding with refusals such as:
> I'm unable to meet that request.
> I must politely decline that, sorry.
> I'm sorry,…
-
**Is your feature request related to a problem? Please describe.**
A real assistant would not only converse by text but could also speak and use video/images.
**Describe the solution you'd like**
…
-
Nice work!
Have you already tried some tests with multimodal data, such as depth and thermal images?
-
Hey guys, love what you're doing here. Just wondering if LLaVA or multimodal support is in the works? Thanks
-
I came across a Llama3-based multimodal model on Hugging Face, [Bunny-Llama-3-8B-V: bunny-llama](https://huggingface.co/BAAI/Bunny-Llama-3-8B-V), and I'd like to be able to deploy it using lla…
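For local testing in the meantime, the model card shows a plain transformers route with `trust_remote_code`. A trimmed sketch of that quickstart follows; the custom `process_images` helper and the `-200` image-token placeholder come from the repo's remote code (LLaVA-style conventions), so treat the details as illustrative:

```python
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

# trust_remote_code pulls in Bunny's custom multimodal model class.
model = AutoModelForCausalLM.from_pretrained(
    "BAAI/Bunny-Llama-3-8B-V",
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("BAAI/Bunny-Llama-3-8B-V", trust_remote_code=True)

# Split the prompt around the <image> tag and splice in the image-token id
# (-200 is the placeholder the remote code replaces with vision features).
prompt = "Why is the image funny?"
text = f"USER: <image>\n{prompt} ASSISTANT:"
chunks = [tokenizer(chunk).input_ids for chunk in text.split("<image>")]
input_ids = torch.tensor(
    chunks[0] + [-200] + chunks[1][1:], dtype=torch.long
).unsqueeze(0).to(device)

# Preprocess the image with the model's own vision pipeline.
image = Image.open("example.png")
image_tensor = model.process_images([image], model.config).to(dtype=model.dtype, device=device)

output_ids = model.generate(input_ids, images=image_tensor, max_new_tokens=100, use_cache=True)[0]
print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
```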
-
> [!TIP]
> ## Want to get involved?
> We'd love it if you did! Please get in contact with the people assigned to this issue, or leave a comment. See general contributing advice [here](https://micros…