-
Multimodal support is not active. The logs show that when a file or an image is added to the prompt, the prompt (incl. the file) is not even passed through the APIs.
-
Hello, I am very interested in your great work. I see in the code that the input sequence for image generation is basically text tokens followed by image tokens. What about reversing the order when gene…
-
I am a student from China, and I really appreciate your project. I am now trying to do some interesting work, but I have encountered some problems. My idea is to perform topic modeling using product i…
-
As the title says: is this supported? Is it multimodal-in, multimodal-out (with multiple images)?
-
### 🚀 The feature, motivation and pitch
In the FLAVA multimodal encoder, why don't we pass an attention mask to mask out the '[PAD]' embeddings coming from the text encoder? Is this a bug or intentional?
https…
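
To make the question concrete, here is a minimal sketch of the kind of mask I have in mind. It is not taken from the FLAVA code: the function name `build_multimodal_mask` is hypothetical, and it assumes the multimodal encoder sees image embeddings first, followed by text embeddings, with no extra tokens in between.

```python
from typing import List

PAD = "[PAD]"  # padding token string, as quoted in the question above


def build_multimodal_mask(num_image_tokens: int, text_tokens: List[str]) -> List[int]:
    """Return a 1/0 mask over an assumed [image; text] concatenated sequence.

    1 = position may be attended to, 0 = position is masked out.
    The image-first ordering is an assumption made for illustration only.
    """
    # Image patches are never padded, so every image position stays visible.
    image_mask = [1] * num_image_tokens
    # Text positions holding the padding token are masked out.
    text_mask = [0 if tok == PAD else 1 for tok in text_tokens]
    return image_mask + text_mask


if __name__ == "__main__":
    mask = build_multimodal_mask(3, ["[CLS]", "a", "cat", "[PAD]", "[PAD]"])
    print(mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
```

A mask like this would be passed alongside the concatenated embeddings so that attention weights on the padded text positions are suppressed.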
-
Hi, it looks like you updated the API a little bit in this commit:
https://github.com/guinmoon/llmfarm_core.swift/commit/e4e8aa7617e2e86af434677cc4196462a0005ea9
Would you mind giving an updated w…
-
See https://www.llamaindex.ai/blog/multimodal-rag-in-llamacloud
-
For instance, it would be nice if you could `chap ask --attach moon.jpg "What is in this photo"`.
-
This issue is an overview of tasks to add for a massive multimodal extension of MTEB. The modalities are:
- T=Text
- I=Image
- A=Audio
- V=Video without audio, i.e. just multiple images
Below is…
-
**What problem or use case are you trying to solve?**
https://www.swebench.com/multimodal.html
**Describe the UX of the solution you'd like**
**Do you have thoughts on the technical implement…