-
Hi, I am using llm as part of a multimodal model, so the model needs to pass an `input embedding tensor` directly to generate, and also needs to access the language model's `embed_tokens` member to first c…
-
## Executive Summary
The **Agentic Web Platform** is an advanced AI-driven dashboard that empowers users to customize, manage, and optimize their AI agents with unparalleled precision and control.…
-
Here is the development roadmap for 2024 Q4. Contributions and feedback are welcome ([**Join Bi-weekly Development Meeting**](https://t.co/4BFjCLnVHq)). The previous 2024 Q3 roadmap can be found in #634.
…
-
There is a significant performance issue when loading documents: the process takes an unusually long time, and afterward, no documents are found. To test the system, I tagged several documents with…
-
- [ ] [Vespa 🤝 ColPali: Efficient Document Retrieval with Vision Language Models — pyvespa documentation](https://pyvespa.readthedocs.io/en/latest/examples/colpali-document-retrieval-vision-language-m…
-
Hi,
Thanks for your efforts on such a valuable collection!
Could you please add the paper "Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate"?
M…
-
## Value Statement
As someone who wants a boring way to use AI
I would like to expose an image/PDF/document to the LLM
So that I can make requests and extract information, all within Ramalama
…
-
Hi inikisheve,
I use SPSA random noise (RDSA) to add a small amount of noise to the loss function, perturbing an image into an adversarial image for vision-language models. Here I use the InstructBLIP model. However, it does …
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
### Describe the bug
Hi folks, thanks for t…
-
Hello! 💗 When trying to run benchmarks on vision language models (image-text-to-text), I realized this library doesn't support this task. It would be nice to have support for it since these models ar…