-
# URL
- https://arxiv.org/abs/2411.02571
# Authors
- Sheng-Chieh Lin
- Chankyu Lee
- Mohammad Shoeybi
- Jimmy Lin
- Bryan Catanzaro
- Wei Ping
# Abstract
- State-of-the-art retrieval mod…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I want to run inference of [ColPali](https://huggingface.co/vidore/colpali). I …
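Independent of the serving engine, the scoring step ColPali uses can be sketched in a few lines. ColPali follows the ColBERT-style "late interaction" (MaxSim) scheme: each query-token embedding is compared against every image-patch embedding of a page, the best match per token is kept, and those maxima are summed. The tiny 3-d vectors below are placeholders for real model output, not actual ColPali embeddings:

```python
# Minimal sketch of ColBERT/ColPali-style late-interaction (MaxSim) scoring.
# The toy 3-d vectors stand in for real query-token and page-patch embeddings.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_embs, page_embs):
    # query_embs: one vector per query token; page_embs: one per image patch.
    # Assumes all vectors are L2-normalized, so the dot product is the cosine
    # similarity. Each query token keeps only its best-matching patch.
    return sum(max(dot(q, p) for p in page_embs) for q in query_embs)

# Two query tokens vs. two page patches: the first token matches patch 0
# perfectly, the second matches nothing, so the score is 1.0 + 0.0.
query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
page = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(maxsim_score(query, page))  # → 1.0
```

Ranking pages then just means computing this score per candidate page and sorting; the expensive part in practice is producing the patch embeddings, which is what an engine like vLLM would accelerate.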
-
**Is your feature request related to a problem? Please describe.**
- Obsidian in many use cases contains a lot of non-text, unstructured data, such as
1. images
2. PDFs
3. PDFs with c…
-
I am currently planning to prepend an image to the query section, meaning the query will consist of an image along with a question about it. The system will then search the provided documents to find …
-
### Proposal summary
## Feature Request
Enable Opik to display additional media formats, including audio, PDF, and video.
## Background
Opik currently supports only image display, which li…
-
A notebook that demonstrates how to use a multimodal RAG that combines two types of inputs, such as text and images, to retrieve relevant information from a dataset and generate new outputs based on t…
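The retrieval half of such a multimodal RAG flow can be sketched with a toy encoder. The bag-of-words `embed` below is only a stand-in for a real multimodal model (e.g. CLIP), which would map both text and images into one shared vector space; the document schema (`text` plus `image_tags` fields) is likewise an assumption for illustration:

```python
import math
from collections import Counter

# Toy stand-in for a real multimodal encoder such as CLIP: embeds a
# "document" (a dict with optional "text" and "image_tags" fields) into a
# shared bag-of-words vector. In a real pipeline one pretrained model would
# encode both modalities into the same dense vector space.
def embed(doc):
    tokens = doc.get("text", "").lower().split() + doc.get("image_tags", [])
    return Counter(tokens)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, corpus, k=1):
    # Embed the (possibly image+text) query once, then rank documents by
    # similarity in the shared space -- the core of multimodal retrieval.
    q = embed(query)
    return sorted(corpus, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

corpus = [
    {"text": "invoice total and payment terms", "image_tags": ["table", "invoice"]},
    {"text": "vacation photos from the beach", "image_tags": ["beach", "sunset"]},
]
query = {"text": "what is the invoice total", "image_tags": ["invoice"]}
print(retrieve(query, corpus)[0]["text"])  # → invoice total and payment terms
```

The generation half would then pass the retrieved documents, along with the original query, to a multimodal LLM to produce the final answer.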
-
**Is your feature request related to a problem? Please describe.**
I'm frustrated when I can't use multimodal models like "gpt-4-vision-preview" in Cheshire-cat-ai to process and retrieve information…
-
Submitting Author: Tharsis Souza (@souzatharsis)
Package Name: podcastfy
One-Line Description of Package: Transforming Multimodal Content into Captivating Multilingual Audio Conversations with Gen…
-
Are there any versions of the model **Visualized BGE based on BAAI/bge-base-zh-v1.5**? And how does BAAI/bge-visualized-m3 perform compared with ChineseCLIP?
-
See https://www.llamaindex.ai/blog/multimodal-rag-in-llamacloud