-
### Motivation
Thanks for your excellent work.
I found that internvl-g includes the retrieval code, which is also available in clip benchmark.
I wonder how we can utilize …
-
# Bug Report
## Installation Method
Helm Chart
## Environment
- **Open WebUI Version:** v0.4.1
- **Operating System:** Ubuntu 22.04
**Confirmation:**
- [x] I have read and followed …
-
A notebook demonstrating multimodal RAG, which combines two types of inputs, such as text and images, to retrieve relevant information from a dataset and generate new outputs based on t…
-
Great job. Can you provide an example of how to implement text-image retrieval?
-
I hope to access the 2D VQA and image-text retrieval tasks.
-
Hi,
I find it convenient to have a few benchmark datasets integrated into libraries for easier research. My feature request boils down to the implementation of a few image retrieval datasets, name…
-
### What problem does the new feature solve?
jina-clip-v1 is currently one of the strongest multi-modal embedding models.
### What does the feature do?
It can be used to build better image retrieval applications.
###…
-
I would like to cite your qualitative analysis method for image and text retrieval; your work is very meaningful. However, I could not find the open-source code for this part. Thank you very…
-
Does VLDet support image and text retrieval? For example, my goal is to retrieve the most closely matching image given a text query. If the model supports this, should I use the image embedding, or each instanc…
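The usual recipe for the text-to-image direction described above is to embed all candidate images once, embed the text query, and rank images by cosine similarity. A minimal sketch, assuming some vision-language encoder (a generic CLIP-style model, not necessarily VLDet) has already produced the embeddings; the vectors below are toy stand-ins:

```python
import numpy as np

def retrieve(text_emb: np.ndarray, image_embs: np.ndarray, top_k: int = 1) -> np.ndarray:
    """Return indices of the top_k images most similar to the text query."""
    # L2-normalize both sides so the dot product equals cosine similarity.
    t = text_emb / np.linalg.norm(text_emb)
    im = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = im @ t                     # one similarity score per image
    return np.argsort(-sims)[:top_k]  # highest score first

# Toy example: 3 precomputed image embeddings; the query is closest to image 1.
images = np.array([[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]])
query = np.array([0.55, 0.83])
print(retrieve(query, images, top_k=2))  # → [1 2]
```

For a real model you would replace the toy arrays with the image-level (not per-instance) embeddings and the encoded text query; the ranking step itself is unchanged.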
-
I've been working with your model for image-text retrieval, and I'm encountering some challenges in replicating the results in Table 7 of your paper.
I've tried using image embeddings (using RAM++…