-
### Motivation
Thanks for your excellent work.
I found that InternVL-G includes retrieval code, which also appears in the CLIP benchmark.
I wonder how we can utilize …
-
A notebook that demonstrates how to use a multimodal RAG that combines two types of inputs, such as text and images, to retrieve relevant information from a dataset and generate new outputs based on t…
-
I hope to get access to the 2D VQA and image-text retrieval tasks.
-
Great job. Can you provide an example of how to implement text-image retrieval?
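
For reference, text-to-image retrieval is commonly done by ranking image embeddings by cosine similarity to the text embedding. The snippet below is only a minimal sketch of that ranking step, not this repository's implementation: `cosine_similarity`, `retrieve_images`, and the toy vectors are hypothetical names and data made up for illustration, and it assumes the text and image embeddings already come from a model that maps both into the same space and that no embedding is all zeros.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_images(text_embedding, image_embeddings, top_k=1):
    """Return indices of the top_k images most similar to the text query."""
    scored = [
        (cosine_similarity(text_embedding, emb), idx)
        for idx, emb in enumerate(image_embeddings)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


# Toy, hand-written vectors standing in for real model outputs (assumption).
image_embeddings = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
]
text_embedding = [0.1, 0.9, 0.0]

ranking = retrieve_images(text_embedding, image_embeddings, top_k=2)
print(ranking)
```

In practice the embeddings would come from the model's image and text encoders, and swapping the roles of text and image gives image-to-text retrieval.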
-
Image-to-text: an image captioning model generates conditional and unconditional captions based on an image uploaded by the user.
Image captioning is the task of describing the content of an image using textual…
-
Does VLDet support image-text retrieval? For example, I want to provide a text query and retrieve the best-matching image. If the model supports it, should I use the image embedding? Or each instanc…
-
I would like to cite your qualitative analysis method for image and text retrieval. Your work is very meaningful; however, I could not find this piece of code. Would it be convenient to open-source it? Thank you very…
-
I hit the same error as https://github.com/rom1504/clip-retrieval/issues/345 when I used the `clip-retrieval inference` command to extract features for images and their corresponding texts. My command is like foll…
-
I've been working with your model for image-text retrieval, and I'm having trouble replicating the results in Table 7 of your paper.
I've tried using image embeddings (using RAM++…
-
Thank you for your elegant work! I am wondering whether InternV2 has the same functionality as InternVL-C in previous versions, which supports cross-modal feature retrieval, or how I can get aligned embeddin…