Can this model achieve retrieval from text to (image + text)

LinWeizheDragon / FLMR

The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever.

Can this model achieve retrieval from text to (image + text) #24

Closed. Annie1900 closed this issue 4 months ago

Annie1900 commented 4 months ago

Can this model achieve retrieval from text to (image + text)? For example, I have a query (text) and a database that contains images and their corresponding descriptions. I want to retrieve the fused features of visual embeddings and text embeddings for each image in the database. If possible, how should I implement this? Thank you very much!

Annie1900 commented 4 months ago

I noticed that the appendix of the paper mentioned 'Retrieving Multi-modal Documents with FLMR,' but I'm not sure how to use the related code. Could you please provide some guidance? Thank you very much!

LinWeizheDragon commented 4 months ago

Hi, please see the README file. We have already implemented this:

# Option 3. multi-modal documents with images
# (num_items and passage_contents are assumed to be defined earlier, as in the README example)
import os
import torch
from torchvision.transforms import ToPILImage

# Generate random placeholder images and save them to disk
random_images = torch.randn(num_items, 3, 224, 224)
to_img = ToPILImage()
os.makedirs("./test_images", exist_ok=True)
for i, image in enumerate(random_images):
    to_img(image).save(os.path.join("./test_images", "{}.jpg".format(i)))

image_paths = [os.path.join("./test_images", "{}.jpg".format(i)) for i in range(num_items)]

# Each document is a (text, None, image path) tuple
custom_collection = [
    (passage_content, None, image_path)
    for passage_content, image_path in zip(passage_contents, image_paths)
]
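
For a database of images with descriptions, the collection in the snippet above could be assembled along these lines. This is only a minimal sketch, not code from the FLMR repository: `Record` and `build_collection` are hypothetical names, the example paths are made up, and the only thing taken from the README snippet is the `(passage_content, None, image_path)` tuple layout.

```python
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    """Hypothetical database row: one image and its text description."""
    description: str
    image_path: str


def build_collection(records: List[Record]) -> List[Tuple[str, Optional[object], str]]:
    """Turn database rows into (text, None, image_path) tuples (Option 3 layout)."""
    collection = []
    for record in records:
        text = record.description.strip()
        if not text:
            raise ValueError(f"empty description for {record.image_path}")
        # The middle slot stays None, exactly as in the README snippet.
        collection.append((text, None, record.image_path))
    return collection


# Made-up example rows; replace with your own descriptions and image paths.
records = [
    Record("A red bicycle leaning against a wall.", "./images/0.jpg"),
    Record("  Two cats asleep on a sofa. ", "./images/1.jpg"),
]
custom_collection = build_collection(records)
```

The resulting list would then take the place of `custom_collection` in the README example.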

LinWeizheDragon commented 4 months ago

But note that image+text -> image+text training data is quite sparse, so we did not pre-train the PreFLMR models on image+text -> image+text retrieval. Performance may therefore be suboptimal until you fine-tune the model on your own text -> image+text task.

Annie1900 commented 4 months ago

This is great, thank you very much. I'll go try it right away.