Open nuschandra opened 2 months ago
@bclavie If you think that this would be a useful feature, I'd be happy to contribute and raise a PR for the same.
Hey! It'd actually be a completely experimental feature since it's not even done in the paper, but I'd be happy to include it under a beta flag if you would like to contribute it!
@bclavie Thanks for your response! Sure, yes, I understand. The logic stays the same: process_images would return pixel_values and input_ids for the prompt (just as it does during indexing). When searching by image, we just make the forward call with both pixel_values and input_ids to get the query embeddings, which can then be used for the maxsim calculations. I will make the code changes later this week and raise a PR.
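For anyone following along, here is a minimal sketch of the maxsim scoring step mentioned above, in plain Python. It assumes the query image has already been turned into a list of per-token embedding vectors (for example by the forward call on pixel_values and input_ids), and that each indexed page is stored the same way. The names `maxsim_score` and `search_by_embeddings` and the dict-based index are hypothetical, for illustration only; they are not part of the library's API.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
MultiVector = Sequence[Vector]  # one embedding vector per token/patch


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_emb: MultiVector, doc_emb: MultiVector) -> float:
    """Late-interaction score: for every query vector, take its best
    match among the document vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_emb) for q in query_emb)


def search_by_embeddings(
    query_emb: MultiVector,
    index: Dict[str, MultiVector],  # hypothetical in-memory index: doc id -> embeddings
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every indexed document against the query and return the top k."""
    scored = [(doc_id, maxsim_score(query_emb, doc_emb))
              for doc_id, doc_emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy example: 2-dimensional embeddings standing in for real model outputs.
query_image_emb = [[1.0, 0.0], [0.0, 1.0]]
toy_index = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],
    "page_b": [[0.5, 0.5]],
}
results = search_by_embeddings(query_image_emb, toy_index, k=2)
print(results)
```

Since the scoring only sees embeddings, the same function would work whether the query embeddings come from text or from an image; only the step that produces the query embeddings differs.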
Hi, I need the same feature. Have you completed the code?
Hi @bclavie & team,
I currently don't see support for searching an index with a query image instead of a text query. I understand that there is an encode_image option, but it only returns the embeddings of the query image; it does not run a full search over the indexed documents with maxsim scoring. It would be really nice to have support for querying with an image too.