AnswerDotAI / byaldi

Use late-interaction multi-modal models such as ColPali in just a few lines of code.
Apache License 2.0

Support for searching with query image #27

Open nuschandra opened 2 months ago

nuschandra commented 2 months ago

Hi @bclavie & team,

I currently don't see support for searching an index with a query image instead of a text query. I understand that there is an encode_image option, but it only returns the embeddings of the query image; it does not run a full search over the indexed documents with the MaxSim calculations. It would be really nice to have support for querying with an image too.

nuschandra commented 1 month ago

@bclavie If you think that this would be a useful feature, I'd be happy to contribute and raise a PR for the same.

bclavie commented 1 month ago

Hey! It'd actually be a completely experimental feature since it's not even done in the paper, but I'd be happy to include it under a beta flag if you would like to contribute it!

nuschandra commented 1 month ago

@bclavie Thanks for your response! Sure, yes, I understand. The logic stays the same as for indexing: process_images would return pixel_values and input_ids for the prompt. When searching by image, we just make the forward call with both pixel_values and input_ids to get the embeddings, which can then be used for the MaxSim calculations. I will make the code changes later this week and raise a PR.
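To make the scoring step concrete, here is a rough, dependency-free sketch of what MaxSim over image-query embeddings could look like. It is only an illustration, not byaldi's code: `maxsim`, `search_by_image` and the `index` dict are hypothetical names, and `query_embeddings` just stands in for whatever per-patch embeddings the forward call returns for the query image.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
MultiVector = Sequence[Vector]  # one embedding per token or image patch


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product between two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query: MultiVector, doc: MultiVector) -> float:
    """Late-interaction score: for each query vector, take its best
    match among the document vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc) for q in query)


def search_by_image(
    query_embeddings: MultiVector,
    index: Dict[str, MultiVector],  # hypothetical layout: doc id -> page embeddings
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every indexed document against the query image and
    return the top-k (doc_id, score) pairs, best first."""
    scored = [(doc_id, maxsim(query_embeddings, doc)) for doc_id, doc in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: 2-dimensional embeddings, purely for illustration.
query_embeddings = [[1.0, 0.0], [0.0, 1.0]]
index = {
    "page_a": [[1.0, 0.0], [0.0, 1.0]],
    "page_b": [[0.5, 0.5]],
}
results = search_by_image(query_embeddings, index, k=2)
```

In a real implementation the embeddings would be tensors and the scoring would be batched, but the ranking logic should be the same as for text queries: only the way the query embeddings are produced changes.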

sergenerbay commented 4 weeks ago

Hi, I need the same feature. Have you completed the code?