Downstream inference using Faiss

Hello Team,

Thank you for your amazing work on this model. I was able to reproduce your remarkable results. I would like to contribute by developing downstream inference using Faiss, but I am running into a lot of issues: the cosine-similarity search returns results that are unrelated to the query.

08/03/2024 22:03:10 - INFO - main - Setup model...
08/03/2024 22:03:11 - INFO - main - Using CLIP pretrained weights...
08/03/2024 22:03:17 - INFO - main - Setup model done!
Loaded existing embeddings.
08/03/2024 22:03:17 - INFO - main - Loading metadata...
08/03/2024 22:03:17 - INFO - main - Metadata loaded
Top 5 results for 'a woman eating':

Distance: 3.3629, Index: 126 Caption: 3d animation music video song Path: video7136.mp4
Distance: 3.0853, Index: 759 Caption: there are some people flying in a helicopter Path: video7769.mp4
Distance: 3.0298, Index: 769 Caption: two men examine a red lamborghini with no tires Path: video7779.mp4
Distance: 3.0025, Index: 176 Caption: a man hugs another man in outer space Path: video7186.mp4
Distance: 2.9686, Index: 55 Caption: a band performs Path: video7065.mp4

I am using faiss.IndexFlatIP, which ranks results by inner product. How can I get better retrieval results on the MSRVTT dataset?
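IndexFlatIP returns raw inner products, and an inner product equals cosine similarity only when both vectors have unit L2 norm. A cosine similarity can never exceed 1, so scores such as the 3.3629 in the log suggest the embeddings were not normalized before indexing. The sketch below reproduces in NumPy the exact top-k inner-product ranking that IndexFlatIP performs, plus a cosine variant that normalizes first. It is a minimal illustration under that assumption, not AdaCLIP's own retrieval code. The helper names (`l2_normalize`, `inner_product_search`, `cosine_search`) and the random stand-in embeddings are hypothetical.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Return a float32 copy of x with every row scaled to unit L2 norm."""
    x = np.asarray(x, dtype=np.float32)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return x / np.maximum(norms, 1e-12)


def inner_product_search(db: np.ndarray, queries: np.ndarray, k: int):
    """Exact top-k search by raw inner product (the ranking IndexFlatIP uses)."""
    scores = queries @ db.T  # shape: (n_queries, n_db)
    # Sort each row in descending order of score and keep the k best indices.
    idx = np.argsort(-scores, axis=1)[:, :k]
    top = np.take_along_axis(scores, idx, axis=1)
    return top, idx


def cosine_search(db: np.ndarray, queries: np.ndarray, k: int):
    """Cosine similarity = inner product of L2-normalized vectors."""
    return inner_product_search(l2_normalize(db), l2_normalize(queries), k)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for video and text embeddings.
    videos = rng.normal(size=(1000, 512)).astype(np.float32)
    query = rng.normal(size=(1, 512)).astype(np.float32)
    raw, _ = inner_product_search(videos, query, k=5)
    cos, idx = cosine_search(videos, query, k=5)
    print("raw inner products:", raw[0])  # not bounded by 1
    print("cosine similarities:", cos[0], "indices:", idx[0])  # within [-1, 1]
```

In Faiss itself, the equivalent is to call `faiss.normalize_L2` (which normalizes a contiguous float32 array in place) on the video embeddings before `index.add` and on the query embeddings before `index.search`. IndexFlatIP then returns cosine similarities, and an index built from unnormalized vectors has to be rebuilt.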

SamsungLabs / AdaCLIP