Finetuning Image Text Vectorizer with CLIP

backprop-ai / backprop

Backprop makes it simple to use, finetune, and deploy state-of-the-art ML models.

Hello, I tried finetuning the Image-Text Vectorizer CLIP model using the above approach, but I get stuck with the error -

Link to full code - Colab

What I need is something that gives the cosine similarity between an image and a text. Should I finetune with the triplet variant or with the cosine similarity variant? If it's cosine similarity, how do I obtain the cosine similarity values?

The triplet variant takes a text and an image and gives one normalised vector. I am a bit confused, because I thought it would give a cosine similarity.
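
To make clear what I mean by cosine similarity, here is a minimal sketch in plain Python. It assumes the image and the text have each already been turned into a vector of the same length. The vectors below are made-up placeholders, not model output, and `cosine_similarity` is my own helper, not a function from backprop.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Placeholder embeddings standing in for an image vector and a text vector.
image_vec = [0.6, 0.8, 0.0]
text_vec = [0.8, 0.6, 0.0]

sim = cosine_similarity(image_vec, text_vec)
print(f"cosine similarity: {sim:.4f}")
```

If both vectors are already normalised to unit length, the two norms are 1 and the cosine similarity reduces to the plain dot product.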

Finetuning Image Text Vectorizer with CLIP #21