SBERT for images ? - Githubissues

UKPLab / sentence-transformers

Multilingual Sentence & Image Embeddings with BERT

https://www.SBERT.net

Apache License 2.0

14.35k stars 2.39k forks source link

SBERT for images ? #154

Open ghost opened 4 years ago

ghost commented 4 years ago

Hi,

Thanks for the great paper and library. Just curious to know if it can also be used to check similarity between images if image input features are given to the model. It may be a naive question, but want to know if such NLP algorithm can be applied to Vision tasks as well. It will be great to know if it can work, then how; and if not then why.

Thanks!

nreimers commented 4 years ago

Hi @sidSingla The issue will be to get the image into the network. BERT and the other available options are designed to work with text, i.e., they take text / words as input and compute the vectors.

Don't see how you would be able to input an image into that. To be able to do that, you would need to create a new model that takes an image as input and outputs a fixed sized vector.

The general idea with siamese networks also works for finding similar images. It is quite extensively used in computer vision and the idea originates from there. You find quite a lot of papers on that.

But sadly I don't know if there are any available models / frameworks to compute image similarities.

Best Nils

ghost commented 4 years ago

Hi @nreimers

Thanks for your reply. Yes, given the popularity of siamese networks and triplet loss in vision, I wondered if the sentence bert model can be used for images. For input in the network, the penultimate layer of a CNN architecture can be used and then fine-tuned for Sentence Bert. I think I will try exploring it in future time. :)

Thanks Sid

nreimers commented 4 years ago

HuggingFace transformers repository also have the MMBT model implemented, that maps text and images: https://github.com/facebookresearch/mmbt/

Maybe that could be a starting point.

paulomann commented 4 years ago

I have used this project https://github.com/almazan/deep-image-retrieval and it works quite well even with completely different images. The code is well written and you can easily set up a working example.

zhanxlin commented 1 month ago

@tomaarsen hi, does Sbert support training image embedding?