fartashf / vsepp

PyTorch Code for the paper "VSE++: Improving Visual-Semantic Embeddings with Hard Negatives"
Apache License 2.0

single caption query #17

Closed wingz1 closed 5 years ago

wingz1 commented 5 years ago

This code works quite well. Thanks for sharing it. I'm wondering, do you have any code snippets showing how one might use a trained VSE++ model to create a caption query from text (i.e., a string), submit it to the VSE++ model to get a single caption embedding, and then search for matching images that have also been mapped to the joint space by the same model? It's easy to do the comparison once numpy arrays for the caption and image embeddings in the joint space are created, but it's not clear how to use your model with a brand-new caption query, or with a set of CNN image features that are not part of a complete COCO/Flickr/etc. train or test set with corresponding caption/image pairs. Thanks for any tips. I'd prefer not to rewrite everything if you already have some additional tools for this.
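
For reference, the comparison step I mention is just a similarity ranking over the joint space; a rough sketch (assuming both embedding arrays are L2-normalized, which is the repo's default, so a dot product gives cosine similarity):

```python
import numpy as np


def rank_images(cap_emb, img_embs, top_k=5):
    """Return indices of the top_k images closest to one caption embedding.

    cap_emb:  (D,) caption embedding in the joint space
    img_embs: (N, D) image embeddings in the same space
    Both are assumed L2-normalized, so a dot product is cosine similarity.
    """
    scores = img_embs.dot(cap_emb)       # (N,) similarity scores
    return np.argsort(-scores)[:top_k]   # most similar images first
```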

fartashf commented 5 years ago

I don't have any particular script for that purpose. But you can look at the function encode_data to get an idea: https://github.com/fartashf/vsepp/blob/226688a0f26aa1c32d34fbe723795dc65702504c/evaluation.py#L73

encode_data gets the input from data_loader and encodes all images and captions given by that loader. It's probably easiest to write a special data loader class that handles your data. For that, take a look at data.py.
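
As a rough sketch (the class and variable names below are hypothetical, not part of the repo), a custom dataset only needs to return the same (image, caption, index, img_id) tuples that data.py's collate_fn expects, so encode_data can consume the resulting loader unchanged:

```python
import nltk
import torch
import torch.utils.data as data

from data import collate_fn  # reuse the padding/sorting logic from data.py


class MyPrecompDataset(data.Dataset):
    """Hypothetical dataset over precomputed image features and raw captions.

    __getitem__ returns the same (image, caption, index, img_id) tuple that
    data.py's collate_fn expects, so encode_data works on the loader unchanged.
    """

    def __init__(self, img_feats, captions, vocab):
        self.img_feats = img_feats  # NumPy array of shape (N, feat_dim)
        self.captions = captions    # list of N raw caption strings
        self.vocab = vocab          # Vocabulary object loaded from vocab.pkl

    def __getitem__(self, index):
        image = torch.Tensor(self.img_feats[index])
        # Tokenize and add the <start>/<end> markers, mirroring data.py.
        tokens = nltk.tokenize.word_tokenize(str(self.captions[index]).lower())
        caption = [self.vocab('<start>')]
        caption.extend(self.vocab(token) for token in tokens)
        caption.append(self.vocab('<end>'))
        return image, torch.Tensor(caption), index, index

    def __len__(self):
        return len(self.captions)


# Example usage with encode_data from evaluation.py:
# loader = data.DataLoader(MyPrecompDataset(feats, caps, vocab), batch_size=32,
#                          shuffle=False, collate_fn=collate_fn)
# img_embs, cap_embs = encode_data(model, loader)
```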

prraoo commented 4 years ago

@wingz1 were you able to do it? Any snippets or tips?

I have a similar task at hand: I want to use COCO captions to retrieve the top-k images.

wingz1 commented 4 years ago

Yes, actually. I added a "caption2emb(model, mycaption, vocab)" function to evaluation.py.
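
Roughly along these lines (a sketch rather than the exact code; it assumes a recent PyTorch, a trained VSE model exposing txt_enc as in model.py, and a Vocabulary object from vocab.pkl that maps tokens to indices via vocab(token)):

```python
import nltk
import torch


def caption2emb(model, mycaption, vocab):
    """Embed a single raw caption string into the joint space.

    Assumes a trained VSE model exposing txt_enc (as in model.py) and a
    Vocabulary object mapping tokens to indices via vocab(token).
    """
    # Tokenize and add the <start>/<end> markers, mirroring data.py.
    tokens = nltk.tokenize.word_tokenize(str(mycaption).lower())
    ids = [vocab('<start>')] + [vocab(t) for t in tokens] + [vocab('<end>')]

    captions = torch.LongTensor(ids).unsqueeze(0)  # batch of one caption
    lengths = [len(ids)]
    if torch.cuda.is_available():
        captions = captions.cuda()  # move the input to GPU if the model is there

    model.val_start()  # put the encoders in eval mode
    with torch.no_grad():
        cap_emb = model.txt_enc(captions, lengths)
    return cap_emb.squeeze(0).cpu().numpy()
```

The returned vector can then be compared against image embeddings produced by encode_data (e.g. with a dot product, since the embeddings are L2-normalized by default).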

prraoo commented 4 years ago

@wingz1 Oh that's great to know. Is it available anywhere to have a look? It would help me a lot.