I don't have any particular script for that purpose, but you can look at the function `encode_data` to get an idea:
https://github.com/fartashf/vsepp/blob/226688a0f26aa1c32d34fbe723795dc65702504c/evaluation.py#L73

`encode_data` takes a `data_loader` as input and encodes all images and captions provided by that loader. It's probably easiest to write a special data loader class that handles your data. For that, take a look at `data.py`.
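
For context, a rough sketch of such a loader might look like the following. This is only an illustration, assuming the conventions of `data.py` in this repo (a `Vocabulary` object from `vocab.py`, precomputed image features, and a `collate_fn` that expects `(image, target, index, img_id)` tuples); the class name `MyPrecompDataset` is made up:

```python
# Hypothetical dataset that mirrors data.py's PrecompDataset, so its output
# can be fed to encode_data() via a standard DataLoader.
import nltk
import torch
import torch.utils.data as data


class MyPrecompDataset(data.Dataset):
    """Wraps precomputed image features and raw caption strings."""

    def __init__(self, img_feats, captions, vocab):
        self.images = img_feats      # numpy array, shape (n_items, feat_dim)
        self.captions = captions     # list of raw caption strings, same length
        self.vocab = vocab           # Vocabulary object built by vocab.py

    def __getitem__(self, index):
        image = torch.Tensor(self.images[index])
        # Tokenize and map words to ids the same way data.py does.
        tokens = nltk.tokenize.word_tokenize(str(self.captions[index]).lower())
        caption = [self.vocab('<start>')]
        caption += [self.vocab(token) for token in tokens]
        caption.append(self.vocab('<end>'))
        target = torch.Tensor(caption)
        # collate_fn in data.py expects (image, target, index, img_id) tuples.
        return image, target, index, index

    def __len__(self):
        return len(self.captions)
```

With something like that in place, a `torch.utils.data.DataLoader(MyPrecompDataset(...), batch_size=128, collate_fn=collate_fn)` (with `collate_fn` imported from `data.py`) should be acceptable as the `data_loader` argument to `encode_data`.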
@wingz1 were you able to do it? Any snippets or tips?
I have a similar task at hand: I want to use COCO captions to retrieve the top-k images.
Yes, actually. I added a `caption2emb(model, mycaption, vocab)` function to evaluation.py.
@wingz1 Oh, that's great to know. Is it available anywhere to have a look? It would help me a lot.
This code works quite well, thanks for sharing it. I'm wondering, do you have any code snippets showing how one might use a trained VSE++ model to build a caption query from raw text (i.e. a string), submit it to the model to get a single caption embedding, and then search for matching images that have also been mapped into the joint space by the same model?

The comparison itself is easy once numpy arrays for the caption and image embeddings in the joint space exist, but it's not clear how to use your model with a brand-new caption query, or with a set of CNN image features that are not part of a complete COCO/Flickr/etc. train or test split with corresponding caption/image pairs. Thanks for any tips; I'd prefer not to rewrite everything if you already have some additional tools for this.
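
For what it's worth, a minimal sketch of that lookup might look like the code below. This is not the actual `caption2emb` mentioned above (which wasn't posted); it assumes the PyTorch 0.3-style API of the pinned commit (`Variable` with `volatile=True`), a trained `VSE` model from `model.py` with `txt_enc` and `val_start()`, a `Vocabulary` from `vocab.py`, and an `img_embs` array as returned by `encode_data`. The helper `topk_images` is made up for illustration:

```python
# Hypothetical sketch: embed one raw caption string and rank precomputed
# image embeddings against it in the joint space.
import nltk
import numpy as np
import torch
from torch.autograd import Variable


def caption2emb(model, mycaption, vocab):
    """Encode one raw caption string into the joint embedding space."""
    tokens = nltk.tokenize.word_tokenize(mycaption.lower())
    ids = [vocab('<start>')] + [vocab(t) for t in tokens] + [vocab('<end>')]
    captions = Variable(torch.LongTensor(ids).unsqueeze(0), volatile=True)
    if torch.cuda.is_available():
        captions = captions.cuda()
    model.val_start()                      # put both encoders in eval mode
    cap_emb = model.txt_enc(captions, [len(ids)])
    return cap_emb.data.cpu().numpy()      # shape (1, emb_dim)


def topk_images(cap_emb, img_embs, k=5):
    """Indices of the k image embeddings closest to the caption embedding."""
    sims = np.dot(img_embs, cap_emb.T).flatten()
    return np.argsort(sims)[::-1][:k]
```

Since the encoders L2-normalize their outputs by default, the dot product here matches the cosine similarity used during training; if image normalization was disabled at training time, normalize `img_embs` before the comparison.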