Hey, sorry for the delayed reply.
The RNN/LSTM captioning models are trained on last-layer CNN features extracted from the COCO images; those features, not the raw images, are the dataset given to us.
To test your own images you would have to extract features from your test image with the same CNN they used. They use a VGG16 model pretrained on ImageNet, so it's doable, but it will take some time to learn if you aren't familiar with this sort of thing.
On top of that, they reduced the dimensionality further from 4096 to 512 using PCA (check the Microsoft COCO section in the RNN_Captioning notebook). It's probably easiest to just retrain using the full 4096-d features (but then you'd have to change the model's input size in the assignment to match).
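As a rough sketch of what the feature extraction could look like, here's one way to get 4096-d fc7 activations from a pretrained VGG16 using torchvision. This is not part of the assignment code, and the preprocessing (and the PCA projection they applied afterwards) won't exactly match what was used to build the COCO features, so treat it only as a starting point:

# Sketch: extract 4096-d fc7 features with a pretrained VGG16
# (assumes PyTorch + torchvision are installed; not provided by the assignment)
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

vgg = models.vgg16(pretrained=True)
vgg.eval()

# Keep everything up to (but not including) the final 1000-way classifier,
# so the output is the 4096-d fc7 activation.
feature_extractor = torch.nn.Sequential(
    vgg.features,
    vgg.avgpool,
    torch.nn.Flatten(),
    *list(vgg.classifier.children())[:-1],
)

preprocess = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open('./kitten.jpg').convert('RGB')
with torch.no_grad():
    feats = feature_extractor(preprocess(img).unsqueeze(0)).numpy()  # shape (1, 4096)

Even with something like this, you'd still need to either apply the same PCA projection that produced the 512-d features (which isn't provided) or retrain the captioning model with the input size set to 4096, as mentioned above.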
Once you can extract the feature vector from your image, checking the caption is the simple bit; the code below shows how to do it. Just place it in a cell right at the end of the notebook.
# Imports (already available earlier in the notebook, repeated here for completeness)
import numpy as np
import matplotlib.pyplot as plt

# Load and display the test image
test_im = plt.imread('./kitten.jpg')
plt.imshow(test_im)
plt.show()

# Feature extraction of the image
# TODO: replace this placeholder with the real feature vector extracted from
# the image; np.ones is just a stand-in to show the code below will run.
test_input = np.ones([1, 512])

# Forward pass of the model: sample() returns word indices, and
# decode_captions() maps them back to words using the vocabulary.
cap_sample = small_rnn_model.sample(test_input)
cap_sample = decode_captions(cap_sample, data['idx_to_word'])
print(cap_sample)
How do I add my .jpg file and check the caption output of the algorithm?