Great job!
I have a small question about the following passage in your paper:
Flickr30K consists of 31783 images collected from the Flickr website. Each image is accompanied with 5 human annotated text descriptions. We use the standard training, validation and testing splits [15], which contain 28,000 images, 1000 images and 1000 images respectively.
However, I downloaded the JSON file provided at
https://cs.stanford.edu/people/karpathy/deepimagesent/
and found that its Flickr30K training split has 29k images, not 28k.
Is this perhaps a typo? I am trying to extract the features, so this confused me.
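In case it helps to reproduce the count, here is a minimal sketch of how the images per split can be tallied. It assumes the JSON has a top-level "images" list whose entries each carry a "split" field; the file name dataset_flickr30k.json is an assumption about the downloaded archive, and the toy data is only illustrative. It also checks the paper's numbers: 28,000 + 1,000 + 1,000 = 30,000, which is less than the 31,783 images stated.

```python
import json
from collections import Counter


def count_splits(dataset: dict) -> Counter:
    """Count images per split in a Karpathy-style caption file.

    Assumes the layout {"images": [{"split": "train", ...}, ...]}.
    """
    return Counter(img["split"] for img in dataset["images"])


def load_and_count(path: str) -> Counter:
    # Hypothetical local path, e.g. "dataset_flickr30k.json" (file name assumed).
    with open(path, encoding="utf-8") as f:
        return count_splits(json.load(f))


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real file, only to show the output shape.
    toy = {"images": [{"split": "train"}, {"split": "train"},
                      {"split": "val"}, {"split": "test"}]}
    print(dict(count_splits(toy)))  # {'train': 2, 'val': 1, 'test': 1}

    # Split sizes as quoted from the paper; their sum does not reach 31783.
    paper = {"train": 28000, "val": 1000, "test": 1000}
    print(sum(paper.values()))  # 30000
```

Running `load_and_count` on the real file should print the per-split counts directly, which makes the 28k vs. 29k discrepancy easy to confirm.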
Thanks for your effort!