KunpengLi1994 / VSRN

PyTorch code for ICCV'19 paper "Visual Semantic Reasoning for Image-Text Matching"

About flickr30k training image number #20

Open LetsGoFir opened 4 years ago

LetsGoFir commented 4 years ago

Great job! I have a small question about the following statement in your paper:

Flickr30K consists of 31783 images collected from the Flickr website. Each image is accompanied with 5 human annotated text descriptions. We use the standard training, validation and testing splits [15], which contain 28,000 images, 1000 images and 1000 images respectively.

But I have downloaded the JSON provided at https://cs.stanford.edu/people/karpathy/deepimagesent/ and found that its Flickr30K training split has 29k images, not 28k. Maybe this is a typo? I am trying to extract the features, so this confused me.

Thanks for your effort!

cvbmi-research commented 2 years ago

I think it is just a typo. You don't need to worry about it too much :)
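
For anyone who wants to check the split sizes themselves before extracting features, here is a minimal sketch that counts images per split. It assumes the downloaded JSON has a top-level `images` list whose entries each carry a `split` field. That layout and the file name `dataset_flickr30k.json` are assumptions, so adjust them if your copy differs. The helper names are hypothetical and not part of the VSRN code.

```python
import json
from collections import Counter


def count_images_per_split(dataset: dict) -> Counter:
    """Count images per split.

    Assumes (not confirmed in this thread) that `dataset` looks like
    {"images": [{"split": "train", ...}, ...]}.
    """
    return Counter(image["split"] for image in dataset["images"])


def load_and_count(path: str) -> Counter:
    """Load a split JSON file from `path` and count images per split."""
    with open(path, encoding="utf-8") as f:
        return count_images_per_split(json.load(f))


# Tiny synthetic example in the assumed layout (not real Flickr30K data).
sample = {
    "images": [
        {"filename": "a.jpg", "split": "train"},
        {"filename": "b.jpg", "split": "train"},
        {"filename": "c.jpg", "split": "val"},
        {"filename": "d.jpg", "split": "test"},
    ]
}
sample_counts = count_images_per_split(sample)
print(dict(sample_counts))
```

On the real file, calling `load_and_count("dataset_flickr30k.json")` (assumed file name) should show directly whether the training split holds 29k or 28k images.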