Closed Dorothylyly closed 4 years ago
Hi, the MSR-VTT dataset actually contains 10k videos, as can be found on http://ms-multimedia-challenge.com/2016/dataset. Each item in the .h5 file holds the features or the coordinate information of the object proposals extracted from the centre frame of the corresponding video.
Thank you for your reply! Do you mean that, although MSR-VTT contains 10k videos, your code uses only 6513 of them?
I'm sorry, but what do you mean by using 6513 videos? The official splits are 6513 videos for training, 497 for validation and 2990 for test.
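To make the arithmetic explicit, here is a minimal sketch that sums the split sizes quoted above and checks that they account for all 10k videos. The contiguous index ranges it derives (training first, then validation, then test) are an assumption for illustration only; this thread does not confirm how the video ids are ordered, and `split_ranges` is a hypothetical helper, not something from the repository.

```python
# Official MSR-VTT split sizes as quoted in the reply above.
SPLIT_SIZES = {"train": 6513, "val": 497, "test": 2990}


def split_ranges(sizes):
    """Assign contiguous half-open index ranges to each split, in dict order.

    ASSUMPTION: splits are laid out back to back (train, val, test).
    The thread does not state this; it is only for illustration.
    """
    ranges = {}
    start = 0
    for name, count in sizes.items():
        ranges[name] = (start, start + count)
        start += count
    return ranges


total_videos = sum(SPLIT_SIZES.values())
ranges = split_ranges(SPLIT_SIZES)

print(total_videos)  # 6513 + 497 + 2990
print(ranges)
```

So a file with 10000 entries is consistent with one entry per video across all three splits, not one entry per frame.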
Oh, that was my mistake. I understand now, thank you!
I am a novice and forgot that there are also validation and test sets.
These two .h5 files each contain 10000 datasets. You have only 6513 videos, so why does each file have 10000 datasets? What does each dataset represent? Is it one per frame?