Closed yoniaflalo closed 5 years ago
Hi @yoniaflalo ,
Both Table 3 and the above Figure are correct. Table 3 shows video-level prediction accuracy*, while the graph shows clip-level prediction accuracy.
In almost all cases, video-level prediction accuracy is significantly higher than clip-level accuracy when clips are randomly selected from a long video sequence. This is because a single randomly selected clip may not contain enough evidence for a correct decision, or may simply not contain the action at all.
*Note: People usually do video-level prediction by aggregating (averaging) the predictions from tens or even hundreds of clips/images, which can give a boost of about 10% or even more. However, such a multi-crop testing strategy is too expensive to evaluate during training, so I only show the clip-level prediction in the Figure.
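The aggregation described in the note can be sketched as follows. This is a minimal NumPy illustration (the function names and toy logits are my own, not from the repo): per-clip class probabilities are averaged, and the video-level label is the argmax of the mean.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the class axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def video_level_prediction(clip_logits):
    """Average per-clip class probabilities, then take the argmax.

    clip_logits: array-like of shape (num_clips, num_classes).
    """
    probs = softmax(np.asarray(clip_logits, dtype=np.float64))
    return int(probs.mean(axis=0).argmax())

# Toy example: 3 clips, 4 classes. Clip 0 is ambiguous (it may not
# contain the action), but averaging over clips still recovers class 2.
logits = [[0.1, 0.2, 0.1, 0.1],
          [0.0, 0.5, 2.0, 0.1],
          [0.2, 0.1, 1.5, 0.0]]
print(video_level_prediction(logits))  # → 2
```

A single ambiguous clip can vote for the wrong class on its own, which is exactly why clip-level accuracy trails video-level accuracy.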
Thank you for your interest!
Thanks a lot for your answer.
Hi.
I see in the paper that the accuracy on the Kinetics dataset is 72.8%, as shown in this table.
But on the graph below, it seems the results are reported on the training set rather than the validation set.
So I wanted to know whether I misunderstood something, and whether the aforementioned result is on the training set or the validation set. If the reported accuracy is on the training set, what accuracy did you get on the validation set?
Thanks in advance.