Closed liming-ai closed 2 years ago
Yes, we used the same number of frames for testing MSNet on Something V1 & V2. Except for the last MSNet model (55.1% / 84.0% on Something-V1), we infer only a single clip at test time. You can check the number of clips in Table 1.
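For reference, multi-clip testing just averages the per-clip class scores for one video. A minimal, framework-free sketch (the function and argument names here are illustrative, not from the MSNet codebase, which would do this with PyTorch tensors):

```python
import math

def softmax(logits):
    # numerically stable softmax over one clip's logits
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def multi_clip_scores(clip_logits):
    """Average per-class softmax scores over several clips of one video.

    clip_logits: list of per-clip logit lists, one entry per sampled clip.
    Returns the averaged class-probability list used for the final prediction.
    """
    probs = [softmax(l) for l in clip_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]
```

With a single clip (the setting used for most models in Table 1), this reduces to plain softmax of that clip's logits.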
Thanks for your reply. Actually, I cannot reproduce the accuracy reported in your paper. My environment is the same as yours, but my ResNet-18 top-1 accuracy on Something-V1 is 44.5%; I have tried many times but still cannot reach 46%. Could you please give some advice? (I did not change anything.)
What is the accuracy of your TSM baseline model? (Something-V1)
44.5% with ResNet-18 and 48.9% with ResNet-50.
I mean the accuracy of the TSM baseline without the MS module. You can train the TSM baseline by setting `flow_estimation = 0` at models.py line 64. If there is no accuracy gap between the TSM baseline and MSNet, the problem could be with the 'Spatial Correlation Sampler'.
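To make the suggested ablation concrete: the idea is that with `flow_estimation = 0` the network skips the MotionSqueeze (MS) branch and reduces to the plain TSM backbone. A toy sketch of that control flow (the class body below is hypothetical, not the actual models.py):

```python
class MSNetSketch:
    """Hypothetical illustration of the flow_estimation toggle.

    flow_estimation=1: TSM backbone features fused with MS motion features.
    flow_estimation=0: TSM baseline only (the ablation suggested above).
    """

    def __init__(self, flow_estimation=1):
        self.use_ms_module = bool(flow_estimation)

    def forward(self, x):
        feat = x  # stand-in for TSM backbone features
        if self.use_ms_module:
            feat = feat + 1  # stand-in for motion-feature fusion
        return feat
```

Comparing the two settings' validation accuracy isolates whether the gap comes from the MS module (and its Spatial Correlation Sampler dependency) or from the baseline training setup itself.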
Thanks for your contribution.
In your paper, 8 or 16 frames are used for training, but you did not say how many frames are used for testing.
So my question is: does the number of frames in the figure refer to training or to testing? In other words, should the number of frames be the same for training and testing?