mokiki1 opened this issue 2 years ago
You can use the features directly, without upsampling. Feature extraction with the I3D model reduces the temporal length by a factor of 8, i.e. to $\frac{1}{8}$ of the input: given 256 frames, you obtain 32 features. In the paper, we use "frame" to mean one feature vector, which corresponds to 8 input frames.
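The arithmetic in the answer above can be written out as a small sketch. It assumes an exact downsampling factor of 8 and treats each feature as covering a non-overlapping block of 8 frames, which is the simplification used in the answer; the real network's receptive field and padding are ignored, so the index mapping is approximate.

```python
# Sketch: temporal bookkeeping for I3D features, assuming a fixed
# downsampling factor of 8 and non-overlapping 8-frame blocks.
DOWNSAMPLE = 8  # input frames summarized by one feature vector


def num_features(num_frames: int, factor: int = DOWNSAMPLE) -> int:
    """Number of feature vectors produced for num_frames input frames."""
    return num_frames // factor


def feature_span(idx: int, factor: int = DOWNSAMPLE) -> range:
    """Input frames (0-based) attributed to feature vector idx."""
    return range(idx * factor, (idx + 1) * factor)


def frame_to_feature(frame: int, factor: int = DOWNSAMPLE) -> int:
    """Index of the feature vector that an input frame falls into."""
    return frame // factor


if __name__ == "__main__":
    print(num_features(256))      # 32
    print(list(feature_span(1)))  # [8, 9, 10, 11, 12, 13, 14, 15]
    print(frame_to_feature(100))  # 12
```

With this mapping, a per-frame annotation can be looked up at feature resolution via `frame_to_feature` instead of upsampling the features back to 256 steps.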
Thank you for your answer.
First, how should I understand stride = 4? Second, if possible, I would like to know the length of the video clip fed into the I3D model when you extract features from a video. Finally, do you use overlapping clips? Thanks.
Hello, thank you very much for sharing. I have some questions about I3D feature extraction. I extracted features using the links you provided, but I don't fully understand the dimensions of the output features. Suppose the input to I3D has shape [4, 3, 256, 224, 224] and the final output has shape [4, 1024, 31, 1, 1]. In the code you provide, it seems that I3D should produce a feature for every frame. Do the features need to be upsampled?
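To read the output from the question as a sequence of feature vectors per clip, the 5-D tensor can be squeezed and transposed. This sketch uses NumPy with a zero array standing in for real features. The pooling helper shows one possible explanation for getting 31 rather than 32 time steps: it assumes the extractor applies a final temporal pooling with kernel 2 and stride 1 after the 8x downsampling, which the thread does not confirm.

```python
import numpy as np

# Stand-in for the I3D output reported in the question:
# [batch, channels, time, height, width] = [4, 1024, 31, 1, 1]
out = np.zeros((4, 1024, 31, 1, 1), dtype=np.float32)

# Drop the singleton spatial axes, then move time before channels so each
# clip becomes a sequence of 1024-dim feature vectors: [4, 31, 1024].
seq = out.squeeze(axis=(3, 4)).transpose(0, 2, 1)
print(seq.shape)  # (4, 31, 1024)


def pooled_length(t: int, kernel: int = 2, stride: int = 1) -> int:
    """Output length of a 1-D pooling window with no padding."""
    return (t - kernel) // stride + 1


# Assumption: 256 frames / 8 = 32 time steps, then a temporal pooling with
# kernel 2 and stride 1 (hypothetical here) leaves 32 - 2 + 1 = 31 steps.
print(pooled_length(256 // 8))  # 31
```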