VividLe / A2Net

Revisiting Anchor Mechanisms for Temporal Action Localization (TIP 2020)

I3D Feature Extraction #4

Open mokiki1 opened 2 years ago

mokiki1 commented 2 years ago

Hello, thank you very much for sharing. I have some questions about I3D feature extraction. I extracted features using the links you provided, but I do not fully understand the dimensions of the output features. Assume the input to I3D has shape [4,3,256,224,224] and the final output has shape [4,1024,31,1,1]. From the code you provide, it seems that I3D should produce a feature for each frame. Does the output need to be upsampled?

VividLe commented 2 years ago

You can use the features directly, without upsampling. Feature extraction with the I3D model reduces the temporal length to $\frac{1}{8}$ of the input. Given 256 frames, you would obtain 32 features. In the paper, we use "frame" to refer to a feature vector, which corresponds to 8 video frames.
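To make the arithmetic concrete, here is a minimal sketch of the nominal 8-to-1 relationship between video frames and feature vectors. It is not taken from the A2Net code: the function names are hypothetical, and it assumes an exact temporal downsampling by 8. The exact count depends on the padding and pooling of the particular I3D implementation (the question above reports 31 features for 256 frames), so treat the numbers as nominal.

```python
from typing import Tuple

# Assumption: nominal temporal downsampling factor of I3D, as stated in the answer above.
TEMPORAL_STRIDE = 8


def num_features(num_frames: int, stride: int = TEMPORAL_STRIDE) -> int:
    """Nominal number of feature vectors for a clip of `num_frames` frames."""
    return num_frames // stride


def frame_span(feature_index: int, stride: int = TEMPORAL_STRIDE) -> Tuple[int, int]:
    """Half-open range [start, end) of video frames nominally covered by one feature."""
    start = feature_index * stride
    return start, start + stride


def feature_to_seconds(
    feature_index: int, fps: float, stride: int = TEMPORAL_STRIDE
) -> Tuple[float, float]:
    """Convert a feature index to a (start, end) time in seconds at the given frame rate."""
    start, end = frame_span(feature_index, stride)
    return start / fps, end / fps


if __name__ == "__main__":
    print(num_features(256))   # nominal feature count for a 256-frame clip
    print(frame_span(31))      # frames covered by the last of 32 features
```

With this mapping, a boundary predicted at a feature index can be converted back to video frames or seconds without upsampling the features themselves.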

mokiki1 commented 2 years ago

Thank you for your answer. [image]

First, how should I understand stride = 4? Second, if possible, I would like to know the length of the video clip fed into the I3D model when you extract the features of a video. Finally, do you set an overlap? Thanks.