facebookresearch / Listen-to-Look

Listen to Look: Action Recognition by Previewing Audio (CVPR 2020)
Creative Commons Attribution 4.0 International

About the parameter N. #4

Closed by alice-cool 3 years ago

alice-cool commented 3 years ago

Hi, I have a question about testing on untrimmed videos. In your paper, a video is represented by the set z = {z1, z2, z3, z4, ..., zN}. As I understand it, you do not use all N time steps; instead you extract T time steps for the final classification of one video sample. My guess is that at every step you search the same set z = {z1, z2, z3, z4, ..., zN} for the most useful indexed feature, called zj, where j is in [1, N]. You repeat this selection T times over the same set, and then aggregate the features from the T time steps to classify the action. Is that correct?
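To make the guess above concrete, here is a minimal sketch of the procedure as described in this question. It is not taken from the repository or the paper: the function name `select_and_aggregate`, the dot-product scoring, the random initial query, the query update, and the mean-pooling aggregation are all assumptions chosen only for illustration.

```python
import random


def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def select_and_aggregate(z, T, seed=0):
    """Pick T features from the same set z (N lists of D floats) and pool them.

    Hypothetical illustration only: scoring, query update, and pooling
    are placeholders, not the authors' method.
    """
    rng = random.Random(seed)
    dim = len(z[0])
    # Assumed: a query vector decides which feature looks most useful.
    query = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    indices, picked = [], []
    for _ in range(T):
        # Every step scores all N features of the same, unchanged set z.
        scores = [dot(query, z_j) for z_j in z]
        j = max(range(len(z)), key=scores.__getitem__)
        indices.append(j)
        picked.append(z[j])
        # Assumed update: fold the selected feature back into the query.
        query = [q + x for q, x in zip(query, z[j])]
    # Assumed aggregation: average the T selected features.
    aggregated = [sum(col) / T for col in zip(*picked)]
    return indices, aggregated


if __name__ == "__main__":
    z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # N = 3 toy features, D = 2
    print(select_and_aggregate(z, T=2))
```

Because the set z is never modified in this sketch, the same index j can be selected more than once across the T steps. Whether that can also happen in the actual method is part of the question.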