facebookresearch / Listen-to-Look

Listen to Look: Action Recognition by Previewing Audio (CVPR 2020)
Creative Commons Attribution 4.0 International

About Parameters N. #5

Closed alice-cool closed 3 years ago

alice-cool commented 3 years ago

Because untrimmed videos have arbitrary lengths, I guess the value of N is not fixed. In your supplementary material, you say that 16 frames are used to extract each image feature vector. So I guess N for each untrimmed video sample can be calculated by dividing its number of frames by 16, while T is fixed: you use T = 10. Is that right?
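To make the guess concrete, here is a minimal sketch of the arithmetic, assuming N is the number of non-overlapping 16-frame clips in a video. The function name `estimate_num_clips` and the use of floor division (dropping a trailing partial clip) are my own assumptions for illustration and are not taken from the repository or the paper.

```python
def estimate_num_clips(total_frames: int, frames_per_clip: int = 16) -> int:
    """Hypothetical helper: guess N as the count of full 16-frame clips.

    Assumes non-overlapping clips and discards any trailing partial clip;
    the actual preprocessing in the repository may differ.
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if frames_per_clip <= 0:
        raise ValueError("frames_per_clip must be positive")
    return total_frames // frames_per_clip


# T is described in the comment as fixed at 10, independent of video length.
T = 10

if __name__ == "__main__":
    # N varies with the video length, while T stays the same.
    for frames in (160, 500, 3000):
        print(f"frames={frames}  N={estimate_num_clips(frames)}  T={T}")
```

Under this assumption, a 500-frame video gives N = 31, since the last 4 frames do not fill a complete clip.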