Because untrimmed videos have arbitrary lengths, I guess the value of N is not fixed. I found in your supplementary material that you use 16 frames to extract each image vector. So I guess the N of each untrimmed video sample can be calculated by dividing its number of frames by 16. And T is fixed: you set T to 10.
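
To make my guess concrete, here is a minimal sketch of how I imagine N is computed per video. The function name `num_segments` and the choice to drop leftover frames that do not fill a whole 16-frame chunk are my own assumptions, not something stated in the paper or the supplementary material.

```python
def num_segments(num_frames: int, frames_per_segment: int = 16) -> int:
    """Guess at N: how many 16-frame chunks fit into one untrimmed video."""
    # Assumption: trailing frames that do not fill a whole chunk are dropped
    # (floor division). The authors may pad or round up instead.
    return num_frames // frames_per_segment


# N varies with video length, while T stays fixed at 10.
T = 10
for frames in (1600, 170):
    print(frames, "frames -> N =", num_segments(frames), ", T =", T)
```

If the last partial chunk is padded rather than dropped, N would be the ceiling of the division instead.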