How to understand 'clip'?

arunos728 / MotionSqueeze

Official PyTorch Implementation of MotionSqueeze, ECCV 2020

BSD 2-Clause "Simplified" License


Hi,

Thanks for sharing the code for this great work.

I have some questions about the inference procedure described in Sec. 4.2, which states:

"Given a video, we sample a clip and test its center crop. For Something-Something V1&V2, we evaluate both the single clip prediction and the average prediction of 10 randomly-sampled clips."

May I ask:

What is a 'clip' here? Is it a set of frames that is taken to represent the whole video? How many frames does one clip contain?
For the Something-Something datasets, how did you sample the single clip? Is it randomly sampled, as in the 10-clip averaged prediction?
Why did you use different sampling strategies for Something-Something (random) and for Kinetics and HMDB-51 (uniform)? What are the advantages and disadvantages of each?
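
To make the questions concrete, here is a minimal sketch of how I currently read the two strategies. It assumes segment-based sampling (the video is split into equal segments and one frame is taken per segment), which is my guess and is not confirmed by the paper or this repository. The function names, the `mode` argument, and the default of 8 frames per clip are all hypothetical.

```python
import random


def sample_clip_indices(num_frames, clip_len=8, mode="uniform", rng=None):
    """Return clip_len frame indices for a video with num_frames frames.

    Assumed scheme (not taken from the repo): split the video into clip_len
    equal segments, then pick the centre frame of each segment ("uniform")
    or a random frame inside each segment ("random").
    """
    if num_frames < 1 or clip_len < 1:
        raise ValueError("num_frames and clip_len must be positive")
    rng = rng or random.Random()
    seg = num_frames / clip_len  # segment length, may be fractional
    indices = []
    for i in range(clip_len):
        start = int(i * seg)
        end = max(start + 1, int((i + 1) * seg))  # exclusive, at least one frame
        if mode == "uniform":
            idx = (start + end - 1) // 2
        elif mode == "random":
            idx = rng.randrange(start, end)
        else:
            raise ValueError(f"unknown mode: {mode}")
        indices.append(min(idx, num_frames - 1))  # guard against short videos
    return indices


def average_scores(per_clip_scores):
    """Average per-class scores over several clips of the same video."""
    n = len(per_clip_scores)
    return [sum(col) / n for col in zip(*per_clip_scores)]
```

Under this reading, a "single clip" prediction would use one call to `sample_clip_indices`, and the 10-clip prediction would call it 10 times with `mode="random"` and pass the resulting score vectors to `average_scores`. Please correct me if the actual procedure differs.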

Your reply would be greatly appreciated.

How to understand 'clip'? #16