yuu2704 closed this issue 8 months ago.
Sorry for the late response. During training, sparse sampling splits the video into N segments and randomly samples one frame from each segment. During testing, it works as you describe: N frames are sampled at equal intervals.
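A minimal sketch of the two sampling modes described above, assuming frames are addressed by index `0..num_frames-1` and that the test-time frame is the center of each segment (the reply only says "equal intervals"). `sparse_sample_indices` is a hypothetical helper for illustration, not the repository's actual code.

```python
import random
from typing import List, Optional


def sparse_sample_indices(num_frames: int, num_segments: int, training: bool,
                          rng: Optional[random.Random] = None) -> List[int]:
    """Return one frame index per segment (hypothetical TSN-style sketch)."""
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    rng = rng or random.Random()
    seg_len = num_frames / num_segments  # segment length in frames; may be fractional
    indices = []
    for i in range(num_segments):
        if training:
            # Training: a random frame inside segment i, drawn from [start, end).
            start = int(seg_len * i)
            end = max(int(seg_len * (i + 1)), start + 1)  # keep at least one frame
            indices.append(rng.randrange(start, end))
        else:
            # Testing (assumed): the center frame of segment i, so frames are equally spaced.
            indices.append(int(seg_len * (i + 0.5)))
    return indices


# Example: 12 frames from a 120-frame clip at test time.
print(sparse_sample_indices(120, 12, training=False))
# [5, 15, 25, 35, 45, 55, 65, 75, 85, 95, 105, 115]
```

The random per-segment choice at training time acts as a light temporal augmentation, while the fixed positions at test time keep evaluation deterministic. With N = 12 and a 3-minute video, each segment covers roughly 15 seconds of footage.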
As for your other question about ActivityNet: yes, we use 12 frames, the same as for the other datasets. I have tried using more frames, but it did not help.
Thank you for your response. It is very interesting that a 3-minute video can be retrieved using only 12 frames, and that using more frames does not improve the results.
Thank you for sharing this great work.
I understand that sparse sampling means sampling N frames from the entire video at equal intervals. Am I correct that, in this case, even for relatively long video datasets such as ActivityNet-QA and ActivityNet Captions, only 8 to 16 frames are sampled from the entire video, just as for the other datasets? Sorry to ask something so elementary.
Thanks again for your great work.