How to get appearance feature for video representation with a shape as (16, 2048)

doc-doc / NExT-QA

NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)

MIT License


How to get appearance feature for video representation with a shape as (16, 2048) #2

Closed wangbq18 closed 3 years ago

wangbq18 commented 3 years ago

Hi, I just found that the appearance and motion features for video representation that you provide in vid_feat.zip have shape (16, 2048), but with the code provided by [HCRN], the appearance feature has shape (8, 16, 2048). How can I get an appearance feature with shape (16, 2048)? Looking forward to your reply, thanks!

doc-doc commented 3 years ago

Please use the middle frame of the clip for appearance representation.

wangbq18 commented 3 years ago

> Please use the middle frame of the clip for appearance representation.

Got it, thanks!

FFFFF123456 commented 2 years ago

Hi, thank you for sharing such a great codebase! Does "the middle frame of the clip" mean that you use the 8th and 9th frames of each clip (16 frames in total) for appearance representation? When concatenating the frame features, do you follow the original order of the 8 clips (e.g. [frames in the first clip : frames in the second clip : ... : frames in the last clip])? Looking forward to your reply!

doc-doc commented 2 years ago

Hi, there is no need to be so precise; we actually chose the 9th frame (index 16//2 = 8, counting from 0). Yes, the order should be kept the same.
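The selection described above can be sketched as follows. This is only an illustration, not the repository's extraction code: the function name `middle_frame_features`, the random placeholder features, and the choice of 16 clips of 16 frames each are assumptions, picked so that the result has the (16, 2048) shape of the provided features.

```python
import numpy as np


def middle_frame_features(clip_feats: np.ndarray) -> np.ndarray:
    """Keep only the middle frame's feature vector from every clip.

    clip_feats: array of shape (num_clips, frames_per_clip, feat_dim)
    returns:    array of shape (num_clips, feat_dim), clips in original order
    """
    if clip_feats.ndim != 3:
        raise ValueError("expected shape (num_clips, frames_per_clip, feat_dim)")
    # 0-based middle index: for 16 frames this is 16 // 2 = 8, i.e. the 9th frame.
    mid = clip_feats.shape[1] // 2
    return clip_feats[:, mid, :]


# Placeholder per-frame features (hypothetical): 16 clips x 16 frames x 2048 dims.
rng = np.random.default_rng(0)
clip_feats = rng.standard_normal((16, 16, 2048)).astype(np.float32)

app_feat = middle_frame_features(clip_feats)
print(app_feat.shape)
```

Indexing along the frame axis leaves the clip axis untouched, so the clip order is preserved automatically.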

FFFFF123456 commented 2 years ago

Thanks for your reply! I think that makes sense to me. You chose one frame from each clip, so you set the number of clips to 16 rather than 8 as in HCRN?

HU-xiaobai commented 1 year ago

@FFFFF123456 @wangbq18 Hello, may I ask whether you were able to extract the same appearance features that the author provided? For example, for the validation set, I set the number of clips to 16, extracted the appearance vector of the 8th or 9th frame of the first clip (I actually compared all the frames in that clip), and compared it with the first frame vector of the provided appearance features. Both have dimension (2048,), but the vectors are different. How about yours?
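For a comparison like this, exact equality is a strict test for floating-point features, so it can help to measure how far apart two vectors are. Below is a minimal sketch; the helper name `compare_features`, the tolerance `atol`, and the synthetic vectors are assumptions for illustration, not anything from the repository.

```python
import numpy as np


def compare_features(a, b, atol=1e-4):
    """Report how close two feature vectors are.

    Returns a dict with a tolerance-based equality flag, the largest
    element-wise absolute difference, and the cosine similarity.
    """
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    cosine = float(a @ b / denom) if denom > 0 else float("nan")
    return {
        "allclose": bool(np.allclose(a, b, atol=atol)),
        "max_abs_diff": float(np.max(np.abs(a - b))),
        "cosine": cosine,
    }


# Synthetic stand-ins for an extracted vector and a provided vector.
rng = np.random.default_rng(0)
extracted = rng.standard_normal(2048)
provided = extracted + 1e-6 * rng.standard_normal(2048)

report = compare_features(extracted, provided)
print(report)
```

A cosine similarity close to 1 with a small maximum difference would point to numerical noise, while a low similarity would suggest that a different frame or a different extraction pipeline was used.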

wangbq18 commented 1 year ago

This is an automatic vacation reply from QQ Mail. Hello! I have received your email.