SydCaption / SAAT

MIT License

About the frame features #18

Closed. RyanLiut closed this issue 3 years ago.

RyanLiut commented 3 years ago

Hi,

Thanks for the paper and the code. I am wondering: when you used IRV2 (Inception-ResNet-v2) to extract 2D frame features and selected 28 uniformly spaced frames per video, how did you aggregate these 28 feature vectors into one vector per video? Concatenate and project? This does not seem to be clear in the paper.

Thank you!

Dorothylyly commented 3 years ago

I have the same question. Did you find the answer?

SydCaption commented 3 years ago

Hi, it's also done by mean pooling.
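A minimal sketch of what mean pooling means here, assuming the per-video frame features are stored as a (28, D) array with one row per sampled frame (that layout, and D = 1536 for Inception-ResNet-v2 pooled features, are assumptions not confirmed in this thread). The helper name `mean_pool_frames` is hypothetical. Averaging over the frame axis collapses the 28 vectors into a single D-dimensional vector, with no concatenation or projection:

```python
from typing import List


def mean_pool_frames(frames: List[List[float]]) -> List[float]:
    """Average T frame feature vectors (each of length D) into one vector of length D.

    Hypothetical helper: equivalent to NumPy's feats.mean(axis=0) on a (T, D) array.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(f) != dim for f in frames):
        raise ValueError("all frame vectors must have the same length")
    num_frames = len(frames)
    # Sum each feature dimension across all frames, then divide by the frame count.
    return [sum(f[d] for f in frames) / num_frames for d in range(dim)]


# Toy example: 28 frames with 4-dimensional features (real features would be much wider).
toy = [[float(t), 1.0, 2.0 * t, -1.0] for t in range(28)]
print(mean_pool_frames(toy))  # [13.5, 1.0, 27.0, -1.0]
```

With NumPy, the same operation on a (28, D) array is `feats.mean(axis=0)`, which returns shape (D,).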

Dorothylyly commented 3 years ago

Thanks!! I get it.

fsh2017 commented 3 years ago

> thanks!! I get it

The array in the NPY file has shape (28, 1000), but the one in the H5 file has shape (1536, 1). How should the mean pooling be done?
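A hedged sketch of the shape bookkeeping involved. Mean pooling averages over the frame axis only, so a (T, D) array becomes (D,), or (D, 1) as a column; it cannot turn 1000 dimensions into 1536. One plausible reading, which is an assumption and not confirmed by the authors: 1000 matches the number of ImageNet classes, so the NPY file may hold per-frame classifier outputs, while 1536 matches the width of Inception-ResNet-v2's pooled penultimate features, so the H5 entry may already be a mean-pooled feature vector. The helper `pool_to_column` and its `expected_dim` check are hypothetical.

```python
from typing import List


def pool_to_column(frames: List[List[float]], expected_dim: int) -> List[List[float]]:
    """Mean-pool (T, D) frame features into a (D, 1) column, checking D against expected_dim."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if dim != expected_dim:
        # Pooling over frames leaves D unchanged, so a mismatch means the
        # features came from a different layer or model.
        raise ValueError(f"feature dim {dim} != expected {expected_dim}; pooling cannot fix this")
    num_frames = len(frames)
    pooled = [sum(f[d] for f in frames) / num_frames for d in range(dim)]
    # Wrap each pooled value in its own row to get a (D, 1) layout.
    return [[v] for v in pooled]


per_frame_1000 = [[0.0] * 1000 for _ in range(28)]  # stand-in for a (28, 1000) NPY array
per_frame_1536 = [[1.0] * 1536 for _ in range(28)]  # stand-in for (28, 1536) frame features

col = pool_to_column(per_frame_1536, expected_dim=1536)
print(len(col), len(col[0]))  # 1536 1

try:
    pool_to_column(per_frame_1000, expected_dim=1536)
except ValueError as err:
    print(err)
```

Under this reading, matching the H5 layout would require extracting 1536-dimensional pooled features per frame before averaging, rather than pooling the 1000-dimensional outputs.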