antoine77340 / howto100m

Code for the HowTo100M paper
Apache License 2.0
251 stars 37 forks

How to aggregate n×2048 features into one 2048 feature? #2

Open dixonhsiao opened 5 years ago

dixonhsiao commented 5 years ago

It seems that in your training/eval data there is only one 2048-dimensional 2D feature and one 2048-dimensional 3D feature per sentence. But the feature extractor in https://github.com/antoine77340/video_feature_extractor seems to produce n×2048 features for a sentence (n features for 2D if the sentence is n seconds long, and approximately n/1.5 features for 3D). How do I aggregate the n×2048 features into one 2048 feature with temporal max-pooling, as stated in your paper? Do I just select the max value for each dimension?

bjuncek commented 4 years ago

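Yes, you can max-pool along the temporal dimension, i.e. take the max of each of the 2048 dimensions over the n features. For example, you could add nn.AdaptiveMaxPool2d((1, 2048)) after feature loading. Note that this module expects a 3D or 4D input, so an (n, 2048) tensor needs an extra leading dimension first.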
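A minimal sketch of the above in PyTorch, assuming the extracted features for one sentence are already loaded as an (n, 2048) tensor. The function name `temporal_max_pool` and the random input are made up for illustration and are not part of the howto100m code. It shows a plain max over the time axis next to the nn.AdaptiveMaxPool2d route, which should give the same vector:

```python
import torch
import torch.nn as nn


def temporal_max_pool(features: torch.Tensor) -> torch.Tensor:
    """Collapse an (n, 2048) feature matrix into one 2048-d vector.

    Hypothetical helper: takes the max over the time axis (dim 0)
    independently for each of the feature dimensions.
    """
    return features.max(dim=0).values


# Hypothetical input: 7 per-second 2D features for one sentence.
features = torch.randn(7, 2048)
pooled = temporal_max_pool(features)  # shape: (2048,)

# Same result with nn.AdaptiveMaxPool2d. The module expects a
# (C, H, W) or (N, C, H, W) input, so add a leading channel axis.
pool = nn.AdaptiveMaxPool2d((1, 2048))
pooled_adaptive = pool(features.unsqueeze(0)).reshape(2048)
```

The plain `max` over dim 0 is probably the simpler option when features are pooled one sentence at a time; the module form may be more convenient if pooling is to live inside an nn.Sequential.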