e-apostolidis / PGL-SUM

A PyTorch Implementation of PGL-SUM from "Combining Global and Local Attention with Positional Encoding for Video Summarization" (IEEE ISM 2021)

Regarding feature size #9

Closed Youngwoo-git closed 1 year ago

Youngwoo-git commented 1 year ago

After reading through your README, just one thing leaves me a little uncertain.

GoogleNet's output feature size is 1000, which does not match the feature size of 1024 defined in this project. So I wonder how you converted the features from size 1000 to 1024.

Or did you simply remove the last fc layer, taking the features before they are reduced from 1024 to 1000?

Thanks in advance :)

e-apostolidis commented 1 year ago

Hi! Thanks for your interest in our code. Following the paradigm of most video summarization works, we extract deep feature representations for the video frames from the pool5 layer of GoogleNet (the 'avgpool' or 'AdaptiveAvgPool2d' layer), whose output is of size 1024 (see https://towardsdatascience.com/deep-learning-googlenet-explained-de8861c82765#:~:text=The%20input%20layer%20of%20the,the%20dimension%20224%20x%20224).

Kind regards, Lampis