facebookresearch / SlowFast

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.
Apache License 2.0

Question about the pooling layer #11

Closed by bryanyzhu 4 years ago

bryanyzhu commented 4 years ago

Hi, thank you for sharing this great codebase. I am looking at the SlowFast model builder and found an extra pooling layer between the res2 and res3 stages. I couldn't find it in the paper.

https://github.com/facebookresearch/SlowFast/blob/master/slowfast/models/video_model_builder.py#L219-L225

In addition, both the kernel size and the stride are [1,1,1], which suggests that no pooling is actually performed: the output has the same shape as the input.

Actually, when I read the paper, it says on page 3 that

In our instantiations, we use no temporal downsampling layers (neither temporal pooling nor time-strided convolutions) throughout the Fast pathway, until the global pooling layer before classification.

So, according to the paper, this pooling layer shouldn't be here, at least for the Fast pathway. Could you clarify why this pooling layer is needed? Thank you.

haooooooqi commented 4 years ago

Thanks for asking and diving into the details! Long story short: for the SlowFast architecture we don’t need the pooling, and we don’t actually use it. A pool with a kernel size of 1 and a stride of 1 means no pooling is performed.
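To make the "kernel of 1 means no pooling" point concrete, here is a minimal dependency-free sketch. It uses the standard floor-mode output-size formula for pooling and a naive max pool along one axis. The helper names and the (T, H, W) shape are hypothetical and chosen only for illustration; this is not code from the repository.

```python
def pool_output_size(size, kernel, stride, padding=0):
    """Output length of a pooling window along one axis (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def max_pool_1d(values, kernel, stride):
    """Naive max pooling over a list, without padding."""
    n_out = pool_output_size(len(values), kernel, stride)
    return [max(values[i * stride : i * stride + kernel]) for i in range(n_out)]


# Hypothetical feature-map shape (T, H, W); the numbers are illustrative only.
in_shape = (32, 56, 56)
kernel = stride = (1, 1, 1)

# With kernel 1 and stride 1, every axis keeps its length.
out_shape = tuple(
    pool_output_size(s, k, st) for s, k, st in zip(in_shape, kernel, stride)
)

# Each window holds exactly one element, so the values pass through unchanged.
frames = [3, 1, 4, 1, 5, 9, 2, 6]
identity = max_pool_1d(frames, kernel=1, stride=1)

# For contrast, kernel 2 with stride 2 halves the length.
halved = max_pool_1d(frames, kernel=2, stride=2)

print(out_shape, identity, halved)
```

Because each window contains a single element, the layer acts as an identity on both shape and values, which is consistent with the observation that the output has the same shape as the input.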

In the PySlowFast codebase we aim to offer a very general model builder that supports a variety of architectures: not only SlowOnly and SlowFast, but also others such as C2D and I3D. The pooling layer is there to support the C2D and I3D architectures introduced in the Non-Local paper. For SlowFast we don’t need it.
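One way a general builder might handle this is to look up the pooling kernel per architecture, so that the same layer slot is a real temporal pool for some models and a no-op for others. The sketch below is an assumption about how such a table could look: the name `POOL_AFTER_RES2`, the helper function, and the temporal kernel of 2 for C2D and I3D are illustrative and not copied from the repository.

```python
# Hypothetical per-architecture table: one [T, H, W] kernel per pathway.
# Values are illustrative assumptions, not taken from the codebase.
POOL_AFTER_RES2 = {
    "c2d": [[2, 1, 1]],                  # single pathway, temporal pooling
    "i3d": [[2, 1, 1]],                  # single pathway, temporal pooling
    "slow": [[1, 1, 1]],                 # single pathway, no-op pool
    "slowfast": [[1, 1, 1], [1, 1, 1]],  # Slow and Fast pathways, no-op pools
}


def temporal_length_after_pool(arch, frames_per_pathway):
    """Frame count per pathway after the pool, assuming stride equals kernel."""
    kernels = POOL_AFTER_RES2[arch]
    if len(kernels) != len(frames_per_pathway):
        raise ValueError("expected one frame count per pathway")
    # Floor-mode output size with stride == kernel and no padding.
    return [(t - k[0]) // k[0] + 1 for t, k in zip(frames_per_pathway, kernels)]


c2d_frames = temporal_length_after_pool("c2d", [8])
slowfast_frames = temporal_length_after_pool("slowfast", [8, 32])
print(c2d_frames, slowfast_frames)
```

Under these assumed values, the single C2D pathway is downsampled in time while both SlowFast pathways keep their frame counts, which matches the paper's statement that the Fast pathway has no temporal downsampling before global pooling.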