bryanyzhu closed this issue 4 years ago
Thanks for asking and diving into the details! Long story short, the SlowFast architecture doesn't need this pooling and doesn't actually use it. Pooling with a kernel size of 1 (and a stride of 1) means no pooling.
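To make the "kernel of 1 means no pooling" point concrete, here is a minimal pure-Python sketch. It is not PySlowFast code: `max_pool3d` is a hypothetical helper written only for this illustration, it works on nested lists indexed as `[t][h][w]`, and it assumes no padding.

```python
from itertools import product


def max_pool3d(volume, kernel, stride):
    """Naive 3D max pooling over volume[t][h][w] with no padding (illustrative only)."""
    dims = (len(volume), len(volume[0]), len(volume[0][0]))
    # Output length along each axis: floor((n - k) / s) + 1
    out_dims = [(n - k) // s + 1 for n, k, s in zip(dims, kernel, stride)]
    out = [[[None] * out_dims[2] for _ in range(out_dims[1])]
           for _ in range(out_dims[0])]
    for t, h, w in product(*(range(n) for n in out_dims)):
        # Take the maximum over the window that starts at the strided position.
        out[t][h][w] = max(
            volume[t * stride[0] + dt][h * stride[1] + dh][w * stride[2] + dw]
            for dt, dh, dw in product(*(range(k) for k in kernel))
        )
    return out


# A tiny 4x2x2 "clip" (T x H x W) with distinct values.
clip = [[[t * 4 + h * 2 + w for w in range(2)] for h in range(2)]
        for t in range(4)]

identity = max_pool3d(clip, kernel=(1, 1, 1), stride=(1, 1, 1))
halved = max_pool3d(clip, kernel=(2, 1, 1), stride=(2, 1, 1))

print(identity == clip)  # True: each window holds exactly one element
print(len(halved))       # 2: a temporal kernel and stride of 2 halve T
```

With kernel and stride both `(1, 1, 1)`, every window contains a single element, so the output equals the input in both shape and values. Only a kernel or stride larger than 1 changes anything.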
In the PySlowFast codebase we aim to offer a very general model builder that supports a variety of architectures: not only SlowOnly and SlowFast, but also other architectures such as C2D and I3D. The pooling layer is there to support the C2D and I3D architectures introduced in the Non-Local paper. For SlowFast we don't need the pooling.
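One way such a general builder could handle this is a per-architecture table of pooling kernels, where architectures that need no pooling simply get a kernel of 1. The sketch below is an assumption for illustration, not the actual PySlowFast source: the name `POOL_AFTER_RES2`, the helper `pooled_shapes`, and the kernel values are all hypothetical, and the stride is assumed to equal the kernel.

```python
# Hypothetical table: one [T, H, W] pooling kernel per pathway.
# The keys and values are illustrative assumptions, not copied from PySlowFast.
POOL_AFTER_RES2 = {
    "c2d": [[2, 1, 1]],                  # assumed temporal pooling
    "i3d": [[2, 1, 1]],                  # assumed temporal pooling
    "slow": [[1, 1, 1]],                 # kernel 1: no pooling
    "slowfast": [[1, 1, 1], [1, 1, 1]],  # kernel 1 on both pathways
}


def pooled_shapes(arch, input_shapes):
    """Return the (T, H, W) shape of each pathway after the pooling layer.

    Assumes stride == kernel and no padding, so each axis becomes
    floor((n - k) / k) + 1.
    """
    kernels = POOL_AFTER_RES2[arch]
    return [
        tuple((n - k) // k + 1 for n, k in zip(shape, kernel))
        for shape, kernel in zip(input_shapes, kernels)
    ]


# Slow and Fast pathways keep their temporal length with a kernel of 1.
print(pooled_shapes("slowfast", [(4, 56, 56), (32, 56, 56)]))
# Under the assumed kernel of 2, the temporal length is halved.
print(pooled_shapes("i3d", [(8, 56, 56)]))
```

The point of the design is that the builder always inserts the layer and lets the table decide whether it does anything, which keeps one code path for all architectures.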
Hi, thank you for sharing the great codebase. I am looking at the SlowFast model builder, and find an extra pooling layer between the res2 and res3 stages. I didn't find it in the paper. https://github.com/facebookresearch/SlowFast/blob/master/slowfast/models/video_model_builder.py#L219-L225
In addition, both the kernel size and the strides are [1,1,1], which seems like no pooling is performed. The output has the same shape as the input.
Actually, when I read the paper, it says on page 3 that
In our instantiations, we use no temporal downsampling layers (neither temporal pooling nor time-strided convolutions) throughout the Fast pathway, until the global pooling layer before classification.
So according to the paper this pooling layer shouldn't be here, at least for the Fast pathway. Can you clarify why this pooling layer is needed? Thank you.