google-research / big_transfer

Official repository for the "Big Transfer (BiT): General Visual Representation Learning" paper.
https://arxiv.org/abs/1912.11370
Apache License 2.0

pad then pool in root definition #74

Closed: sgunasekar closed this issue 1 year ago

sgunasekar commented 1 year ago

In the definition of ResNetV2, I noticed the following lines of code:

[image: screenshot of the ResNetV2 root definition]

I'm wondering what the subtle difference is between "pad then pool" and "pool with the padding option". I'm also curious whether it actually makes a difference.
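One plausible source of the difference is the padding value. PyTorch documents `nn.MaxPool2d`'s `padding` as implicit negative-infinity padding, so padded cells can never win the max. An explicit zero pad followed by an unpadded pool lets a 0 win wherever every real value in a border window is negative. The output shape is the same either way. The pure-Python sketch below illustrates this with a 3x3, stride-2 pool and padding 1. These sizes and the helper names (`pad2d`, `max_pool2d`, `pad_then_pool`, `pool_with_padding`) are hypothetical and are not taken from the screenshot.

```python
NEG_INF = float("-inf")

def pad2d(x, p, value):
    """Pad a 2-D list-of-lists by p cells on every side with `value`."""
    w = len(x[0]) + 2 * p
    top = [[value] * w for _ in range(p)]
    middle = [[value] * p + list(row) + [value] * p for row in x]
    bottom = [[value] * w for _ in range(p)]
    return top + middle + bottom

def max_pool2d(x, k, s):
    """Unpadded k x k max pooling with stride s."""
    out_h = (len(x) - k) // s + 1
    out_w = (len(x[0]) - k) // s + 1
    return [[max(x[i * s + di][j * s + dj] for di in range(k) for dj in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

def pad_then_pool(x):
    # Explicit zero padding, then a pool with no padding of its own.
    return max_pool2d(pad2d(x, 1, 0.0), k=3, s=2)

def pool_with_padding(x):
    # Padding handled "inside" the pool: -inf cells never win the max.
    return max_pool2d(pad2d(x, 1, NEG_INF), k=3, s=2)

# All-negative input: border windows see the zero pad in the first variant.
x = [[-1.0] * 4 for _ in range(4)]
print(pad_then_pool(x))      # [[0.0, 0.0], [0.0, -1.0]]
print(pool_with_padding(x))  # [[-1.0, -1.0], [-1.0, -1.0]]
```

On non-negative inputs, such as activations that have already passed through a ReLU, the two variants agree. They diverge only when the tensor feeding the pool can be negative.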

Context: I am trying to debug an issue where I get significantly worse performance compared to BatchNorm when I use strides=2 in the root layer rather than strides=1. I'm curious whether this is related.
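Changing the root stride also changes the spatial resolution every later block sees. This is a separate effect from the padding value. The sketch below applies the standard output-size formula, floor((n + 2p - k) / s) + 1. It assumes a typical ResNet root, a 7x7 conv with padding 3 followed by a 3x3 max pool with padding 1. These settings and the helpers `conv_out` and `root_output_size` are hypothetical and are not read from the screenshot.

```python
def conv_out(n, k, s, p):
    """Output length for kernel k, stride s, padding p: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def root_output_size(n, conv_stride=2, pool_stride=2):
    # Assumed root: 7x7 conv (padding 3), then 3x3 max pool (padding 1).
    # Zero-pad-by-1 followed by a padding-0 pool gives the same size as padding 1.
    after_conv = conv_out(n, k=7, s=conv_stride, p=3)
    return conv_out(after_conv, k=3, s=pool_stride, p=1)

print(root_output_size(224))                 # 56
print(root_output_size(224, conv_stride=1))  # 112
```

Under these assumptions, dropping one stride-2 stage doubles the feature-map side length, and so quadruples the number of spatial positions. Comparisons across the two root settings therefore also involve a change in effective resolution and compute, not just a change in padding.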