In the definition of ResNetV2, I noticed the following lines of code:

![image](https://user-images.githubusercontent.com/8418631/193730237-f525ace8-d4e0-4002-bb83-2114c26ed2ea.png)
I'm wondering what the subtle difference is between "pad then pool" and "pool with the padding option". Does it actually make a difference in practice?
Context: I am trying to debug an issue where I get significantly worse performance compared to batchnorm when I have `strides=2` in the root layer rather than `strides=1`, and I'm curious whether this is related.
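
To make the "pad then pool" vs. "pool with padding" question concrete, here is a minimal 1-D sketch in plain Python. It is not the repository's code: the function names are hypothetical, and it assumes the usual conventions, namely that explicit padding inserts zeros, while a max pool's built-in padding behaves like `-inf` and `"SAME"` puts any odd leftover padding at the end. Under those assumptions the variants can differ in two ways: the pad value (which only matters when activations can be negative) and the window alignment.

```python
NEG_INF = float("-inf")


def max_pool_1d(xs, window=3, stride=2):
    """VALID max pooling: only windows that fit entirely inside xs."""
    return [max(xs[i:i + window]) for i in range(0, len(xs) - window + 1, stride)]


def pad_then_pool(xs, window=3, stride=2):
    """Zero-pad one element on each side, then pool without padding."""
    return max_pool_1d([0.0] + list(xs) + [0.0], window, stride)


def pool_symmetric_neg_inf(xs, window=3, stride=2):
    """Pool with explicit (1, 1) padding that cannot win the max (-inf)."""
    return max_pool_1d([NEG_INF] + list(xs) + [NEG_INF], window, stride)


def pool_same(xs, window=3, stride=2):
    """Pool with SAME-style padding: -inf values, odd leftover goes on the right."""
    n = len(xs)
    out_len = -(-n // stride)  # ceil(n / stride)
    total = max((out_len - 1) * stride + window - n, 0)
    lo = total // 2
    hi = total - lo
    return max_pool_1d([NEG_INF] * lo + list(xs) + [NEG_INF] * hi, window, stride)


negative = [-3.0, -1.0, -4.0, -2.0]
print(pad_then_pool(negative))           # [0.0, -1.0]   zero padding wins the first window
print(pool_symmetric_neg_inf(negative))  # [-1.0, -1.0]  same windows, padding ignored
print(pool_same(negative))               # [-1.0, -2.0]  windows shifted by one element

positive = [3.0, 1.0, 4.0, 2.0]
print(pad_then_pool(positive))           # [3.0, 4.0]
print(pool_symmetric_neg_inf(positive))  # [3.0, 4.0]   pad value no longer matters
print(pool_same(positive))               # [4.0, 4.0]   alignment still differs
```

In this toy setting, zero padding and `-inf` padding agree once all inputs are non-negative, but symmetric padding and `"SAME"` padding still look at different windows when the input length is even and the stride is 2. Whether either effect explains the `strides=2` regression would need to be checked against the actual model.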