Open divyanshj16 opened 5 years ago
The quoted code snippet doesn't show stride=2. In some parts of the network, the downsampling layers are used only for the subsampled residual connections. The full information is used in the 3x3 layers, while some information might be lost in the residual connections. This design follows the original residual networks for direct comparison.
There is a convolution layer with kernel_size=1 and stride=2. I think it's a loss of 50% of the information during processing.
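As a rough check on how much of the input such a layer actually reads, here is a minimal pure-Python sketch that counts the spatial positions touched by a strided convolution. The helper name `sampled_positions` and the 8x8 feature-map size are made up for illustration and are not taken from the repository. On a 2D feature map, stride 2 skips every other row and every other column, so a 1x1 kernel reads one position in four, while a 3x3 kernel with padding 1 still covers every position.

```python
def sampled_positions(height, width, kernel_size=1, stride=2, padding=0):
    """Return the set of (row, col) input positions read by a 2D convolution.

    Hypothetical helper for illustration only: it models which input
    locations fall under at least one kernel window, ignoring channels.
    """
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    positions = set()
    for i in range(out_h):
        for j in range(out_w):
            for di in range(kernel_size):
                for dj in range(kernel_size):
                    row = i * stride + di - padding
                    col = j * stride + dj - padding
                    # Positions that land in the zero padding are not real inputs.
                    if 0 <= row < height and 0 <= col < width:
                        positions.add((row, col))
    return positions


h, w = 8, 8  # assumed feature-map size, chosen only for the example

# 1x1 kernel, stride 2: the shortcut-style downsampling layer.
fraction_1x1 = len(sampled_positions(h, w, kernel_size=1, stride=2)) / (h * w)

# 3x3 kernel, stride 2, padding 1: the main-path downsampling layer.
fraction_3x3 = len(sampled_positions(h, w, kernel_size=3, stride=2, padding=1)) / (h * w)

print(fraction_1x1, fraction_3x3)
```

Under these assumptions the 1x1, stride-2 layer reads 25% of the spatial positions, and the 3x3, stride-2 layer reads all of them. That matches the point that the full information is used in the 3x3 layers, while the subsampled shortcut discards some.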
I got this by running this cell of code.