Why is the first conv layer different from the original ResNet?

Hi, in this line of your code, the first conv operation uses kernel_size=3, stride=1, and padding=1. This model is used for ImageNet training. However, in the original ResNet (and in all of PyTorch's ResNet implementations), the first conv uses kernel_size=7, stride=2, and padding=3. Is there any reason for this change? I can't find it mentioned in your PODNet paper either.

With this kernel size and stride, the spatial output of the first conv layer is H0 x W0 = 224 x 224, whereas the original ResNet implementation reduces the spatial resolution to 112 x 112. This not only uses an unusually large amount of memory (around 3x the memory of the original implementation on ResNet-18), but also requires much more computation (one training iteration is around 7x slower than the original on ResNet-18).
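
For reference, the two output sizes above follow from the standard convolution output-size formula. This is only a minimal sketch, assuming a 224 x 224 input and dilation 1; the helper name `conv_out_size` is made up for illustration and is not part of the repository.

```python
def conv_out_size(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Output size of a convolution along one spatial dimension (dilation = 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1


# First conv as described for this repository: kernel_size=3, stride=1, padding=1
repo_out = conv_out_size(224, kernel_size=3, stride=1, padding=1)

# First conv in the original ResNet: kernel_size=7, stride=2, padding=3
orig_out = conv_out_size(224, kernel_size=7, stride=2, padding=3)

# Ratio of spatial positions per channel after the first conv
area_ratio = (repo_out * repo_out) // (orig_out * orig_out)

print(repo_out, orig_out, area_ratio)  # 224 112 4
```

So the first conv alone produces 4x as many spatial positions per channel. The 3x memory and 7x time figures above are what I measured on the whole network, not something this arithmetic predicts exactly.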

arthurdouillard / incremental_learning.pytorch