arthurdouillard / incremental_learning.pytorch

A collection of incremental learning paper implementations including PODNet (ECCV20) and Ghost (CVPR-W21).
MIT License

Why is the first conv layer different from the original ResNet? #60

Closed numpee closed 2 years ago

numpee commented 2 years ago

Hi. In this line of your code, you use a conv with kernel_size=3, stride=1, and padding=1 as the first conv operation. This model is used for ImageNet training. However, in the original ResNet (and in all of PyTorch's ResNet implementations), the first conv is fixed at kernel_size=7, stride=2, and padding=3. Is there a reason for this change? I can't find it mentioned in your PODNet paper, either.

Due to the kernel size and stride in your implementation, the spatial output of the first conv layer is H0 x W0 = 224 x 224, whereas the original ResNet implementation reduces the spatial resolution to 112 x 112. Not only does this use an abnormally large amount of memory (around 3x that of the original implementation on ResNet-18), but it also requires much more computation (one training iteration is around 7x slower than the original on ResNet-18).
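The size difference the question describes follows from the standard conv output-size formula, floor((in + 2*padding - kernel) / stride) + 1. As a quick sanity check (a minimal sketch in plain Python, independent of any PyTorch code):

```python
def conv_out_size(in_size, kernel, stride, padding):
    """Spatial output size of a conv layer: floor((in + 2p - k) / s) + 1."""
    return (in_size + 2 * padding - kernel) // stride + 1

# First conv as in this repo: kernel 3, stride 1, padding 1 -> resolution is preserved
repo_out = conv_out_size(224, kernel=3, stride=1, padding=1)      # 224
# First conv in the original ResNet: kernel 7, stride 2, padding 3 -> resolution is halved
original_out = conv_out_size(224, kernel=7, stride=2, padding=3)  # 112

print(repo_out, original_out)  # 224 112
```

Every feature map in the first stage is thus 4x larger in area with the repo's stem, which is consistent with the reported memory and speed gap.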

arthurdouillard commented 2 years ago

I've followed the ResNet implementation of Rebuffi: https://github.com/arthurdouillard/incremental_learning.pytorch/blob/0d25c2e12bde4a4a25f81d5e316751c90e6f789b/inclearn/convnet/my_resnet.py#L207.

I recall there was a paper on ResNet for CIFAR that proposed using a smaller kernel size for the first conv, since the images are much smaller (32x32 vs. 224x224).