Open k-sukharev opened 6 months ago
Is changing the default value for `conv1_t_stride` to 2 a breaking change? If not, I can make a pull request.
Hi @k-sukharev, thanks for reporting this one. I believe we refer to the implementation here: https://github.com/kenshohara/3D-ResNets-PyTorch/blob/master/models/resnet.py#L110.
In the original ResNet paper, "Deep Residual Learning for Image Recognition", the authors propose different model designs for images of different sizes. For smaller input images, like the 32x32 pixel images in the CIFAR-10 dataset, the paper suggests setting the stride of the first convolutional layer (conv1) to 1, which preserves more spatial information. For larger input images, like the 224x224 pixel images in the ImageNet dataset, the stride of conv1 is set to 2 in the original design, which reduces computational cost and ensures a sufficient receptive field. In addition, changing the default stride can indeed have an impact on code that relies on the current default.
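To make the effect of the stride concrete, here is a minimal sketch of the standard convolution output-size formula along one axis. The function name `conv_out_size` is hypothetical, and the kernel size 7 with padding 3 is assumed for the ImageNet-style stem (kernel size 3 with padding 1 for the CIFAR-style one); dilation is assumed to be 1.

```python
def conv_out_size(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output length along one axis for a convolution with dilation 1."""
    return (n + 2 * padding - kernel) // stride + 1


# ImageNet-style stem (assumed kernel 7, padding 3) on a 224-pixel axis.
imagenet_stride2 = conv_out_size(224, kernel=7, stride=2, padding=3)
imagenet_stride1 = conv_out_size(224, kernel=7, stride=1, padding=3)

# CIFAR-style stem (assumed kernel 3, padding 1) on a 32-pixel axis.
cifar_stride1 = conv_out_size(32, kernel=3, stride=1, padding=1)

print(imagenet_stride2, imagenet_stride1, cifar_stride1)
```

With stride 2 the axis is halved right after conv1, while with stride 1 the full resolution is kept, so every later layer sees feature maps of a different size depending on this one setting.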
cc @Douwe-Spaanderman, as he is the original contributor to this one. cc @ericspod @atbenmurray @Nic-Ma for vis.
@KumoLiu, thanks for the detailed answer.
I agree that conv1_t_stride can be 1 and that's ok.
But this does not change the fact that MedicalNet was trained with conv1_t_stride equal to 2, and MONAI by default loads these pre-trained weights into models with conv1_t_stride equal to 1.
During the implementation of ResNetEncoder for FlexibleUNet, I encountered a bug related to the default value for `conv1_t_stride` in `ResNet.__init__`. Upon investigation, it became evident that the default value should be `2` instead of `1`. Also, the stride for the first convolution is `2` in the MedicalNet repository. So I suppose all 3D pretrained ResNet models for classification currently do not work as intended.
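A rough sketch of why this kind of mismatch can go unnoticed: in PyTorch, the weight shape of a convolution does not depend on its stride, so weights trained with stride 2 load without error into a layer built with stride 1. The helper `make_conv1` below is a hypothetical stand-in for a 3D ResNet stem, with assumed channel counts and the stride applied to all axes; it is not the actual MONAI or MedicalNet code.

```python
import torch
from torch import nn


def make_conv1(stride: int) -> nn.Conv3d:
    # Hypothetical stem: 1 input channel, 8 output channels, kernel 7, padding 3.
    return nn.Conv3d(1, 8, kernel_size=7, stride=stride, padding=3, bias=False)


pretrained = make_conv1(stride=2)  # stands in for the layer the weights were trained with
model = make_conv1(stride=1)       # stands in for a layer built with a default of 1

# Strict loading succeeds because the weight tensors have identical shapes.
result = model.load_state_dict(pretrained.state_dict())

x = torch.zeros(1, 1, 16, 16, 16)
with torch.no_grad():
    shape_s2 = tuple(pretrained(x).shape)
    shape_s1 = tuple(model(x).shape)

print(result.missing_keys, result.unexpected_keys)
print(shape_s2, shape_s1)
```

The load reports no missing or unexpected keys, yet the two layers produce feature maps of different sizes for the same input. Under these assumptions, nothing at load time would flag that the weights are being used with a stride they were not trained for.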