Closed: world2025 closed 5 years ago
The STN module requires two parameters: the number of input channels and the spatial dimension. The number of input channels is the number of feature maps produced by the preceding convolutional layer. If the STN is plugged in as the first layer of the CNN, the number of input channels will be 3 for a color image dataset like ImageNet, CIFAR, etc.

The spatial dim is the height (equal to the width, for square inputs) of the input feature map. If the STN module is placed at the beginning, this parameter is the height and width of the input image. For example, if you use this module with the ImageNet dataset, it will be 224x224. If, however, this module is located in the middle (say, after 2 max-pooling layers with a stride of 2), the spatial dimension will be 56x56, because each stride-2 pooling layer halves the height and width (224 / 2 / 2 = 56).
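A minimal sketch of that arithmetic in Python. The helper name spatial_dim_after_pooling is hypothetical, and it assumes square inputs, convolutions that preserve the spatial size (e.g. 3x3 kernels with padding 1), and pooling layers whose kernel size equals their stride.

```python
def spatial_dim_after_pooling(input_size: int, num_pools: int, stride: int = 2) -> int:
    """Hypothetical helper: height (= width) of a square feature map after
    `num_pools` pooling layers with kernel size == stride and no padding.
    Convolutions in between are assumed to preserve the spatial size."""
    size = input_size
    for _ in range(num_pools):
        size //= stride  # each pooling layer divides H and W by the stride (floor)
    return size


# STN as the first layer on ImageNet-sized images: spatial dim is the image size.
print(spatial_dim_after_pooling(224, 0))  # 224
# STN placed after two stride-2 max-pooling layers.
print(spatial_dim_after_pooling(224, 2))  # 56
```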
Keep in mind that the output of the STN module has the same size as the input feature map. More importantly, the STN returns an affine grid map: a collection of points of interest produced by applying a spatial transformation to the input feature map.
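For illustration, a minimal PyTorch sketch of an STN module that takes the two parameters described above. It is not the repository's implementation: the class name STN, the localization network layout, and the use of spatial_dim only to size the fully connected layer are assumptions. In this sketch, F.affine_grid builds the affine sampling grid and F.grid_sample applies it to the input, so the returned tensor has the same shape as the input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class STN(nn.Module):
    """Hypothetical minimal spatial transformer (not the repository's code).

    in_channels: channels of the incoming tensor (3 for RGB images).
    spatial_dim: height (= width) of the incoming tensor; used here only to
                 size the fully connected layer of the localization network.
    """

    def __init__(self, in_channels: int, spatial_dim: int):
        super().__init__()
        # Localization network: the conv keeps H x W, the max-pool halves it.
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=3, padding=1),
            nn.MaxPool2d(2),
            nn.ReLU(),
        )
        # Regresses the 6 parameters of a 2x3 affine matrix.
        self.fc = nn.Linear(8 * (spatial_dim // 2) ** 2, 6)
        # Start from the identity transform so training begins with "no warp".
        nn.init.zeros_(self.fc.weight)
        with torch.no_grad():
            self.fc.bias.copy_(torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        theta = self.fc(self.localization(x).flatten(1)).view(-1, 2, 3)
        # Sampling grid of shape (N, H, W, 2), same H x W as the input.
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        # Warped feature map of shape (N, C, H, W), same as the input.
        return F.grid_sample(x, grid, align_corners=False)


stn = STN(in_channels=3, spatial_dim=224)  # STN as the first layer
images = torch.randn(2, 3, 224, 224)
out = stn(images)
print(out.shape)  # torch.Size([2, 3, 224, 224])
```

The same class could be placed after two stride-2 pooling layers as STN(in_channels=64, spatial_dim=56), where 64 stands in for whatever channel count the preceding convolution produces.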
OK, thank you very much for your guidance. I will try to implement it.
OK, I have a question: can the location of the STN in the CNN affect the result?
Yes. In my reported experiments the STN is used as the first layer, but you can technically plug the module in anywhere. In theory, the placement should affect the overall performance of the network, since in the middle layers the output of the STN module will be a modified affine grid on a feature map rather than on the input image.
Feel free to submit a pull request if you modify the SVHNet and obtain better or worse results.
Closing this issue due to lack of activity. Feel free to reopen it should you need more information.
I guess the shape of the STN output can be changed by modifying the size parameter of F.affine_grid?
Please tell me how to fix the problem; I keep getting dimension errors. Thanks.
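A minimal sketch, assuming PyTorch, of how the size argument of F.affine_grid controls the output shape. The tensor sizes below are arbitrary examples, and the identity theta stands in for whatever a localization network would predict.

```python
import torch
import torch.nn.functional as F

N, C, H, W = 4, 3, 224, 224
feature_map = torch.randn(N, C, H, W)

# theta must have shape (N, 2, 3): one 2x3 affine matrix per sample.
# Here every sample uses the identity transform.
theta = torch.tensor([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]]).unsqueeze(0).repeat(N, 1, 1)

# The size argument (N, C, H_out, W_out) sets the output resolution;
# its batch size must match theta's first dimension.
out_h, out_w = 112, 112
grid = F.affine_grid(theta, [N, C, out_h, out_w], align_corners=False)
print(grid.shape)  # torch.Size([4, 112, 112, 2])

# grid_sample reads from the full-size input but returns an out_h x out_w map.
resampled = F.grid_sample(feature_map, grid, align_corners=False)
print(resampled.shape)  # torch.Size([4, 3, 112, 112])
```

If the STN output is made smaller or larger this way, every later layer whose size depends on H and W (for example, a fully connected layer after flattening) has to be resized to match; such mismatches are a common source of dimension errors.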