aicaffeinelife / Pytorch-STN

Spatial Transformer Networks in Pytorch.
MIT License

How to use STN in a CNN #2

Closed world2025 closed 5 years ago

world2025 commented 5 years ago

Please show me how to fix this problem. I keep getting dimension errors. Thanks.

aicaffeinelife commented 5 years ago

The STN module requires the number of input channels and the spatial dimensions. The number of input channels is the number of feature maps produced by the preceding convolutional layer. If the STN is plugged in as the first layer of the CNN, the number of input channels will be 3 for a color image dataset such as ImageNet or CIFAR.

The spatial dim is the height and width of the input feature map. If the STN module is placed at the beginning, this parameter is the height and width of the input image. For example, if you use this module with the ImageNet dataset, it will be 224x224. If, however, the module is located in the middle (say, after 2 max-pooling layers with a stride of 2), the spatial dimension will be 56x56, since 224 / 2 / 2 = 56.
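The arithmetic above can be checked with a small helper. This is only a sketch: the function name `spatial_dim_after_pooling` is made up for illustration, and it assumes square inputs, size-preserving convolutions (for example 3x3 with padding 1), and pooling layers that floor the result.

```python
def spatial_dim_after_pooling(input_size: int, num_pools: int, stride: int = 2) -> int:
    """Side length of a square feature map after `num_pools` pooling layers.

    Assumption: convolutions keep the spatial size unchanged and each
    pooling layer divides it by `stride`, rounding down.
    """
    size = input_size
    for _ in range(num_pools):
        size //= stride
    return size


# STN as the first layer on a 224x224 image: no pooling has happened yet.
print(spatial_dim_after_pooling(224, 0))  # 224

# STN after two max-pooling layers with stride 2.
print(spatial_dim_after_pooling(224, 2))  # 56
```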

Keep in mind that the output of the STN module has the same size as the input feature map. More importantly, the STN returns an affine grid map, which is a collection of points of interest produced by applying a spatial transformation to the input feature map.
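To see why the output keeps the input's size, here is a minimal sketch of the two PyTorch calls an STN is typically built on, `F.affine_grid` and `F.grid_sample`. It is not this repository's code: the tensor sizes are arbitrary, and the fixed identity `theta` stands in for what a localization network would normally predict.

```python
import torch
import torch.nn.functional as F

# A batch of 4 feature maps with 16 channels and 56x56 spatial size (arbitrary example).
feat = torch.randn(4, 16, 56, 56)

# One 2x3 affine matrix per sample. Identity is used here as a stand-in;
# in a real STN these values come from the localization network.
theta = torch.tensor([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]]).repeat(4, 1, 1)  # shape (4, 2, 3)

# The grid holds one (x, y) sampling location per output pixel.
grid = F.affine_grid(theta, feat.size(), align_corners=False)

# Sampling the input at those locations yields the transformed feature map.
warped = F.grid_sample(feat, grid, align_corners=False)

print(grid.shape)    # torch.Size([4, 56, 56, 2])
print(warped.shape)  # torch.Size([4, 16, 56, 56])
```

Because the grid is generated for `feat.size()`, the sampled output has the same shape as the input.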

world2025 commented 5 years ago

OK, thank you very much for your guidance. I will try to implement it.

world2025 commented 5 years ago

OK, I have a question: can the location of the STN in the CNN affect the result?

aicaffeinelife commented 5 years ago

Yes. In my reported experiments the STN is used as the first layer, but you can technically plug in the module anywhere. In theory, this should affect the overall performance of the network, since in the middle layers the output of the STN module will be a modified affine grid on a feature map rather than on the input image.
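As a rough illustration of how the constructor arguments change with placement, the sketch below builds a toy network with the STN either first or after two pooling stages. `SketchSTN` and `SketchNet` are hypothetical names written for this example and are not the repository's `SVHNet` or STN module; the layer sizes and the 32x32 input are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SketchSTN(nn.Module):
    """Hypothetical minimal STN for square inputs (not the repository's module)."""

    def __init__(self, in_channels: int, spatial_dim: int):
        super().__init__()
        # Localization network: predicts the 6 affine parameters.
        self.loc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_channels * spatial_dim * spatial_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Start from the identity transform so training begins with no warping.
        nn.init.zeros_(self.loc[-1].weight)
        with torch.no_grad():
            self.loc[-1].bias.copy_(torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)


class SketchNet(nn.Module):
    """Toy CNN with the STN placed first or after two pooling stages."""

    def __init__(self, stn_first: bool, image_size: int = 32, num_classes: int = 10):
        super().__init__()
        self.stn_first = stn_first
        if stn_first:
            # STN sees the raw image: 3 channels, full resolution.
            self.stn = SketchSTN(in_channels=3, spatial_dim=image_size)
        else:
            # STN sees block2's output: 32 channels, resolution halved twice.
            self.stn = SketchSTN(in_channels=32, spatial_dim=image_size // 4)
        self.block1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.fc = nn.Linear(32 * (image_size // 4) ** 2, num_classes)

    def forward(self, x):
        if self.stn_first:
            x = self.stn(x)
        x = self.block2(self.block1(x))
        if not self.stn_first:
            x = self.stn(x)
        return self.fc(x.flatten(1))


images = torch.randn(2, 3, 32, 32)
logits_first = SketchNet(stn_first=True)(images)
logits_middle = SketchNet(stn_first=False)(images)
print(logits_first.shape, logits_middle.shape)
```

Dimension errors usually come from passing the image's channel count or size to an STN that actually receives a downsampled feature map, which is the mismatch the two branches in `__init__` avoid.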

Feel free to submit a pull request if you modify the SVHNet and obtain better/worse results.

aicaffeinelife commented 5 years ago

Closing this issue due to lack of activity. Feel free to reopen it should you need more information.

clelouch commented 4 years ago

> Keep in mind that the output of the STN module has the same size as the input feature map. And more importantly, STN returns an affine grid map that is a collection of points of interest produced after a spatial transformation of the input feature map.

I guess the shape of the STN output can be changed by modifying the parameter of F.affine_grid?
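In plain PyTorch this does appear to work: the `size` argument of `F.affine_grid` sets the shape of the generated grid, and `F.grid_sample` produces an output with the grid's height and width. The sketch below uses arbitrary sizes and an identity `theta`; whether this repository's STN module exposes such an option is not something the thread confirms.

```python
import torch
import torch.nn.functional as F

feat = torch.randn(2, 16, 56, 56)

# Identity transform for every sample (stand-in for predicted parameters).
theta = torch.tensor([[1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]]).repeat(2, 1, 1)

# Ask for a 28x28 grid instead of the input's 56x56.
target_size = torch.Size((2, 16, 28, 28))
grid = F.affine_grid(theta, target_size, align_corners=False)

# The output takes its height and width from the grid, not from the input.
out = F.grid_sample(feat, grid, align_corners=False)

print(grid.shape)  # torch.Size([2, 28, 28, 2])
print(out.shape)   # torch.Size([2, 16, 28, 28])
```

Any layer that follows the STN would then need to expect the new spatial size.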