Bartzi / see

Code for the AAAI 2018 publication "SEE: Towards Semi-Supervised End-to-End Scene Text Recognition"
GNU General Public License v3.0

I got a problem about STN #17

Open lmolhw5252 opened 6 years ago

lmolhw5252 commented 6 years ago

Hi, I have a question about the STN. I see that the STN's parameters require a target dim of 50. Does that mean the outputs of the STN are feature maps of size [batch_size, channels, output_height, output_width], with both height and width fixed? What about words of different sizes in an image? I have not fully understood this.

lmolhw5252 commented 6 years ago

Are these output sizes the same? [image]

Bartzi commented 6 years ago

Yes, the output of the spatial transformer's sampling mechanism always has the same spatial size. This could be 50x50 or 100x32, depending on what you want to do.

The sampled size is the same in both images you posted here.
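To illustrate why the size stays fixed, here is a minimal NumPy sketch of affine grid generation plus bilinear sampling. It is not the code from this repository; the function names `affine_grid` and `bilinear_sample`, the example matrices, and the 32x100 (height x width) output size are assumptions made for illustration. The point is that the output grid is defined first, and the affine parameters only decide which image region that grid is stretched over.

```python
import numpy as np

def affine_grid(theta, out_h, out_w):
    """Map a regular out_h x out_w grid in [-1, 1] through a 2x3 affine matrix."""
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, out_h),
                         np.linspace(-1.0, 1.0, out_w), indexing="ij")
    # Homogeneous target coordinates, shape (3, out_h * out_w).
    target = np.stack([xs, ys, np.ones_like(xs)], axis=0).reshape(3, -1)
    # Source coordinates in the input image, still normalized to [-1, 1].
    return (theta @ target).reshape(2, out_h, out_w)

def bilinear_sample(image, grid):
    """Sample image (C, H, W) at the normalized grid (2, out_h, out_w)."""
    _, h, w = image.shape
    # Convert normalized coordinates to pixel coordinates, clamped to the image.
    x = np.clip((grid[0] + 1.0) * 0.5 * (w - 1), 0, w - 1)
    y = np.clip((grid[1] + 1.0) * 0.5 * (h - 1), 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = image[:, y0, x0] * (1 - wx) + image[:, y0, x1] * wx
    bottom = image[:, y1, x0] * (1 - wx) + image[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy

image = np.random.rand(3, 200, 200)
# Hypothetical localization outputs: a small word and a large word.
small_word = np.array([[0.2, 0.0, -0.5], [0.0, 0.1, 0.3]])
large_word = np.array([[0.8, 0.0, 0.0], [0.0, 0.4, 0.0]])
for theta in (small_word, large_word):
    crop = bilinear_sample(image, affine_grid(theta, out_h=32, out_w=100))
    print(crop.shape)  # (3, 32, 100) for both regions
```

A small word is simply magnified more than a large one, so the recognition part always receives an input of the same shape.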

lmolhw5252 commented 6 years ago

@Bartzi I have another question: when I train the network with the STN, it has difficulty converging. Do you have any tricks for training the STN? I use images with one word each.

Bartzi commented 6 years ago

Yes, there are some tricks you can try:

  1. Decrease the initial learning rate; you should get good results with learning rates between 1e-4 and 1e-6.
  2. If the network starts to converge but then stalls before it is good enough, try the following: wait for a model snapshot to be saved, then restart the training from that snapshot. When restarting, make sure that only the weights of the localization part are loaded and the recognition part is initialized randomly (use --load-localization). This will help the training.
  3. I also suggest that you always have a look at the predictions of the model at each training step. Those images are saved in the log folder for the training run and provide a lot of insight!
  4. Just one question about the images you are using: are the words scattered throughout the whole image, or are they rather centered? If they are scattered throughout the image, I suggest that you create a training curriculum that starts with images where the words are closer to the center. You can then increase the difficulty by adding the other samples to the training.
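The curriculum from item 4 could be sketched roughly as follows. This is only an illustration under assumptions: it presumes you know approximately where each word sits (for example because the images are generated synthetically), and the sample dictionaries, the helper names `center_offset` and `curriculum_stages`, and the thresholds are all hypothetical rather than part of this repository.

```python
import math

def center_offset(bbox, image_size):
    """Normalized distance between a word box center and the image center."""
    left, top, right, bottom = bbox
    width, height = image_size
    dx = ((left + right) / 2 - width / 2) / width
    dy = ((top + bottom) / 2 - height / 2) / height
    return math.hypot(dx, dy)

def curriculum_stages(samples, thresholds):
    """Yield growing training subsets, starting with the most centered words."""
    for limit in thresholds:
        yield [s for s in samples
               if center_offset(s["bbox"], s["size"]) <= limit]

# Hypothetical samples: bbox is (left, top, right, bottom) in pixels.
samples = [
    {"file": "a.png", "bbox": (80, 90, 120, 110), "size": (200, 200)},  # centered
    {"file": "b.png", "bbox": (120, 60, 170, 80), "size": (200, 200)},  # off-center
    {"file": "c.png", "bbox": (0, 0, 40, 20), "size": (200, 200)},      # corner
]

# Each stage keeps the earlier samples and adds harder ones.
for stage, subset in enumerate(curriculum_stages(samples, (0.1, 0.3, math.inf))):
    print(stage, [s["file"] for s in subset])
```

Each stage would be trained until the predictions in the log folder look reasonable before moving on to the next, larger subset.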

Hope this helps ;)

lmolhw5252 commented 6 years ago

Thanks a lot for your reply!