ankush-me / SynthText

Code for generating synthetic text images as described in "Synthetic Data for Text Localisation in Natural Images", Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, CVPR 2016.
http://www.robots.ox.ac.uk/~vgg/data/scenetext/
Apache License 2.0

Have you tried any type of stochastic pooling for regularization similar to dropout? #23

Open cjnolet opened 7 years ago

cjnolet commented 7 years ago

I'm training my network now and, from epoch to epoch, testing the model to see how well it predicts. So far it's getting less than 25% recall, and I'm trying to figure out what I'm doing wrong. I believe the math I'm using to convert the pose parameters back to bounding boxes is correct. Anyway, some light research brought me to stochastic pooling, and I'm curious whether you have ever tried it in your networks and whether it's worth trying in case my network is overfitting.
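
For reference, a minimal sketch of what stochastic pooling does, following the usual description (Zeiler and Fergus, 2013): during training, each pooling region outputs one activation sampled with probability proportional to its value; at test time it outputs the probability-weighted average. The function name `stochastic_pool_2x2`, the plain-list feature map, and the fixed 2x2 non-overlapping window are assumptions made for illustration and are not part of the SynthText code.

```python
import random

def stochastic_pool_2x2(fmap, training=True, rng=random):
    """Stochastic pooling over non-overlapping 2x2 regions.

    fmap is a 2D list of non-negative activations (e.g. after a ReLU).
    Hypothetical helper for illustration only.
    """
    rows, cols = len(fmap), len(fmap[0])
    out = []
    for r in range(0, rows - 1, 2):
        out_row = []
        for c in range(0, cols - 1, 2):
            region = [fmap[r][c], fmap[r][c + 1],
                      fmap[r + 1][c], fmap[r + 1][c + 1]]
            total = sum(region)
            if total == 0:
                # All activations are zero: nothing to sample from.
                out_row.append(0.0)
            elif training:
                # Sample one activation with probability a_i / sum(a).
                out_row.append(rng.choices(region, weights=region, k=1)[0])
            else:
                # Test time: expected value, sum(p_i * a_i) = sum(a_i^2) / sum(a).
                out_row.append(sum(a * a for a in region) / total)
        out.append(out_row)
    return out

fmap = [[1.0, 3.0],
        [0.0, 0.0]]
train_out = stochastic_pool_2x2(fmap, training=True)   # [[1.0]] or [[3.0]]
test_out = stochastic_pool_2x2(fmap, training=False)   # [[2.5]]
```

Because the sampling adds noise only at training time, it acts as a regularizer in a way loosely comparable to dropout; whether it helps in this setting is untested.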

Also, how many epochs did you end up running? Each of my epochs takes 30 hours with roughly 550k images (I will try a larger set once I can see whether this approach is going to generalize, if at all, to my actual data).

biggerlambda commented 7 years ago

@cjnolet I think using just the proposals, without the post-processing and tinkering with box dimensions, leads to poor recall (see image below). @ankush-me can you confirm?

image

cjnolet commented 7 years ago

@biggerlambda, you are correct, and I have trained both the filtering classifier and the bounding box regressor model. 25% is extremely low recall for the FCRN output, which makes me think I'm doing something wrong, my network is overfitting, or I'm not training it for long enough.
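
One way to sanity-check a recall figure like this is to recompute it with a very simple matcher. The sketch below assumes axis-aligned boxes given as `(x1, y1, x2, y2)`, a match threshold of IoU >= 0.5, and greedy one-to-one matching; the official ICDAR evaluation protocol differs in its details, so this is a rough check only, and the names `iou` and `recall` are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def recall(gt_boxes, pred_boxes, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by some prediction.

    Each prediction may match at most one ground-truth box (greedy).
    """
    if not gt_boxes:
        return 0.0
    used = set()
    hits = 0
    for gt in gt_boxes:
        best, best_j = 0.0, None
        for j, pred in enumerate(pred_boxes):
            if j in used:
                continue
            overlap = iou(gt, pred)
            if overlap > best:
                best, best_j = overlap, j
        if best_j is not None and best >= iou_threshold:
            used.add(best_j)
            hits += 1
    return hits / len(gt_boxes)

gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(1, 1, 11, 11)]
r = recall(gt, preds)  # one of the two ground-truth boxes is matched
```

If the decoded boxes are systematically shifted or scaled, the IoU values will fall below the threshold even when the network is localising text correctly, which would show up as low recall.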

ankush-me commented 7 years ago

From the gray curve above, multi-scale FCRN without any other post-processing gets 85% maximum recall on ICDAR. Precision can be improved through non-maximal suppression.
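
A minimal sketch of greedy non-maximal suppression, for readers unfamiliar with it. It assumes axis-aligned `(x1, y1, x2, y2)` boxes with one confidence score each and an IoU threshold of 0.5; the boxes predicted by the FCRN can be oriented, so this is an illustration of the idea and not the implementation used for the paper. The names `iou` and `nms` are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with
        # any higher-scoring box that was already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # the second box overlaps the first and is dropped
```

Removing near-duplicate detections this way reduces false positives, which raises precision while leaving recall largely unchanged.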

cjnolet commented 7 years ago

@ankush-me, I understand your answer to @biggerlambda's question/suggestion, but my original question in this ticket remains unanswered. Have you ever tried using stochastic pooling? Did you find that it did not help your convolutional layers?

Also, how many epochs did you need to train on the SynthText dataset?

ankush-me commented 7 years ago

I haven't tried any stochastic pooling.

I think it converged within 4-5 epochs.

(Closed in error -- feel free to comment if not resolved).