Open cjnolet opened 7 years ago
@cjnolet I think using just the proposals, without the postprocessing and the tinkering with box dimensions, leads to poor recall (see image below). @ankush-me can you confirm?
@biggerlambda, you are correct, and I have trained both the filtering classifier and the bounding box regressor model. 25% is extremely low recall for the FCRN output, which makes me think I'm doing something wrong, my network is overfitting, or I'm not training it for long enough.
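For anyone checking a recall number like this, here is a minimal sketch of one common way to compute it. It assumes axis-aligned boxes in `[x1, y1, x2, y2]` form and counts a ground-truth box as found when some predicted box overlaps it with IoU of at least 0.5. The function names, the box layout, and the 0.5 threshold are illustrative assumptions, not necessarily the evaluation protocol used in this thread.

```python
import numpy as np

def iou_matrix(gt, pred):
    """Pairwise IoU between boxes given as rows of [x1, y1, x2, y2]."""
    gt = np.asarray(gt, dtype=float).reshape(-1, 4)
    pred = np.asarray(pred, dtype=float).reshape(-1, 4)
    # Intersection corners via broadcasting: result shape (num_gt, num_pred).
    x1 = np.maximum(gt[:, None, 0], pred[None, :, 0])
    y1 = np.maximum(gt[:, None, 1], pred[None, :, 1])
    x2 = np.minimum(gt[:, None, 2], pred[None, :, 2])
    y2 = np.minimum(gt[:, None, 3], pred[None, :, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_gt = (gt[:, 2] - gt[:, 0]) * (gt[:, 3] - gt[:, 1])
    area_pred = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    union = area_gt[:, None] + area_pred[None, :] - inter
    return inter / np.maximum(union, 1e-12)

def recall(gt, pred, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by at least one prediction."""
    gt = np.asarray(gt, dtype=float).reshape(-1, 4)
    pred = np.asarray(pred, dtype=float).reshape(-1, 4)
    if len(gt) == 0:
        raise ValueError("recall is undefined without ground-truth boxes")
    if len(pred) == 0:
        return 0.0
    best_iou = iou_matrix(gt, pred).max(axis=1)
    return float((best_iou >= iou_threshold).mean())
```

A bug in the box decoding step usually shows up here as uniformly low IoU values, so printing `iou_matrix` for a few images can help separate a decoding problem from an actual training problem.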
From the gray curve above, multi-scale FCRN without any other post-processing gets 85% maximum recall on ICDAR. Precision can be improved through non-maximal suppression.
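As a rough illustration of the non-maximal suppression step mentioned above, here is a minimal greedy sketch in NumPy. It assumes axis-aligned `[x1, y1, x2, y2]` boxes with one confidence score each; the function name and the 0.5 overlap threshold are assumptions for illustration, not the settings used for the curve.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it, repeat.

    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float).reshape(-1, 4)
    scores = np.asarray(scores, dtype=float)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    order = np.argsort(-scores)  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # IoU between the current best box and every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        iou = inter / np.maximum(areas[i] + areas[rest] - inter, 1e-12)
        # Only boxes that do not overlap the kept box too much survive.
        order = rest[iou <= iou_threshold]
    return keep
```

Because duplicates of the same detection are removed while the highest-scoring copy is kept, this kind of filtering tends to raise precision without costing much recall.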
@ankush-me, I understand your answer to @biggerlambda's question/suggestion, but my original question in this ticket remains unanswered. Have you ever tried using stochastic pooling? Did you find that it did not help your convolutional layers?
Also, for how many epochs did you need to train on your SynthText dataset?
I haven't tried any stochastic pooling.
I think it converged within 4-5 epochs.
(Closed in error -- feel free to comment if not resolved).
I'm training my network now, and after each epoch I test the model to see how well it predicts. So far it's getting less than 25% recall, and I'm trying to figure out what I'm doing wrong. I believe the math I'm using to convert the pose parameters back to bounding boxes is correct. Anyway, some light research brought me to stochastic pooling, and I'm curious whether you have ever tried it in your networks and whether it's worth trying in case my network is overfitting.
Also, how many epochs did you end up running? Each of my epochs takes 30 hours with roughly 550k images (I will try a larger set once I can see whether this approach is going to generalize well, if at all, to my actual data).
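For readers unfamiliar with the stochastic pooling asked about above, here is a minimal NumPy sketch of the idea for a single 2D feature map. It assumes non-negative activations (for example after a ReLU) and non-overlapping `k` by `k` windows. During training, one activation per window is sampled with probability proportional to its value; at test time, the probability-weighted average is used instead. The function name and signature are hypothetical, and this is not code from the FCRN implementation.

```python
import numpy as np

def stochastic_pool2d(x, k=2, training=True, rng=None):
    """Stochastic pooling over non-overlapping k x k windows of a 2D map."""
    x = np.asarray(x, dtype=float)
    h, w = x.shape
    if h % k or w % k:
        raise ValueError("height and width must be divisible by k")
    if (x < 0).any():
        raise ValueError("activations must be non-negative")
    # Rearrange into (h//k, w//k, k*k): one flattened window per output cell.
    regions = (x.reshape(h // k, k, w // k, k)
                .transpose(0, 2, 1, 3)
                .reshape(h // k, w // k, k * k))
    totals = regions.sum(axis=-1, keepdims=True)
    # Probabilities proportional to activations; all-zero windows fall back
    # to uniform probabilities (their output is 0 either way).
    safe_totals = np.where(totals > 0, totals, 1.0)
    probs = np.where(totals > 0, regions / safe_totals, 1.0 / (k * k))
    if not training:
        # Test time: probability-weighted average of each window.
        return (probs * regions).sum(axis=-1)
    rng = np.random.default_rng() if rng is None else rng
    # Training: inverse-CDF sampling of one index per window.
    cdf = np.cumsum(probs, axis=-1)
    u = rng.random(size=cdf.shape[:-1] + (1,))
    idx = np.minimum((u >= cdf).sum(axis=-1), k * k - 1)  # guard round-off
    return np.take_along_axis(regions, idx[..., None], axis=-1)[..., 0]
```

Calling it with `training=True` on the same input gives different outputs from run to run, which is the source of the regularizing effect; whether that helps with the low recall described here is an open question.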