Hi,
I had to use batch normalization (BN) for a larger text recognition model, otherwise training would not converge. For this smaller model (SimpleHTR) I was able to train without BN. BN increased the training time for me: in theory it should let the model converge in fewer training epochs, but in practice the computations done for BN take quite some time per batch, so the overall wall-clock training time went up. And from a SW engineering point of view, I have to say that BN looks like a hack in TF: you have to plug in update ops which get evaluated separately from the rest of the model ... pretty sure the TF team had their reasons for this, but I don't like the way it is implemented.
However, you can try to add one or two BN layers into the model and see how this influences training. I applied BN in the 3rd and 6th CNN layer. Please let me know how it performs with the SimpleHTR model if you do some experiments.
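For illustration, here is a minimal sketch of what adding BN in the 3rd and 6th CNN layer could look like in a TF1-style conv stack. The layer count, filter sizes, and names (setup_cnn, is_train, kernel_vals, ...) are placeholders for this sketch, not the actual SimpleHTR code:

```python
# Sketch only: TF1-style CNN stack with batch normalization inserted between
# conv and ReLU in the 3rd and 6th layer. All names and hyperparameters here
# are illustrative placeholders, not the actual SimpleHTR code.
import tensorflow as tf

def setup_cnn(cnn_in_4d, is_train):
    kernel_vals = [5, 5, 3, 3, 3, 3]
    feature_vals = [1, 32, 64, 128, 128, 256, 256]
    pool_vals = [(2, 2), (2, 2), (1, 2), (1, 2), (1, 2), (1, 2)]

    pool = cnn_in_4d
    for i in range(len(kernel_vals)):
        kernel = tf.Variable(tf.truncated_normal(
            [kernel_vals[i], kernel_vals[i], feature_vals[i], feature_vals[i + 1]],
            stddev=0.1))
        conv = tf.nn.conv2d(pool, kernel, padding='SAME', strides=(1, 1, 1, 1))
        # BN only in the 3rd and 6th CNN layer, placed before the non-linearity
        if i in (2, 5):
            conv = tf.layers.batch_normalization(conv, training=is_train)
        relu = tf.nn.relu(conv)
        pool = tf.nn.max_pool(relu,
                              ksize=(1, pool_vals[i][0], pool_vals[i][1], 1),
                              strides=(1, pool_vals[i][0], pool_vals[i][1], 1),
                              padding='VALID')
    return pool
```

Placing BN before tf.nn.relu gives the conv → BN → ReLU ordering asked about in this thread; the training flag controls whether batch statistics or the accumulated moving averages are used.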
Closing because of inactivity.
With tf.layers.batch_normalization it's not so complicated; one just has to take care of the train_phase flag. My first impression is that it helps, but I have to prepare more formal tests.
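To make the train_phase handling concrete, here is a minimal self-contained TF1 sketch. The toy network and names (is_train, x, y) are only placeholders; the point is that batch normalization keeps its moving-average updates in tf.GraphKeys.UPDATE_OPS, and those update ops have to run together with the training op:

```python
# Minimal TF1 sketch of the train_phase handling for tf.layers.batch_normalization.
# The toy network below is only a placeholder; the relevant parts are the
# is_train flag and the UPDATE_OPS control dependency around the optimizer.
import tensorflow as tf

is_train = tf.placeholder(tf.bool, name='is_train')
x = tf.placeholder(tf.float32, [None, 32, 128, 1])
y = tf.placeholder(tf.float32, [None, 10])

conv = tf.layers.conv2d(x, filters=32, kernel_size=3, padding='same')
bn = tf.layers.batch_normalization(conv, training=is_train)  # train phase flag
relu = tf.nn.relu(bn)
flat = tf.layers.flatten(tf.layers.max_pooling2d(relu, pool_size=2, strides=2))
logits = tf.layers.dense(flat, 10)
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=y, logits=logits))

# BN's moving mean/variance updates live in UPDATE_OPS; without this control
# dependency they never run and the inference-time statistics stay at their defaults.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.RMSPropOptimizer(0.001).minimize(loss)

# Training step:   sess.run(train_op, feed_dict={x: ..., y: ..., is_train: True})
# Validation step: feed is_train: False so the moving averages are used.
```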
I'm getting over 71% word accuracy in under 20 epochs with batch normalization after every conv layer. Surprisingly, batch normalization costs only about a 5% slowdown per epoch, but the model trains fully in about 25 epochs, so roughly 50% less time overall. The patch would be about 8 lines of code.
Cool, if this really improves both training time and accuracy, then we should go for it! If you provide a patch, could you please include the trained model (packed into model.zip)?
Ok, sounds good.
I was fooled by RMSProp being inconsistent. The training time reduction probably exists over a larger sample, but isn't as good as previously advertised (maybe 10-25%). Training ended up terminating with over 73% word accuracy so that's still a huge improvement.
Here's a visualization of batch normalization vs no batch normalization on SimpleHTR. In addition to the benefits of BN, the learning-rate change at 10K batches is pretty noticeable. Interestingly, without BN, the model doesn't overfit to the training data.
thank you, I'll try to have a look at it on the weekend.
Thanks for your work. With python main.py --validate --beamsearch
I now get this output:
Character error rate: 10.464338%. Word accuracy: 74.000000%.
:+1:
Hi Harald,
Thank you for this OCR NN starter project. Is there any reason why you didn't include a batch normalization layer between conv2d and relu?
Regards,