githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.
https://towardsdatascience.com/2326a3487cd5
MIT License

batch normalization #31

Closed soldierofhell closed 5 years ago

soldierofhell commented 5 years ago

Hi Harald,

Thank you for this starter in OCR NN. Is there any reason why you didn't include a batch normalization layer between conv2d and relu?

Regards,

githubharald commented 5 years ago

Hi,

I had to use batch normalization (BN) for a larger text recognition model, otherwise training would not converge. For this smaller model (SimpleHTR), I was able to train it without BN. BN increases the training time (it makes the model converge, and in theory it should let the model converge faster w.r.t. the number of training epochs, but in practice I noticed that the computations done for BN take quite some time, so the training time also increased). And from a SW engineering point of view, I have to say that BN looks like a hack in TF: you plug in some node which has to be evaluated separately from the rest of the model (see the sketch below) ... pretty sure the TF team had their reasons for this, but I don't like the way it is implemented.

However, you can try to add one or two BN layers into the model and see how this influences training. I applied BN in the 3rd and 6th CNN layer. Please let me know how it performs with the SimpleHTR model if you do some experiments.
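To make the "extra node" remark concrete, here is a minimal TF1-style sketch (not the SimpleHTR code itself; layer sizes, names, and the dummy loss are illustrative) of placing BN between conv2d and relu and hooking its update ops into the training step:

```python
import tensorflow as tf  # TensorFlow 1.x API, as used by SimpleHTR at the time

# boolean placeholder that switches BN between training and inference behaviour
is_train = tf.placeholder(tf.bool, name='is_train')

images = tf.placeholder(tf.float32, [None, 32, 128, 1])
conv = tf.layers.conv2d(images, filters=64, kernel_size=3, padding='same')
bn = tf.layers.batch_normalization(conv, training=is_train)  # BN between conv2d and relu
relu = tf.nn.relu(bn)

loss = tf.reduce_mean(tf.square(relu))  # dummy loss, just to complete the graph

# This is the "extra node" part: BN's moving mean/variance are updated by ops
# collected in UPDATE_OPS, which must be run together with the optimizer step,
# otherwise the moving statistics are never updated.
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
    train_op = tf.train.RMSPropOptimizer(0.001).minimize(loss)
```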

githubharald commented 5 years ago

Closing because of inactivity.

soldierofhell commented 5 years ago

With tf.layers.batch_normalization it's not so complicated; one just has to take care of the train_phase flag. My first impression is that it helps, but I have to prepare more formal tests.
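For readers wondering what "taking care of train_phase" involves, here is a small self-contained sketch (illustrative names, not the actual SimpleHTR patch). The graph construction mirrors the sketch above; the new part is feeding the boolean flag at session-run time, so BN uses batch statistics during training and the learned moving averages during validation/inference:

```python
import numpy as np
import tensorflow as tf  # TensorFlow 1.x API

is_train = tf.placeholder(tf.bool, name='is_train')  # the "train_phase" flag
images = tf.placeholder(tf.float32, [None, 32, 128, 1])

conv = tf.layers.conv2d(images, filters=32, kernel_size=3, padding='same')
bn = tf.layers.batch_normalization(conv, training=is_train)
out = tf.nn.relu(bn)

loss = tf.reduce_mean(tf.square(out))  # dummy loss for the sketch
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)  # BN moving-average updates
with tf.control_dependencies(update_ops):
    train_op = tf.train.RMSPropOptimizer(0.001).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(8, 32, 128, 1).astype(np.float32)
    # training step: BN normalizes with the current batch statistics
    sess.run(train_op, feed_dict={images: batch, is_train: True})
    # validation/inference: BN uses the learned moving averages instead
    sess.run(out, feed_dict={images: batch, is_train: False})
```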

Chazzz commented 5 years ago

I'm getting over 71% word accuracy in under 20 epochs with batch normalization after every conv layer. Surprisingly, batch normalization is only about a 5% performance hit per epoch, but the model trains fully in about 25 epochs, so roughly 50% less time overall. The patch would be about 8 lines of code.

githubharald commented 5 years ago

cool, if this really improves both training time and accuracy, then we should go for it! In case you provide a patch, could you please include the trained model (packed into model.zip)?

Chazzz commented 5 years ago

Ok, sounds good.

Chazzz commented 5 years ago

I was fooled by RMSProp being inconsistent. The training time reduction probably exists over a larger sample, but it isn't as good as previously advertised (maybe 10-25%). Training ended up terminating with over 73% word accuracy, so that's still a huge improvement.

Chazzz commented 5 years ago

Here's a visualization of batch normalization vs no batch normalization on SimpleHTR. In addition to the benefits of BN, the rate change at 10K batches is pretty noticeable. Interestingly, without BN, the model doesn't overfit to the training data.

[attached chart: training curves with vs. without batch normalization]

githubharald commented 5 years ago

thank you, I'll try to have a look at it on the weekend.

githubharald commented 5 years ago

Thanks for your work. With python main.py --validate --beamsearch I now get this output: Character error rate: 10.464338%. Word accuracy: 74.000000%. :+1: