A simple PyTorch framework to train Optical Character Recognition (OCR) models.
You can train models to read captchas, license plates, digital displays, and any type of text!
The full training log is written to a train.log file, so you can process it anywhere!
You can also run multiple training runs with Hydra:
python3 train.py --multirun model.use_attention=true,false model.use_ctc=true,false training.num_epochs=50,100
This example launches 8 training runs, one for each combination of the three swept options (2 × 2 × 2).
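The count follows from the Cartesian product of the swept values. As a rough illustration in plain Python (this is not Hydra's own code, and `expand_sweep` is a hypothetical helper name), the same combinations can be enumerated like this:

```python
from itertools import product

# Values copied from the --multirun command above.
sweep = {
    "model.use_attention": [True, False],
    "model.use_ctc": [True, False],
    "training.num_epochs": [50, 100],
}


def expand_sweep(sweep):
    """Return one dict of overrides per combination of swept values."""
    keys = list(sweep)
    return [dict(zip(keys, values)) for values in product(*sweep.values())]


runs = expand_sweep(sweep)
for index, overrides in enumerate(runs):
    print(index, overrides)
```

Adding a fourth option with two values would double the number of runs, so it is worth checking the product before starting a long sweep.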
Create a directory called "dataset" and put your images in it (PNG is preferred, but you can use other formats as long as you adjust the expected file extension accordingly).
Your file tree should look like this:
torch-nn-ocr
│   README.md
│   ...
│
└─── dataset
        cute.png
        motor.png
        machine.png
The image name must be the text written in the image. In this case you have one image with 'cute' written in it, another with 'motor', and another with 'machine'.
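To make the naming convention concrete, here is a minimal sketch of how labels could be derived from file names. It assumes PNG files in a flat "dataset" directory; `list_samples` is a hypothetical helper for illustration, not necessarily how the repository loads its data.

```python
from pathlib import Path


def list_samples(dataset_dir="dataset", extension=".png"):
    """Return sorted (image_path, label) pairs.

    The label is the file name without its extension,
    e.g. dataset/cute.png -> "cute".
    """
    root = Path(dataset_dir)
    return sorted((path, path.stem) for path in root.glob(f"*{extension}"))


if __name__ == "__main__":
    for path, label in list_samples():
        print(path, "->", label)
```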
All labels should have the same length. With Attention + Cross Entropy, padding is done automatically. With CTC Loss, padding is not done, so make sure you normalize your target lengths yourself, for example by adding a dedicated character that represents empty space. Do not reuse the symbol that CTC uses for its own blank: the two are different kinds of blank.
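One way to normalize target lengths for CTC is to right-pad every label with a filler character. The sketch below is an assumption-laden illustration: `pad_labels` is a hypothetical helper, "_" is an arbitrary choice of padding character, and that character would still need to be part of the model's character set and distinct from the CTC blank.

```python
def pad_labels(labels, pad_char="_"):
    """Right-pad every label with pad_char so all labels share the longest length."""
    if not labels:
        return []
    for label in labels:
        # The padding character must not collide with real label content.
        if pad_char in label:
            raise ValueError(f"pad_char {pad_char!r} already appears in label {label!r}")
    target_len = max(len(label) for label in labels)
    return [label.ljust(target_len, pad_char) for label in labels]


print(pad_labels(["cute", "motor", "machine"]))
# → ['cute___', 'motor__', 'machine']
```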
Configure your model at configs/config.yaml
model:
  use_attention: true
  use_ctc: true
  dims: 256
Run:
python3 train.py
CRNNs ✅
Attention ✅
CTC Loss ✅
Cross Entropy Loss ✅