Basic HTR concepts/modules to boost performance. Official code for the paper "Best Practices for a Handwritten Text Recognition system", presented at the 15th IAPR International Workshop on Document Analysis Systems (DAS 2022).
05/06/2024 Update Highlights:
The whole code was reworked into a clearer framework that can be easily modified and extended with new architectures.
You need a working installation of PyTorch. We provide a requirements.txt
file that can be used to install the necessary dependencies for a Python 3.9 setup with CUDA 11.7:
conda create -n htr python=3.9
conda activate htr
pip install -r requirements.txt
This repo contains all the required steps for training and evaluating on the IAM dataset. To access the data, you can register here.
For the line-level setup, which is currently supported by this repo, you only need to download the form images (3 sets: data/formsA-D.tgz, data/formsE-H.tgz, data/formsI-Z.tgz - unzip them into a common folder of images) and the XML ground truth (data/xml.tgz). All these files can be found on the official website after registration.
Then, we can create a line-level instantiation of the dataset through the script:
python prepare_iam.py $mypath$/IAM/forms/ $mypath$/IAM/xml/ ./data/IAM/splits/ ./data/IAM/processed_lines
where $mypath$/IAM is the path where the different IAM files are saved. The splits are provided in the local path ./data. The last argument is the output folder.
Note that IAM already provides segmented line images, but in those images the background is masked out at the word level, and some lines are missing w.r.t. the XML files. For a more realistic setup, we therefore extract the requested lines directly from the initial form images.
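The extraction step can be sketched as follows: each line's bounding box is recovered as the union of its word-component boxes from the XML ground truth, and then cropped from the form image. This is an illustrative sketch, not the actual prepare_iam.py implementation; the tag and attribute names (line, word, cmp, x, y, width, height) follow the IAM XML format.

```python
import xml.etree.ElementTree as ET

# Minimal sample mimicking the IAM XML ground truth (a sketch; the real
# files contain full machine-printed and handwritten parts).
SAMPLE = """
<form id="a01-000u">
  <handwritten-part>
    <line id="a01-000u-00" text="A MOVE to stop">
      <word id="a01-000u-00-00" text="A">
        <cmp x="408" y="768" width="27" height="51"/>
      </word>
      <word id="a01-000u-00-01" text="MOVE">
        <cmp x="507" y="766" width="213" height="56"/>
      </word>
    </line>
  </handwritten-part>
</form>
"""

def line_bboxes(xml_text):
    """Return {line_id: (text, (x0, y0, x1, y1))}, where each line's
    bounding box is the union of its word-component boxes."""
    root = ET.fromstring(xml_text)
    lines = {}
    for line in root.iter("line"):
        x0s, y0s, x1s, y1s = [], [], [], []
        for cmp_ in line.iter("cmp"):
            x, y = int(cmp_.get("x")), int(cmp_.get("y"))
            w, h = int(cmp_.get("width")), int(cmp_.get("height"))
            x0s.append(x); y0s.append(y)
            x1s.append(x + w); y1s.append(y + h)
        if x0s:
            lines[line.get("id")] = (line.get("text"),
                                     (min(x0s), min(y0s), max(x1s), max(y1s)))
    return lines

boxes = line_bboxes(SAMPLE)
print(boxes["a01-000u-00"])  # → ('A MOVE to stop', (408, 766, 720, 822))
```

The resulting boxes can then be used to crop the line regions from the unmasked form images (e.g. with PIL's Image.crop).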
The training is performed as follows:
python trainer.py config.yaml
We can specify the GPU to run on:
CUDA_VISIBLE_DEVICES=0 python trainer.py config.yaml
or
python trainer.py config.yaml device='cuda:0'
We can change the overall setup either directly in config.yaml or via extra arguments to the main Python command, as in this example:
python trainer.py config.yaml train.lr=1e-3 arch.head_type='both' train.num_epochs=800
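A minimal sketch of how such dotted-key overrides can be merged into a nested config dict loaded from config.yaml (illustrative only; the function and key names are assumptions, not the repo's actual parsing code):

```python
import ast

def apply_override(config, override):
    """Apply one 'a.b.c=value' override to a nested dict, in place.
    Values are parsed with ast.literal_eval when possible
    ('1e-3' -> 0.001, "'both'" -> 'both'), else kept as raw strings."""
    key_path, _, raw = override.partition("=")
    try:
        value = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        value = raw
    node = config
    *parents, leaf = key_path.split(".")
    for k in parents:
        node = node.setdefault(k, {})
    node[leaf] = value

# Hypothetical defaults standing in for the values loaded from config.yaml:
config = {"train": {"lr": 1e-4, "num_epochs": 240}, "arch": {"head_type": "rnn"}}
for ov in ["train.lr=1e-3", "arch.head_type='both'", "train.num_epochs=800"]:
    apply_override(config, ov)
print(config)
```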
One of the main elements of this paper was the introduction of a shortcut head, selected in the above command with the head_type='both' option.
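The relevant config.yaml entries could look like the fragment below. This is a sketch based on the command-line keys shown above; any structure beyond arch.head_type, train.lr and train.num_epochs is an assumption, so check the actual config.yaml shipped with the repo.

```yaml
# Illustrative config fragment (assumed layout, not the repo's exact file)
arch:
  head_type: both   # 'both' adds the auxiliary shortcut head next to the main CTC head
train:
  lr: 1.0e-3
  num_epochs: 800
```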
A pre-trained model is provided in the saved_models path (htrnet.pt). You can use it, or a re-trained one, to evaluate on the IAM dataset:
python evaluate.py config.yaml resume=./saved_models/htrnet.pt
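HTR results on IAM are typically reported as character/word error rates (CER/WER), i.e. edit distance normalized by reference length. A minimal sketch of CER via Levenshtein distance (illustrative; evaluate.py's actual implementation may differ):

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(predictions, references):
    """Character error rate: total edit distance / total reference characters."""
    errors = sum(edit_distance(p, r) for p, r in zip(predictions, references))
    total = sum(len(r) for r in references)
    return errors / total

print(cer(["helo world"], ["hello world"]))  # → 0.0909... (1 error / 11 chars)
```

WER is computed the same way, with the sequences split into words instead of characters.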
A single-image demo is also available, where an image is selected from the test set:
python demo.py config.yaml resume=./saved_models/htrnet.pt ./data/IAM/processed_lines/test/c04-165-05.png
If you find this work useful, please consider citing:
@inproceedings{retsinas2022best,
title={Best practices for a handwritten text recognition system},
author={Retsinas, George and Sfikas, Giorgos and Gatos, Basilis and Nikou, Christophoros},
booktitle={International Workshop on Document Analysis Systems},
pages={247--259},
year={2022},
organization={Springer}
}