georgeretsi / HTR-best-practices


HTR-best-practices

Basic HTR concepts/modules to boost performance. Official code for the paper "Best Practices for a Handwritten Text Recognition System", presented at the 15th IAPR International Workshop on Document Analysis Systems (DAS 2022).

Updates

05/06/2024:

The whole codebase was reworked into a clearer framework that can be easily modified and adapted to new architectures.

Update Highlights:

Installation

You need to have a working version of PyTorch installed. We provide a requirements.txt file that can be used to install the necessary dependencies for a Python 3.9 setup with CUDA 11.7:

conda create -n htr python=3.9
conda activate htr
pip install -r requirements.txt

Data Preparation

This repo contains all the required steps for training and evaluating on the IAM dataset. To access the data, you can register here.

For the line-level setup, which is the one currently supported by this repo, you only need to download the form images (three archives: data/formsA-D.tgz, data/formsE-H.tgz, data/formsI-Z.tgz; extract them into a common image folder) and the XML ground truth (data/xml.tgz). All these files are available on the official website after registration.

Then, we can create a line-level instantiation of the dataset with the script:

python prepare_iam.py $mypath$/IAM/forms/ $mypath$/IAM/xml/ ./data/IAM/splits/ ./data/IAM/processed_lines

where $mypath$/IAM is the directory in which the downloaded IAM files are saved. The splits are provided with this repo under the local path ./data. The last argument is the output folder.

Note that IAM already provides segmented lines, but in those images the background has been masked out at word level, and some lines are missing with respect to the XML files. For a more realistic setup, we extract the requested lines directly from the original form images.
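To illustrate the idea of cutting lines out of the full forms, here is a minimal sketch that derives a bounding box per line from the XML ground truth. It is not the code of prepare_iam.py. It assumes the IAM XML layout of line > word > cmp elements with x, y, width and height attributes; the function name line_boxes, the pad parameter, and the embedded sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for one IAM form XML file (assumed layout: line > word > cmp).
SAMPLE_XML = """
<form id="a01-000u">
  <handwritten-part>
    <line id="a01-000u-00" text="A MOVE">
      <word id="a01-000u-00-00" text="A">
        <cmp x="408" y="768" width="27" height="51"/>
      </word>
      <word id="a01-000u-00-01" text="MOVE">
        <cmp x="507" y="766" width="120" height="48"/>
        <cmp x="630" y="770" width="83" height="50"/>
      </word>
    </line>
  </handwritten-part>
</form>
"""


def line_boxes(xml_text, pad=0):
    """Return {line_id: (left, top, right, bottom, text)} for every line.

    The box is the union of all component (cmp) boxes of the line,
    optionally enlarged by `pad` pixels (hypothetical parameter).
    """
    root = ET.fromstring(xml_text)
    boxes = {}
    for line in root.iter("line"):
        cmps = list(line.iter("cmp"))
        if not cmps:
            continue  # no geometry available for this line
        left = min(int(c.get("x")) for c in cmps)
        top = min(int(c.get("y")) for c in cmps)
        right = max(int(c.get("x")) + int(c.get("width")) for c in cmps)
        bottom = max(int(c.get("y")) + int(c.get("height")) for c in cmps)
        boxes[line.get("id")] = (
            max(left - pad, 0),
            max(top - pad, 0),
            right + pad,
            bottom + pad,
            line.get("text"),
        )
    return boxes


for line_id, box in line_boxes(SAMPLE_XML).items():
    print(line_id, box)
```

The first four values of each box are in the (left, upper, right, lower) order expected by Pillow's Image.crop, so the line image could be cut from the form with form_image.crop(box[:4]).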

Training

The training is performed as follows:

python trainer.py config.yaml

We can select the GPU to run on:

CUDA_VISIBLE_DEVICES=0 python trainer.py config.yaml

or

python trainer.py config.yaml device='cuda:0'

We can change the overall setup either directly in config.yaml or by passing extra arguments to the main python command, as in this example:

python trainer.py config.yaml train.lr=1e-3 arch.head_type='both' train.num_epochs=800
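As a rough illustration of how such dotted key=value arguments can map onto a nested configuration, here is a plain-Python sketch. It is not the parsing code used by trainer.py, which may rely on a configuration library instead. The function apply_overrides and the base values (including the "rnn" head type) are hypothetical.

```python
import ast
import copy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, raw = item.partition("=")
        try:
            # Interpret numbers, booleans, quoted strings, lists, ...
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal stays a plain string.
            value = raw
        node = result
        *parents, leaf = key.split(".")
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


# Hypothetical base configuration, standing in for the contents of config.yaml.
base = {"train": {"lr": 5e-4, "num_epochs": 100}, "arch": {"head_type": "rnn"}}
cfg = apply_overrides(
    base, ["train.lr=1e-3", "arch.head_type=both", "train.num_epochs=800"]
)
print(cfg)
```

Command-line values take precedence over the file, so config.yaml can hold the defaults while individual runs vary only a few keys.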

One of the main elements of the paper was the introduction of a shortcut head. It is enabled in the above command with the arch.head_type='both' option.
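The sketch below shows one way a recurrent head and a shortcut head can be trained together with CTC. It assumes the shortcut is an auxiliary 1D convolutional branch applied to the same backbone features as the recurrent branch, with each branch receiving its own CTC loss. The class name TwoHeadCTC, the layer sizes, and the 0.1 auxiliary weight are illustrative assumptions, not the repo's implementation.

```python
import torch
import torch.nn as nn


class TwoHeadCTC(nn.Module):
    """Recurrent head plus an auxiliary convolutional shortcut head (sketch)."""

    def __init__(self, feat_dim=64, hidden=32, num_classes=80):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, bidirectional=True)
        self.rnn_out = nn.Linear(2 * hidden, num_classes)
        # Shortcut: per-timestep class scores straight from the backbone features.
        self.shortcut = nn.Conv1d(feat_dim, num_classes, kernel_size=3, padding=1)

    def forward(self, feats):
        # feats: (T, N, feat_dim) sequence of column features from a CNN backbone
        rnn_seq, _ = self.rnn(feats)
        main = self.rnn_out(rnn_seq)  # (T, N, C)
        # (T, N, F) -> (N, F, T) for Conv1d, then back to (T, N, C)
        aux = self.shortcut(feats.permute(1, 2, 0)).permute(2, 0, 1)
        return main, aux


def two_head_loss(main, aux, targets, target_lengths, aux_weight=0.1):
    """CTC loss on the main head plus a down-weighted CTC loss on the shortcut."""
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    T, N, _ = main.shape
    input_lengths = torch.full((N,), T, dtype=torch.long)
    loss_main = ctc(main.log_softmax(2), targets, input_lengths, target_lengths)
    loss_aux = ctc(aux.log_softmax(2), targets, input_lengths, target_lengths)
    return loss_main + aux_weight * loss_aux


torch.manual_seed(0)
model = TwoHeadCTC()
feats = torch.randn(50, 2, 64)            # random stand-in for backbone features
targets = torch.randint(1, 80, (2, 10))   # label 0 is reserved for the CTC blank
target_lengths = torch.tensor([10, 7])
main, aux = model(feats)
loss = two_head_loss(main, aux, targets, target_lengths)
print(main.shape, aux.shape, loss.item())
```

In a setup like this, the shortcut branch mainly serves as an extra training signal for the backbone, and decoding at test time would typically use only the recurrent branch.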

Testing

A pre-trained model is provided in the saved_models folder (htrnet.pt). You can use it, or a re-trained one, to evaluate on the IAM dataset:

python evaluate.py config.yaml resume=./saved_models/htrnet.pt
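HTR results are commonly reported as character error rate (CER) and word error rate (WER), both based on edit distance. The following is a minimal sketch of these two metrics; whether evaluate.py computes them in exactly this way is an assumption, and the helper names and sample strings are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(pairs, tokenize):
    """Total edit errors divided by total reference length over all pairs."""
    errors = total = 0
    for ref, hyp in pairs:
        r, h = tokenize(ref), tokenize(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return errors / total if total else 0.0


# Hypothetical (reference, prediction) pairs.
pairs = [("a move to stop", "a mov to stop"), ("he said", "he said")]
cer = error_rate(pairs, tokenize=list)       # characters as tokens
wer = error_rate(pairs, tokenize=str.split)  # whitespace-separated words as tokens
print(f"CER: {cer:.4f}  WER: {wer:.4f}")
```

Here the errors are accumulated over the whole set before dividing, which weights long lines more than averaging per-line rates would.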

A single-image demo is also available. In this example, the image is taken from the test set:

python demo.py config.yaml resume=./saved_models/htrnet.pt ./data/IAM/processed_lines/test/c04-165-05.png

Citation

If you find this work useful, please consider citing:

@inproceedings{retsinas2022best,
  title={Best practices for a handwritten text recognition system},
  author={Retsinas, George and Sfikas, Giorgos and Gatos, Basilis and Nikou, Christophoros},
  booktitle={International Workshop on Document Analysis Systems},
  pages={247--259},
  year={2022},
  organization={Springer}
}