The official code of IterNet.
We propose IterVM, an iterative approach to visual feature extraction that can significantly improve scene text recognition accuracy. IterVM repeatedly uses the high-level visual feature extracted at the previous iteration to enhance the multi-level features extracted at the subsequent iteration.
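The feedback loop described above can be sketched in a few lines. The snippet below is a minimal illustrative stand-in (plain NumPy, not the repository's PyTorch code): a toy "backbone" produces multi-level features, and the highest-level feature from one iteration is fed back to enhance every level at the next iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

def extract_multi_level(image, feedback):
    # Toy stand-in for a CNN backbone: each "level" mixes its input
    # with the high-level feedback from the previous iteration.
    levels = []
    feat = image
    for w in weights:
        feat = np.tanh(feat @ w + feedback)  # feedback enhances every level
        levels.append(feat)
    return levels

D = 8  # toy feature dimension
weights = [rng.standard_normal((D, D)) * 0.1 for _ in range(3)]
image = rng.standard_normal(D)

feedback = np.zeros(D)       # no feedback at the first iteration
for it in range(3):          # iterative vision modeling
    levels = extract_multi_level(image, feedback)
    feedback = levels[-1]    # the highest-level feature is fed back
```

In the actual model the backbone, fusion operation, and number of iterations are learned components defined by the configs; this sketch only shows the control flow of the iterative refinement.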
```
pip install -r requirements.txt
```
Note: `fastai==1.0.60` is required.
Get the pretrained models from GoogleDrive. The performance of the pretrained models is summarized as follows:
Model | IC13 | SVT | IIIT | IC15 | SVTP | CUTE | AVG |
---|---|---|---|---|---|---|---|
IterNet | 97.9 | 95.1 | 96.9 | 87.7 | 90.9 | 91.3 | 93.8 |
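Note that the AVG column is presumably weighted by benchmark size rather than being a plain mean. As a quick point of reference (an illustrative snippet, not part of the repository), the unweighted mean of the six accuracies is:

```python
# Per-benchmark accuracies from the table above.
acc = {"IC13": 97.9, "SVT": 95.1, "IIIT": 96.9,
       "IC15": 87.7, "SVTP": 90.9, "CUTE": 91.3}
unweighted_mean = sum(acc.values()) / len(acc)
print(round(unweighted_mean, 1))  # 93.3
```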
```
# 1. Pretrain the vision model
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --config=configs/pretrain_vm.yaml
# 2. Pretrain the language model
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_language_model.yaml
# 3. Train IterNet
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python main.py --config=configs/train_iternet.yaml
```
Note: set the `checkpoint` path for the vision model (vm) and the language model separately to use specific pretrained models, or set it to `None` to train from scratch.

To evaluate a trained model:
```
CUDA_VISIBLE_DEVICES=0 python main.py --config=configs/train_iternet.yaml --phase test --image_only
```
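As an illustration of the note above, the checkpoint entries in a training config might look like the following. This is a hypothetical excerpt; the actual field names and layout in `configs/train_iternet.yaml` may differ, so check the config files in the repository.

```yaml
model:
  vision:
    checkpoint: workdir/pretrain_vm/best-pretrain_vm.pth                           # or None
  language:
    checkpoint: workdir/pretrain_language_model/best-pretrain_language_model.pth   # or None
```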
Additional flags:
- `--checkpoint /path/to/checkpoint` set the path of the evaluation model
- `--test_root /path/to/dataset` set the path of the evaluation dataset
- `--model_eval [alignment|vision]` which sub-model to evaluate
- `--image_only` disable dumping visualizations of attention masks

To run the demo:
```
python demo.py --config=configs/train_iternet.yaml --input=figures/demo
```
Additional flags:
- `--config /path/to/config` set the path of the configuration file
- `--input /path/to/image-directory` set the path of the image directory or a wildcard path, e.g., `--input='figs/test/*.png'`
- `--checkpoint /path/to/checkpoint` set the path of the trained model
- `--cuda [-1|0|1|2|3...]` set the CUDA device id; the default `-1` stands for CPU
- `--model_eval [alignment|vision]` which sub-model to use
- `--image_only` disable dumping visualizations of attention masks

If you find our method useful for your research, please cite:
```
@article{chu2022itervm,
  title={IterVM: Iterative Vision Modeling Module for Scene Text Recognition},
  author={Chu, Xiaojie and Wang, Yongtao},
  journal={26th International Conference on Pattern Recognition (ICPR)},
  year={2022}
}
```
This project is free for academic research only; commercial use requires authorization. For commercial permission, please contact wyt@pku.edu.cn.

This project is based on ABINet. Thanks for their great work.