SE_ASTER
Introduction
This is the implementation of the paper "SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition".
This code is based on aster.pytorch. We sincerely thank ayumiymk for his awesome repo and help.
How to use
Env
PyTorch == 1.1.0
torchvision == 0.3.0
fasttext == 0.9.1
See requirements.txt for the full list of dependencies.
Train
Prepare your data
- Download the pretrained language model (.bin file) from here
- Update the paths in lib/tools/create_all_synth_lmdb.py
- Run lib/tools/create_all_synth_lmdb.py
- Note: the generated LMDB may take up a large amount of storage space. Alternatively, you can modify datasets/dataset.py to compute the word embeddings online.
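A minimal sketch of the online alternative mentioned in the note above, in pure Python. Everything here is hypothetical: `OnlineEmbeddingDataset` is not a class in this repo, `samples` stands in for whatever datasets/dataset.py reads from the LMDB, and `embed_fn` stands in for the language-model lookup. With the fasttext package, that lookup could be `fasttext.load_model(path).get_word_vector`.

```python
from functools import lru_cache
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


class OnlineEmbeddingDataset:
    """Hypothetical wrapper: attaches a word embedding to each (image, label) on the fly.

    Instead of storing one embedding per sample in the LMDB, the embedding is
    computed from the label string when the sample is requested.
    """

    def __init__(self,
                 samples: List[Tuple[object, str]],
                 embed_fn: Callable[[str], Sequence[float]],
                 cache_size: int = 65536):
        self.samples = samples

        # Memoize per word: synthetic datasets repeat words often, so each
        # distinct word reaches the language model only once while cached.
        @lru_cache(maxsize=cache_size)
        def _embed(word: str) -> Vector:
            return tuple(float(x) for x in embed_fn(word))

        self._embed = _embed

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int):
        image, label = self.samples[index]
        return image, label, self._embed(label)
```

The trade-off is storage versus compute: the embedding lookup moves from the preprocessing step into data loading. With a PyTorch DataLoader using several workers, each worker process keeps its own copy of the cache (and of the loaded language model).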
Run
- Update the path in train.sh, then
sh train.sh
Test
- Update the path in test.sh, then
sh test.sh
Experiments
Evaluation on benchmarks
- You can download the benchmark datasets from BaiduYun (key: nphk), shared by clovaai in their repo.
| Checkpoint | IIIT5K | IC13-1015 | IC13-857 | IC15-1811 | IC15-2077 | SVT | SVTP | CUTE |
|---|---|---|---|---|---|---|---|---|
| OneDrive, BaiduYun (key: x54e) | 93.4 | 93.5 | 94.5 | 79.8 | 75.8 | 88.4 | 82.0 | 84.0 |
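Scene text benchmarks like these are conventionally reported as word accuracy (%), that is, the share of test images whose whole predicted word matches the label. The sketch below shows that computation under an assumed normalization (keep ASCII letters and digits, compare case-insensitively), similar in spirit to aster.pytorch's evaluation; the function names are hypothetical and the repo's own test script may normalize differently.

```python
import string
from typing import Sequence

# Assumed normalization: keep ASCII letters and digits only, compare case-insensitively.
_KEEP = frozenset(string.digits + string.ascii_letters)


def normalize(text: str) -> str:
    """Drop non-alphanumeric characters and lowercase the rest."""
    return "".join(ch for ch in text if ch in _KEEP).lower()


def word_accuracy(predictions: Sequence[str], targets: Sequence[str]) -> float:
    """Percentage of samples whose normalized prediction equals the normalized label."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(normalize(p) == normalize(t) for p, t in zip(predictions, targets))
    return 100.0 * correct / len(targets)
```

For example, `word_accuracy(["Hello!", "w0rld", "abc"], ["hello", "W0RLD", "abd"])` counts two of the three words as correct.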
Evaluation with lexicons
- Existing methods replace the predicted word with the lexicon word nearest to it in edit distance (ED). With the semantic information, we can additionally choose, among the lexicon words at the smallest edit distance, the one that is most semantically similar (SS) to the prediction.
| Methods | IIIT5K-50 | IIIT5K-1K | SVT-50 | IC13 | IC15 |
|---|---|---|---|---|---|
| ED | 99.06 | 97.87 | 96.36 | 97.44 | 87.76 |
| ED + SS | 99.27 | 97.93 | 96.45 | 97.64 | 88.07 |
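The ED and ED + SS selection rules described above can be sketched as follows. This is an illustration, not the repo's code: it assumes `semantic` is the model's predicted semantic vector and that a hypothetical `embed` function maps each lexicon word into the same space (for example, its fastText embedding), with cosine similarity as the similarity measure.

```python
import math
from typing import Callable, Optional, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (unit-cost insert, delete, substitute), two-row DP."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def correct_with_lexicon(pred: str,
                         lexicon: Sequence[str],
                         semantic: Optional[Sequence[float]] = None,
                         embed: Optional[Callable[[str], Sequence[float]]] = None) -> str:
    """ED: nearest lexicon word. ED + SS: among the nearest words, the most similar one."""
    if not lexicon:
        raise ValueError("lexicon must not be empty")
    dists = [edit_distance(pred, w) for w in lexicon]
    best = min(dists)
    candidates = [w for w, d in zip(lexicon, dists) if d == best]
    if semantic is None or embed is None or len(candidates) == 1:
        return candidates[0]  # plain ED: first word at the minimum distance
    return max(candidates, key=lambda w: cosine(semantic, embed(w)))
```

With the lexicon `["cat", "car", "bat", "dog"]` and the prediction `"cax"`, both `"cat"` and `"car"` are at distance 1. Plain ED simply takes the first of them, while ED + SS lets the semantic vector decide between them.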
About the word embedding
- Results obtained by directly using the word embedding from the pre-trained LM during both training and inference.

| IIIT5K | IC13 | IC15-1811 | IC15-2077 | SVT | SVTP | CUTE |
|---|---|---|---|---|---|---|
| 94.6 | 93.8 | 85.0 | 79.6 | 90.9 | 84.2 | 85.4 |
Exploration on global information
- We also tried using Aggregation Cross-Entropy (ACE) as the global information instead of the semantics. This part of the code will be released in the next few days.

| IIIT5K | IC13 | IC15-1811 | IC15-2077 | SVT | SVTP | CUTE |
|---|---|---|---|---|---|---|
| 93.8 | 91.3 | 78.7 | - | 90.1 | 81.6 | 81.9 |
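Until that code is released, the ACE objective itself can be sketched for a single sample, following the formulation in the original Aggregation Cross-Entropy paper: each class's probability is summed over the T time steps and divided by T, the target is each character's count divided by T (with the blank class filling the remaining T minus label-length steps), and the loss is the cross-entropy between the two. This is plain Python for clarity, not the authors' implementation; the argument names and the `eps` guard are assumptions.

```python
import math
from collections import Counter
from typing import Sequence


def ace_loss(probs: Sequence[Sequence[float]], label: Sequence[int],
             blank: int = 0, eps: float = 1e-10) -> float:
    """Aggregation Cross-Entropy for one sample.

    probs[t][k]: softmax probability of class k at time step t (T steps).
    label: class indices of the target string (must not contain `blank`).
    """
    T = len(probs)
    if T == 0:
        raise ValueError("need at least one time step")
    if len(label) > T:
        raise ValueError("label is longer than the number of time steps")
    if blank in label:
        raise ValueError("label must not contain the blank class")
    num_classes = len(probs[0])
    # Aggregate each class's probability over time, normalized by T.
    y_bar = [sum(step[k] for step in probs) / T for k in range(num_classes)]
    # Target counts: occurrences of each character; blank fills the remaining steps.
    counts = Counter(label)
    counts[blank] += T - len(label)
    # Cross-entropy between normalized counts and aggregated predictions;
    # classes with a zero count contribute nothing.
    return -sum((n / T) * math.log(y_bar[k] + eps) for k, n in counts.items() if n > 0)
```

Because ACE only compares per-class counts, it ignores character order, which is what makes it a global rather than a per-step signal. Its minimum is the entropy of the normalized counts, not zero. For training, a batched PyTorch tensor version would replace these loops.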
Citation
@inproceedings{qiao2020seed,
title={{SEED}: Semantics enhanced encoder-decoder framework for scene text recognition},
author={Qiao, Zhi and Zhou, Yu and Yang, Dongbao and Zhou, Yucan and Wang, Weiping},
booktitle={CVPR},
year={2020},
}
@article{shi2018aster,
title={{ASTER}: An attentional scene text recognizer with flexible rectification},
author={Shi, Baoguang and Yang, Mingkun and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang},
journal={TPAMI},
volume={41},
number={9},
pages={2035--2048},
year={2018},
publisher={IEEE}
}