A TensorFlow implementation of CTPN: Detecting Text in Natural Image with Connectionist Text Proposal Network.
Most of the code in this project is adapted from CTPN, tf-faster-rcnn and text-detection-ctpn.
Results of the pretrained model on ICDAR13:
| Net | Dataset | Recall | Precision | Hmean |
|---|---|---|---|---|
| Original CTPN | ICDAR13 training data + ? | 73.72% | 92.77% | 82.15% |
| vgg16 | MLT17 latin/chn + ICDAR13 training data | 74.26% | 82.46% | 78.15% |
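For reference, Hmean is just the harmonic mean (F-measure) of recall and precision; a minimal check of the table values (small last-digit differences come from the recall/precision figures themselves being rounded):

```python
def hmean(recall, precision):
    """Harmonic mean (F-measure) of recall and precision."""
    return 2 * recall * precision / (recall + precision)

print(f"{hmean(0.7372, 0.9277):.4f}")  # ~0.8216, table reports 82.15%
print(f"{hmean(0.7426, 0.8246):.4f}")  # ~0.7814, table reports 78.15%
```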
If you want an end-to-end OCR service, check out this repo: https://github.com/Sanster/DeepOcrService
Install dependencies:
pip3 install -r requirements.txt
Build the Cython extensions needed by both the demo and training:
cd lib/
make clean
make
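A quick way to confirm the extensions built correctly is to try importing them from inside lib/; the module names below are guesses based on the usual tf-faster-rcnn layout, so adjust them to whatever your build actually produces:

```python
# Run from inside lib/ after `make`. Module names are assumptions
# (typical tf-faster-rcnn style builds); change them if your build differs.
import importlib

for name in ("utils.cython_bbox", "nms.cpu_nms"):
    try:
        importlib.import_module(name)
        print(f"OK: {name}")
    except ImportError as err:
        print(f"NOT built: {name} ({err})")
```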
Download the pre-trained CTPN model (based on vgg16) from Google Drive and put it in output/vgg16/voc_2007_trainval/default.
Run the demo:
python3 tools/demo.py
This model was trained on a 1080Ti for 80k iterations, using this commit: dc533e030e5431212c1d4dbca0bcd7e594a8a368.
Download the training dataset from Google Drive.
This dataset contains 3727 images from MLT17 (Latin + Chinese) and the ICDAR13 training set.
Ground truth anchors are generated from the minAreaRect of each text area; see eragonruan/text-detection-ctpn#issues215 for more details. You can use tools/mlt17_to_voc.py to build your own training data.
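The snippet below is only an illustrative sketch of that idea, not the repo's exact code: take OpenCV's minAreaRect of a text polygon and slice its horizontal extent into fixed-width (16 px) CTPN-style ground-truth boxes.

```python
import cv2
import numpy as np

def polygon_to_ctpn_boxes(polygon, anchor_width=16):
    """Sketch: split one text region into fixed-width ground-truth boxes.
    `polygon` is an (N, 2) array of x, y points outlining the text area."""
    pts = np.asarray(polygon, dtype=np.float32)
    rect = cv2.minAreaRect(pts)        # ((cx, cy), (w, h), angle)
    corners = cv2.boxPoints(rect)      # 4 corner points of the rotated rect
    x_min, y_min = corners.min(axis=0)
    x_max, y_max = corners.max(axis=0)

    # Slice the horizontal extent into anchor_width-pixel wide boxes.
    boxes = []
    x = x_min
    while x < x_max:
        boxes.append([x, y_min, min(x + anchor_width, x_max), y_max])
        x += anchor_width
    return np.array(boxes)

# A slightly rotated text region as an example.
print(polygon_to_ctpn_boxes([(10, 20), (200, 28), (198, 60), (8, 52)]))
```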
Put the downloaded data in ./data/VOCdevkit2007/VOC2007
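The folder is expected to follow the standard Pascal VOC layout; a minimal sanity check (the subfolder names below are the usual VOC ones, verify them against the downloaded archive):

```python
import os

voc_root = "./data/VOCdevkit2007/VOC2007"
# Standard Pascal VOC subfolders; check they exist after extracting the data.
for sub in ("Annotations", "JPEGImages", "ImageSets/Main"):
    path = os.path.join(voc_root, sub)
    print(f"{path}: {'ok' if os.path.isdir(path) else 'MISSING'}")
```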
Download the pre-trained slim vgg16 model from here.
Put the downloaded model files in ./data/pretrained_model
Start training
python3 tools/trainval_net.py
Output checkpoints will be saved in ./output/vgg16/voc_2007_trainval/default
Start tensorboard
tensorboard --logdir=./tensorboard
To evaluate on the ICDAR13 test set, run:
python3 tools/icdar.py --img_dir=path/to/ICDAR13/Challenge2_Test_Task12_Images/ -c=ICDAR13
When it finishes, a submit.zip file will be generated in data/ICDAR_submit, then run:
cd tools/ICDAR13
# use python2
python script.py -g=gt.zip -s=submit.zip
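For reference, the submit.zip produced above follows the usual ICDAR13 localization format: a flat zip of one res_<image_name>.txt per test image, each line holding one x_min,y_min,x_max,y_max box. A hedged sketch of packaging such results by hand (tools/icdar.py already does this for you; the directory name is hypothetical):

```python
import os
import zipfile

def pack_submission(result_dir, zip_path="submit.zip"):
    """Sketch: zip per-image ICDAR13 result files (res_<image>.txt,
    one `x_min,y_min,x_max,y_max` line per detected box) into a flat archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(os.listdir(result_dir)):
            if name.startswith("res_") and name.endswith(".txt"):
                zf.write(os.path.join(result_dir, name), arcname=name)

# pack_submission("data/ICDAR_submit")  # hypothetical directory of res_*.txt files
```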