Masao-Taketani / FOTS_OCR

TensorFlow Implementation of FOTS, Fast Oriented Text Spotting with a Unified Network.
GNU General Public License v3.0
computer-vision deep-learning image-recognition ocr scene-text-recognition tensorflow

FOTS: Fast Oriented Text Spotting with a Unified Network

I am still working on this repo. Updates and detailed instructions are coming soon!

Table of Contents

TensorFlow Versions

So far, the pre-training code has been tested on TensorFlow 1.12, 1.14, and 1.15. I may implement a TensorFlow 2.x version in the future.

Other Requirements

GCC >= 6

Trained Models

Train

Pre-train with SynthText

  1. Download the pre-trained ResNet-50 model from the TensorFlow-Slim image classification model library and place it in the ckpt/resnet_v1_50/ directory.

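    mkdir -p ckpt/resnet_v1_50    # create the directory if it does not exist yet
    cd ckpt/resnet_v1_50
    wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
    tar -zxvf resnet_v1_50_2016_08_28.tar.gz    # extracts resnet_v1_50.ckpt
    rm resnet_v1_50_2016_08_28.tar.gz
    cd ../..    # return to the repo root for the following steps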
  2. Download the Synth800k (SynthText) dataset and place it in the data/SynthText/ directory. It is used to pre-train the whole network.

  3. Transform (pre-process) the SynthText data into the ICDAR data format.

    python data_provider/SynthText2ICDAR.py
  4. Train on SynthText for 10 epochs (with 1 GPU).

    python train.py \
    --max_steps=715625 \
    --gpu_list='0' \
    --checkpoint_path=ckpt/synthText_10eps/ \
    --pretrained_model_path=ckpt/resnet_v1_50/resnet_v1_50.ckpt \
    --training_img_data_dir=data/SynthText/ \
    --training_gt_data_dir=data/SynthText/ \
    --icdar=False
  5. Visualize the pre-training progress with TensorBoard.

    tensorboard --logdir=ckpt/synthText_10eps/
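The --max_steps value in step 4 encodes the "10 epochs" target. The sketch below shows the arithmetic under two assumptions that this README does not state: that SynthText contains 858,750 images, and that training uses an effective batch size of 12 (inferred from the numbers, not read from train.py). The helper name steps_for_epochs is hypothetical.

```python
def steps_for_epochs(num_images: int, epochs: int, batch_size: int) -> int:
    """Optimizer steps needed to pass over `num_images` images `epochs` times.

    Uses integer ceiling division so a final partial batch still counts as a step.
    """
    return (num_images * epochs + batch_size - 1) // batch_size


# Assumptions (not stated in this README): dataset size and effective batch size.
SYNTHTEXT_IMAGES = 858_750
ASSUMED_BATCH_SIZE = 12

print(steps_for_epochs(SYNTHTEXT_IMAGES, epochs=10, batch_size=ASSUMED_BATCH_SIZE))
# prints 715625, matching --max_steps above
```

If you change the batch size or the number of GPUs, --max_steps would need to be recomputed; whether the batch size is counted per GPU or globally is not specified here.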

Finetune with ICDAR 2015, ICDAR 2017 MLT or ICDAR 2013

(If you are using the pre-trained model, place all of its checkpoint files in ckpt/synthText_10eps/.)
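ICDAR 2015 ground truth stores one word per line: the four corner points of the word's quadrilateral as eight integer coordinates (x1,y1,...,x4,y4), followed by the transcription, with "###" marking unreadable text. The SynthText conversion in step 3 presumably targets a similar layout, but this README does not spell out the exact format, and ICDAR 2013 and ICDAR 2017 MLT annotations differ. Treat the following as a sketch under that assumption. The function to_icdar2015_line is hypothetical and is not taken from data_provider/SynthText2ICDAR.py.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def to_icdar2015_line(quad: Sequence[Point], text: str) -> str:
    """Format one word box as an ICDAR 2015-style ground-truth line.

    `quad` holds four (x, y) corners, conventionally clockwise from top-left.
    Coordinates are rounded to integer pixels; `text` is appended unchanged
    (use "###" for unreadable / "don't care" words).
    """
    if len(quad) != 4:
        raise ValueError("expected exactly 4 corner points")
    coords = [str(int(round(c))) for point in quad for c in point]
    return ",".join(coords + [text])


# A hypothetical axis-aligned word box: top-left, top-right, bottom-right, bottom-left.
box = [(10, 20), (110, 20), (110, 50), (10, 50)]
print(to_icdar2015_line(box, "Hello"))
# prints 10,20,110,20,110,50,10,50,Hello
```

Keeping the transcription as the last field (rather than splitting on every comma) lets a reader recover words that themselves contain commas by splitting only on the first eight separators.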

Test

Place some images in the test_imgs/ directory and specify the path to a trained checkpoint to see the test results.

python test.py --test_data_path test_imgs/ --checkpoint_path [checkpoint path]

References