FOTS: Fast Oriented Text Spotting with a Unified Network

I am still working on this repo. Updates and detailed instructions are coming soon!

Table of Contents

TensorFlow Versions
Other Requirements
Trained Models
Datasets
Train
- Pre-train with SynthText
- Finetune with ICDAR 2015, ICDAR 2017 MLT or ICDAR 2013
Test
References

TensorFlow Versions

For now, the pre-training code has been tested on TensorFlow 1.12, 1.14, and 1.15. I may implement a 2.x version in the future.
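
A quick way to confirm that an installed version falls in the tested set is a small check like the one below. This is a minimal sketch: `is_tested_tf_version` and `TESTED_TF_VERSIONS` are hypothetical names, not part of this repo, and the check only compares the major.minor prefix of the version string.

```python
# Hypothetical helper (not part of this repo): checks whether a TensorFlow
# version string matches one of the versions listed above.
TESTED_TF_VERSIONS = {(1, 12), (1, 14), (1, 15)}


def is_tested_tf_version(version):
    """Return True if the major.minor part of `version` is in the tested set."""
    parts = version.split(".")
    try:
        major_minor = (int(parts[0]), int(parts[1]))
    except (IndexError, ValueError):
        # Malformed version strings are treated as untested.
        return False
    return major_minor in TESTED_TF_VERSIONS


if __name__ == "__main__":
    try:
        import tensorflow as tf
    except ImportError:
        print("TensorFlow is not installed")
    else:
        status = "tested" if is_tested_tf_version(tf.__version__) else "untested"
        print("TensorFlow %s is %s with the pre-training code" % (tf.__version__, status))
```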

Other Requirements

GCC >= 6

Trained Models

tmp pre-trained model
trained model coming soon

Datasets

- pre-training
  Synth800k (the dataset is only available for non-commercial research and educational purposes)
- finetuning
  ICDAR 2015, ICDAR 2017 MLT, ICDAR 2013

Train

Pre-train with SynthText

Download the pre-trained ResNet-50 from the TensorFlow-Slim image classification model library page and place it in the ckpt/resnet_v1_50/ directory.

```
mkdir -p ckpt/resnet_v1_50
cd ckpt/resnet_v1_50
wget http://download.tensorflow.org/models/resnet_v1_50_2016_08_28.tar.gz
tar -zxvf resnet_v1_50_2016_08_28.tar.gz
rm resnet_v1_50_2016_08_28.tar.gz
cd ../..
```

Download the Synth800k dataset and place it in the data/SynthText/ directory to pre-train the whole network.
Transform (pre-process) the SynthText data into the ICDAR data format.
```
python data_provider/SynthText2ICDAR.py
```
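
For orientation, the conversion boils down to writing one line per word box in the ICDAR 2015 ground-truth layout (`x1,y1,x2,y2,x3,y3,x4,y4,transcription`). The sketch below is illustrative only: `word_box_to_icdar_line` is a hypothetical helper, and data_provider/SynthText2ICDAR.py may handle rounding, corner order, and multi-word text differently.

```python
def word_box_to_icdar_line(xs, ys, text):
    """Format one quadrilateral word box as an ICDAR 2015 style line.

    xs, ys: the four corner x and y coordinates, in the same corner order.
    text:   the transcription of the word.
    """
    if len(xs) != 4 or len(ys) != 4:
        raise ValueError("expected exactly four corners")
    coords = []
    for x, y in zip(xs, ys):
        # Interleave x and y, rounded to whole pixels.
        coords.append(str(int(round(x))))
        coords.append(str(int(round(y))))
    return ",".join(coords + [text])


if __name__ == "__main__":
    line = word_box_to_icdar_line(
        [10.2, 50.7, 50.1, 9.6], [5.0, 5.4, 20.4, 20.3], "hello"
    )
    print(line)  # 10,5,51,5,50,20,10,20,hello
```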

Train with SynthText for 10 epochs (with 1 GPU).

```
python train.py \
--max_steps=715625 \
--gpu_list='0' \
--checkpoint_path=ckpt/synthText_10eps/ \
--pretrained_model_path=ckpt/resnet_v1_50/resnet_v1_50.ckpt \
--training_img_data_dir=data/SynthText/ \
--training_gt_data_dir=data/SynthText/ \
--icdar=False
```
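
The `--max_steps` value follows from steps = epochs × images / batch size. The sketch below reproduces 715625 under two assumptions that this README does not state: that SynthText contains 858,750 images, and that the batch size is 12. If your batch size or GPU count differs, the step count for 10 epochs changes accordingly.

```python
def steps_for_epochs(num_images, batch_size, epochs):
    """Number of training steps needed to see `num_images` images `epochs` times."""
    total_samples = num_images * epochs
    # Ceiling division so a final partial batch still counts as a step.
    return -(-total_samples // batch_size)


# Assumed values (not stated in this README).
SYNTHTEXT_NUM_IMAGES = 858750
ASSUMED_BATCH_SIZE = 12

if __name__ == "__main__":
    print(steps_for_epochs(SYNTHTEXT_NUM_IMAGES, ASSUMED_BATCH_SIZE, epochs=10))  # 715625
```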

Visualize pre-training progress with TensorBoard.
```
tensorboard --logdir=ckpt/synthText_10eps/
```

Finetune with ICDAR 2015, ICDAR 2017 MLT or ICDAR 2013

(If you are using the pre-trained model, place all of its files in ckpt/synthText_10eps/.)

Combine ICDAR data before training.
1. Place the ICDAR data under the tmp/ folder.
2. Run the following script to combine the data.
```
python combine_ICDAR_data.py --year [year of ICDAR to train(13 or 15 or 17)]
```

ICDAR 2017 MLT / pre-finetuning for ICDAR 2013 or ICDAR 2015 (text detection task only)

Train the pre-trained model with 9,000 images from the ICDAR 2017 MLT training and validation datasets (with 1 GPU).

```
python train.py \
--gpu_list='0' \
--checkpoint_path=ckpt/ICDAR17MLT/ \
--pretrained_model_path=ckpt/synthText_10eps/ \
--train_stage=0 \
--training_img_data_dir=data/ICDAR17MLT/imgs/ \
--training_gt_data_dir=data/ICDAR17MLT/gts/
```

ICDAR 2015

Train the model with 1,000 images from the ICDAR 2015 training dataset and 229 images from the ICDAR 2013 training dataset (with 1 GPU).

```
python train.py \
--gpu_list='0' \
--checkpoint_path=ckpt/ICDAR15/ \
--pretrained_model_path=ckpt/ICDAR17MLT/ \
--training_img_data_dir=data/ICDAR15+13/imgs/ \
--training_gt_data_dir=data/ICDAR15+13/gts/
```

ICDAR 2013 (horizontal text only)

Train the model with 229 images from the ICDAR 2013 training dataset (with 1 GPU).

```
python train.py \
--gpu_list='0' \
--checkpoint_path=ckpt/ICDAR13/ \
--pretrained_model_path=ckpt/ICDAR17MLT/ \
--training_img_data_dir=data/ICDAR13/imgs/ \
--training_gt_data_dir=data/ICDAR13/gts/
```
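
The finetuning commands above form a chain: each stage starts from the checkpoint directory written by an earlier one (SynthText, then ICDAR 2017 MLT, then ICDAR 2015 or ICDAR 2013). The sketch below captures that chain as a table and rebuilds the same command lines from it. The flags and paths are copied from the commands above; `STAGES` and `build_train_command` are hypothetical names, not part of this repo.

```python
# Hypothetical stage table: paths and flags mirror the commands in this README.
STAGES = {
    "ICDAR17MLT": {
        "checkpoint_path": "ckpt/ICDAR17MLT/",
        "pretrained_model_path": "ckpt/synthText_10eps/",
        "data_dir": "data/ICDAR17MLT/",
        "extra_flags": ["--train_stage=0"],
    },
    "ICDAR15": {
        "checkpoint_path": "ckpt/ICDAR15/",
        "pretrained_model_path": "ckpt/ICDAR17MLT/",
        "data_dir": "data/ICDAR15+13/",
        "extra_flags": [],
    },
    "ICDAR13": {
        "checkpoint_path": "ckpt/ICDAR13/",
        "pretrained_model_path": "ckpt/ICDAR17MLT/",
        "data_dir": "data/ICDAR13/",
        "extra_flags": [],
    },
}


def build_train_command(stage, gpu_list="0"):
    """Return the train.py argument list for one finetuning stage."""
    cfg = STAGES[stage]
    return (
        ["python", "train.py",
         "--gpu_list=" + gpu_list,
         "--checkpoint_path=" + cfg["checkpoint_path"],
         "--pretrained_model_path=" + cfg["pretrained_model_path"]]
        + cfg["extra_flags"]
        + ["--training_img_data_dir=" + cfg["data_dir"] + "imgs/",
           "--training_gt_data_dir=" + cfg["data_dir"] + "gts/"]
    )


if __name__ == "__main__":
    # Print the two commands needed to reach an ICDAR 2015 model.
    for stage in ("ICDAR17MLT", "ICDAR15"):
        print(" ".join(build_train_command(stage)))
```

Passing a returned list to `subprocess.run(cmd, check=True)` from the repository root would launch that stage.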

Test

Place some images in the test_imgs/ directory and specify a trained checkpoint path to see the test results.

```
python test.py --test_data_path test_imgs/ --checkpoint_path [checkpoint path]
```
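
Before running the test, it can help to confirm that the test directory actually contains images. The sketch below is a hypothetical pre-flight check, not part of this repo; the extension list is an assumption, and test.py may accept a different set of formats.

```python
import os

# Assumed set of image extensions; test.py may accept a different set.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_test_images(test_dir):
    """Return the sorted image file names found directly inside `test_dir`."""
    names = []
    for name in os.listdir(test_dir):
        # Compare case-insensitively so that e.g. "IMG.JPG" is also found.
        if name.lower().endswith(IMAGE_EXTENSIONS):
            names.append(name)
    return sorted(names)


if __name__ == "__main__":
    if os.path.isdir("test_imgs"):
        images = list_test_images("test_imgs")
        print("Found %d image(s) in test_imgs/" % len(images))
    else:
        print("test_imgs/ does not exist yet")
```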

Masao-Taketani / FOTS_OCR
