PyTorch re-implementation of Real-time Scene Text Detection with Differentiable Binarization
Use dice loss instead of BCE (binary cross-entropy) loss.
Use normal convolution rather than deformable convolution in the backbone network.
The architecture of the backbone network is a simple FPN.
OHEM (online hard example mining) is not implemented.
The ground truth of the threshold map is a constant 1 rather than 'the distance to the closest segment'.
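For readers unfamiliar with dice loss, here is a minimal sketch of the arithmetic. It is framework-free on purpose and is not the code used in this repository: the function name `dice_loss`, the flat-list inputs, and the `eps` smoothing term are assumptions made for illustration. In a PyTorch training loop the same formula would be applied to tensors.

```python
def dice_loss(pred, target, eps=1e-6):
    """Dice loss between a predicted probability map and a binary mask.

    Hypothetical helper for illustration: `pred` and `target` are flat
    sequences of floats of equal length (e.g. a flattened H x W map).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    # Soft intersection: sum of element-wise products.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Denominator of the dice coefficient: total mass of both maps.
    total = sum(pred) + sum(target)

    # Dice coefficient is 2*|A∩B| / (|A| + |B|); the loss is 1 minus that.
    # eps (an assumed smoothing constant) avoids division by zero on empty maps.
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


if __name__ == "__main__":
    print(dice_loss([1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]))  # perfect overlap
    print(dice_loss([0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0]))  # no overlap
```

Unlike BCE, which averages a per-pixel penalty, dice loss depends on the overlap ratio between prediction and mask, so it is less sensitive to the large number of background pixels in text detection maps.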
Thanks to these projects:
The features are summarized below:
https://pan.baidu.com/s/1Um0wzbTFjJC0jdJ703GR7Q
or https://mega.nz/#!WdhxXAxT!oGURvmbQFqTHu5hljUPdbDMzI75_UO2iWLaXX5dJrDw
modify genText.py to generate the txt list files for the training/testing data
modify config.json
run
python train.py
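The list-file step above could look roughly like the sketch below. This is not the contents of genText.py: the function name `build_list_file`, the directory layout, and the one-line-per-sample format (image path, a tab, then label path) are all assumptions, so check genText.py and config.json for the format the training code actually expects.

```python
from pathlib import Path


def build_list_file(image_dir, label_dir, out_path, image_exts=(".jpg", ".png")):
    """Write one 'image_path<TAB>label_path' line per annotated image.

    Hypothetical helper: assumes each image 'name.jpg' has its annotation
    in 'name.txt' under label_dir. Returns the number of lines written.
    """
    image_dir, label_dir = Path(image_dir), Path(label_dir)
    lines = []
    for img in sorted(image_dir.iterdir()):
        # Skip anything that is not an image file.
        if img.suffix.lower() not in image_exts:
            continue
        label = label_dir / (img.stem + ".txt")
        # Only keep images that actually have an annotation file.
        if label.exists():
            lines.append(f"{img}\t{label}")
    Path(out_path).write_text("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)
```

Calling it once for the training set and once for the test set would yield two list files whose paths can then be referenced from config.json.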
python predict.py
run
python eval.py
[ ] MobileNet backbone
[ ] Deformable convolution
[ ] tensorboard support
[ ] FPN --> Architecture in the paper
[ ] Dice Loss --> BCE Loss
[ ] threshold map gt uses 1 --> threshold map gt uses distance (using 1 speeds up label generation)
[ ] OHEM
[ ] OpenCV_DNN inference API for CPU-only machines
[ ] Caffe version (for deploying with MNN/NCNN)
[ ] ICDAR13 / ICDAR15 / CTW1500 / MLT2017 / Total-Text