OCR #9 - Bryce1010/DeepLearning-Project (GitHub issue)

Bryce1010 commented 4 years ago

Resources

OCR dataset [github]
awesome-deep-text-detection-recognition [github]
SceneTextPapers [github]
Scene Text Detection and Recognition: The Deep Learning Era [paper]
Scene Text Recognition [papers with code]
Scene Text Detection [papers with code]
Multi-Oriented Scene Text Detection [papers with code]
Curved Text Detection [papers with code]

Surveys

Scene Text Detection and Recognition, Megvii and Peking University joint open course [pdf]
Irregular Text Detection and Recognition (CBDAR2019 keynote) [url]
HCIILAB [github]

Background

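Text detection and recognition: the difficulties, and which of them have been solved

Dense versus sparse text
Multiple orientations
Mixed languages
Latin versus non-Latin scripts: compared with English, Chinese text often forms very long text lines,
so a single detector often fails to handle both well at the same time

For the Chinese-text problem, some work proposes using long (elongated) convolution kernels.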
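Long kernels help because of how the receptive field grows. The sketch below computes that growth under simplifying assumptions: stride-1, undilated convolutions. Real backbones also downsample, which enlarges the field further. Each k_h x k_w layer widens the field by k_h - 1 rows and k_w - 1 columns. A stack of 1x5 kernels therefore reaches much wider horizontally than 3x3 kernels while staying one row tall. The function name is hypothetical.

```python
def receptive_field(kernels):
    """Receptive field (height, width) of a stack of stride-1, undilated convs.

    kernels: list of (kernel_height, kernel_width) pairs, one per layer.
    Each layer grows the field by (kernel_height - 1, kernel_width - 1).
    """
    h, w = 1, 1
    for kh, kw in kernels:
        h += kh - 1
        w += kw - 1
    return h, w


# Three layers of each kernel shape.
print(receptive_field([(3, 3)] * 3))  # square kernels
print(receptive_field([(1, 5)] * 3))  # long horizontal kernels
```

The same arithmetic gives a quick way to compare 1x5 against 1x7 kernels when choosing a shape for long text lines.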

Scene Text Recognition

Scene text recognition can be divided into character recognition and text (word) recognition. Character recognition can use a character classifier. Text recognition first extracts sequence features and then uses an RNN to produce the output sequence.

Character recognition work mainly includes:
[1] M. Jaderberg et al. Reading text in the wild with convolutional neural networks. IJCV, 2016.
Text recognition work mainly includes:
[2] B. Su et al. Accurate scene text recognition based on recurrent neural network. ACCV, 2014.
[3] He et al. Reading Scene Text in Deep Convolutional Sequences. AAAI, 2016.
[4] Shi B et al. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. TPAMI, 2017.

Depending on the text's background and curvature, recognition splits into two tasks:

Regular Text Recognition

[1] CRNN: Shi B et al. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. TPAMI, 2017.

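The CRNN network is actually quite simple. A CNN first extracts features. The CNN's receptive field is limited and cannot capture context across long text, so the feature sequence is fed into an RNN, which outputs a label distribution for every frame. Finally, a transcription step (CTC-based in the paper) merges the per-frame labels into a text line.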
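The transcription step can be illustrated with best-path (greedy) CTC decoding, which is how CRNN transcribes in the lexicon-free case. The decoder takes the most likely label for each frame, merges consecutive repeats, and then removes blanks. This is a minimal sketch that assumes the RNN outputs a (T, C) probability matrix and that class 0 is the CTC blank. The function name is hypothetical.

```python
import numpy as np


def ctc_greedy_decode(probs, alphabet, blank=0):
    """Best-path CTC decoding.

    probs: (T, C) array of per-frame class probabilities from the RNN.
    alphabet: maps class index -> character; index `blank` is the CTC blank.
    """
    best = probs.argmax(axis=1).tolist()  # most likely class per frame
    chars = []
    prev = -1
    for k in best:
        # Emit a character only when the label changes and is not blank,
        # which collapses repeats and drops blanks in one pass.
        if k != prev and k != blank:
            chars.append(alphabet[k])
        prev = k
    return "".join(chars)


# Frames labelled a, a, blank, a, b, b decode to "aab":
# the blank separates the two a's, so both survive.
alphabet = "-ab"  # index 0 is the blank
probs = np.eye(3)[[1, 1, 0, 1, 2, 2]]
print(ctc_greedy_decode(probs, alphabet))
```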

Strengths of this paper:

First to train text recognition end to end
No character-level annotations required
Lexicon-free

Irregular Text Recognition

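[2] RARE: Shi B et al. Robust scene text recognition with automatic rectification. CVPR, 2016.

The first question is: what counts as irregular text? Roughly, it is curved text and text under perspective distortion, as opposed to horizontal, front-facing text.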

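The RARE network has two parts.
The first part is a spatial transformer network (STN). It transforms curved or perspective-distorted text into regular horizontal text.
The idea follows the original STN, with RARE using a thin-plate-spline (TPS) transformation.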

The second part is the SRN (Sequence Recognition Network), an encoder-decoder network. The encoder is a CNN + Bi-LSTM that produces sequence features, and the decoder is an attention-based GRU.
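The rectification idea can be sketched with a spatial-transformer sampler: build a sampling grid from predicted transform parameters, then bilinearly interpolate the input. This is a deliberate simplification. RARE predicts a thin-plate-spline transformation from fiducial points, whereas the sketch uses a 2x3 affine matrix `theta`. Coordinates are normalized to [-1, 1] as in the original STN. The function names are hypothetical.

```python
import numpy as np


def affine_grid(theta, out_h, out_w):
    """For each output pixel, the normalized (x, y) source location to sample."""
    xs = np.linspace(-1.0, 1.0, out_w)
    ys = np.linspace(-1.0, 1.0, out_h)
    gx, gy = np.meshgrid(xs, ys)                            # each (out_h, out_w)
    coords = np.stack([gx, gy, np.ones_like(gx)], axis=-1)  # (out_h, out_w, 3)
    return coords @ theta.T                                 # (out_h, out_w, 2)


def bilinear_sample(img, grid):
    """Bilinearly sample a 2-D image at normalized grid points (zeros outside)."""
    h, w = img.shape
    x = (grid[..., 0] + 1.0) * (w - 1) / 2.0  # normalized -> pixel coordinates
    y = (grid[..., 1] + 1.0) * (h - 1) / 2.0
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1, y1 = x0 + 1, y0 + 1
    wx, wy = x - x0, y - y0

    def pixel(yy, xx):
        # Read pixels, treating out-of-bounds locations as 0.
        inside = (xx >= 0) & (xx < w) & (yy >= 0) & (yy < h)
        out = np.zeros(xx.shape)
        out[inside] = img[yy[inside], xx[inside]]
        return out

    return ((1 - wx) * (1 - wy) * pixel(y0, x0)
            + wx * (1 - wy) * pixel(y0, x1)
            + (1 - wx) * wy * pixel(y1, x0)
            + wx * wy * pixel(y1, x1))


img = np.arange(12, dtype=float).reshape(3, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
flip_x = np.array([[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
same = bilinear_sample(img, affine_grid(identity, 3, 4))    # reproduces img
mirrored = bilinear_sample(img, affine_grid(flip_x, 3, 4))  # img with columns reversed
```

In RARE the transform parameters come from a localization network and the whole sampler is differentiable. This lets the rectifier be trained jointly with the recognizer without rectification labels.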

Classic Method

Detection: MSER
Detection: SWT
Recognition: Top-Down and Bottom-Up Cues
Recognition: Tree-Structured Model

Scene Text Detection

We divide scene text detection methods into two eras:

before 2016

Before 2016, text detection generally followed the usual object detection pipeline: proposals -> filtering -> regression.
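The last two pipeline stages can be sketched as a toy example. It assumes each proposal comes with a text/non-text classifier score and with regression offsets in the common R-CNN (dx, dy, dw, dh) parameterization. The cited works use their own proposal generators, classifiers and regressors, so the function name and the score threshold here are hypothetical.

```python
import numpy as np


def filter_and_regress(boxes, scores, deltas, score_thresh=0.5):
    """Keep text-like proposals and refine them with predicted offsets.

    boxes:  (N, 4) proposals as (x1, y1, x2, y2)
    scores: (N,) text/non-text classifier scores
    deltas: (N, 4) offsets (dx, dy, dw, dh) relative to each box's size
    """
    keep = scores >= score_thresh  # filtering stage
    boxes, deltas = boxes[keep], deltas[keep]
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]
    cx = boxes[:, 0] + 0.5 * w
    cy = boxes[:, 1] + 0.5 * h
    # Regression stage: shift the centre, rescale width and height.
    cx = cx + deltas[:, 0] * w
    cy = cy + deltas[:, 1] * h
    w = w * np.exp(deltas[:, 2])
    h = h * np.exp(deltas[:, 3])
    return np.stack([cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h], axis=1)


boxes = np.array([[0.0, 0.0, 10.0, 10.0], [20.0, 20.0, 30.0, 30.0]])
scores = np.array([0.9, 0.1])  # the second proposal is filtered out
deltas = np.array([[0.1, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]])
refined = filter_and_regress(boxes, scores, deltas)  # first box shifted right by 1
```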

Works that follow this approach include:
[1] Jaderberg et al. Deep features for text spotting. ECCV, 2014.
[2] Jaderberg et al. Reading text in the wild with convolutional neural networks. IJCV, 2016.
[3] Huang et al. Robust scene text detection with convolution neural network induced MSER trees. ECCV, 2014.
[4] Zhang et al. Symmetry-based text line detection in natural scenes. CVPR, 2015.
[5] L. Gómez, D. Karatzas. TextProposals: a text-specific selective search algorithm for word spotting in the wild. Pattern Recognition 70, 60-74.

after 2016

After 2016, attempts to handle irregular text gave rise to three mainstream approaches:

Segmentation-based methods: [1] Zhang Z, et al. Multi-oriented text detection with fully convolutional networks. CVPR, 2016.
Proposal-based methods: [2] Gupta A, et al. Synthetic data for text localisation in natural images. CVPR, 2016.
Hybrid methods: [3] He W, et al. Deep Direct Regression for Multi-Oriented Scene Text Detection. ICCV, 2017.

segmentation-based

[2] B. Shi et al. Detecting Oriented Text in Natural Images by Linking Segments. IEEE CVPR, 2017.

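Each network layer detects segments at its own scale, and the segments from all layers are finally combined into boxes. Segments are therefore connected by two kinds of links: within-layer links between segments on the same layer, and cross-layer links between segments on different layers.

It detects long text well.

It can also detect curved text.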
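After prediction, SegLink groups all segments joined by positive links (within-layer or cross-layer) into one word or line, then fits a box per group. The sketch covers only the grouping step, implemented as union-find. It assumes the links have already been thresholded into index pairs and omits box fitting. The function name is hypothetical.

```python
def group_segments(num_segments, links):
    """Group segment indices into connected components via union-find.

    links: iterable of (i, j) pairs judged positive, whether the two segments
    lie on the same layer (within-layer) or on different layers (cross-layer).
    """
    parent = list(range(num_segments))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j in links:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri  # merge the two components

    groups = {}
    for i in range(num_segments):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Segments 0-1-2 are chained by links, 3-4 form a second text line.
lines = group_segments(5, [(0, 1), (1, 2), (3, 4)])
```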

proposal-based

[1] M. Liao et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI, 2017. [paper] [caffe]

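A 28-layer fully convolutional network. The first 13 layers come from VGG16, 9 extra convolutional layers are appended after them, and text-box layers are attached to 6 of the convolutional layers. At every map location, a text-box layer outputs a 72-d vector: a text presence score (2-d) and offsets (4-d) for each of its 12 default boxes (12 x (2 + 4) = 72). Finally, NMS aggregates the boxes output by all layers.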
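The final aggregation step is non-maximum suppression. Below is standard greedy NMS on axis-aligned boxes: keep the highest-scoring box, drop every remaining box that overlaps it too much, and repeat. The 0.5 IoU threshold is an illustrative assumption, not a value taken from the paper.

```python
import numpy as np


def iou(box, others):
    """IoU between one (x1, y1, x2, y2) box and an (N, 4) array of boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS; returns indices of the kept boxes, best first."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Discard boxes that overlap the kept box above the threshold.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep


boxes = np.array([[0, 0, 10, 10], [1, 0, 11, 10], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)  # box 1 overlaps box 0 (IoU ~0.82) and is suppressed
```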

SSD backbone
Long default boxes
Long default kernels
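"Long default boxes" means default boxes that are much wider than they are tall. The sketch generates the boxes for one feature-map cell. It assumes the aspect ratios reported in the TextBoxes paper (1, 2, 3, 5, 7, 10). As we understand the paper, each box also has a copy shifted down by half a cell, which yields the 12 default boxes per location. Widths and heights follow SSD's w = s * sqrt(r), h = s / sqrt(r) convention. The base scale, cell size and function name are hypothetical.

```python
def default_boxes(cx, cy, cell, scale, ratios=(1, 2, 3, 5, 7, 10)):
    """Default boxes (x1, y1, x2, y2) for one feature-map cell centred at (cx, cy).

    Larger aspect ratios give longer, flatter boxes suited to text lines.
    Each box is repeated with a vertical offset of half a cell.
    """
    boxes = []
    for dy in (0.0, 0.5 * cell):
        for r in ratios:
            w = scale * r ** 0.5  # SSD convention: w / h == r
            h = scale / r ** 0.5
            y = cy + dy
            boxes.append((cx - w / 2, y - h / 2, cx + w / 2, y + h / 2))
    return boxes


boxes = default_boxes(8.0, 8.0, cell=16.0, scale=16.0)
print(len(boxes))  # 6 aspect ratios x 2 vertical positions
```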

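I think the text convolution proposed by my senior labmate Dr. Liao is very simple, yet it takes accumulated OCR experience to spot an idea like this. Possible weaknesses: the paper only tries 1x5 kernels and does not explain why this kernel shape works (what about 1x7 or 1x8?). It is also not clearly effective on vertical text, and it does not apply to irregular text.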

Hybrid method

EAST: An Efficient and Accurate Scene Text Detector [Zhou et al., CVPR 2017]

End-to-End Scene Text Detection & Recognition

Datasets and Evaluation

ICDAR2015 - Incidental Scene Text dataset

Focuses on incidental scenes, where text may appear at any orientation and location, with small size or low resolution.
Includes 1,000 training images (about 4,500 readable words) and 500 test images.

MSRA-TD500
Contains 500 natural images taken indoors and outdoors.
Texts in different languages (Chinese, English or mixture of both), fonts, sizes, colors and orientations.
Annotated with text line bounding box.
Ref. Detecting Texts of Arbitrary Orientations in Natural Images, CVPR12

RCTW-17 dataset
Chinese text in the wild: 12,034 images (8,034 for training and 4,000 for testing)
The text annotated in RCTW-17 consists of Chinese characters, digits, and English characters, with Chinese characters taking the largest portion.
Ref. ICDAR2017 Competition on Reading Chinese Scene Text in the Wild (RCTW-17)
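Detection on these benchmarks is commonly scored with precision, recall and F-measure under IoU-based matching. The sketch below is a simplified version of that scoring. It assumes axis-aligned boxes, one-to-one greedy matching and a match rule of IoU > 0.5. The official ICDAR2015 protocol matches quadrilaterals and ignores "do not care" regions, which this sketch does not. The function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detection_prf(preds, gts, iou_thresh=0.5):
    """Precision, recall and F-measure with one-to-one greedy matching."""
    matched = set()
    tp = 0
    for p in preds:
        best_j, best_iou = None, iou_thresh
        for j, g in enumerate(gts):
            if j in matched:
                continue  # each ground truth can be matched only once
            v = box_iou(p, g)
            if v > best_iou:  # must exceed the threshold
                best_j, best_iou = j, v
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


gts = [(0, 0, 10, 10), (20, 0, 30, 10)]
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
p, r, f = detection_prf(preds, gts)  # one hit, one miss, one false alarm
```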

Irregular Text Detection

Bounding Box:

Proposal-based methods; they fail to accurately delimit irregular text.

Text Mask:

Segmentation-based methods; it is hard to separate text instances within the predicted text areas.

Field Representation:

A flexible representation that can precisely describe irregular text.

TextField: Learning a deep direction field for irregular scene text detection [Xu et al., TIP 2019]
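In TextField's representation, every text pixel carries a unit vector pointing away from its nearest text-boundary pixel. Adjacent instances can then be separated where the vectors diverge. The sketch builds such a ground-truth field from a binary mask by brute force. It approximates the boundary by the nearest non-text pixel, which is an assumption of this sketch. Prediction by the network and pixel grouping are not shown. The function name is hypothetical.

```python
import numpy as np


def direction_field(mask):
    """Unit vectors (dy, dx) pointing from each text pixel's nearest non-text pixel toward it.

    mask: 2-D boolean array, True for text pixels. Non-text pixels get (0, 0).
    Brute force over all pixel pairs: fine for toy masks only.
    """
    field = np.zeros(mask.shape + (2,))
    bg = np.argwhere(~mask)  # (M, 2) array of (row, col) background pixels
    if len(bg) == 0:
        return field
    for r, c in np.argwhere(mask):
        d = np.array([r, c]) - bg  # vectors from every background pixel to (r, c)
        nearest = d[np.argmin((d ** 2).sum(axis=1))]
        field[r, c] = nearest / np.linalg.norm(nearest)
    return field


mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True  # a 3x3 text blob
field = direction_field(mask)  # e.g. field[1, 2] points down, away from the top edge
```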

Character-based method

[1] Baek Y, et al. Character Region Awareness for Text Detection. CVPR, 2019.

Polygon-based method

[2] Wang X, et al. Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation. CVPR, 2019.

Segmentation-based methods

[3] Yixing Zhu, et al. TextMountain: Accurate Scene Text Detection via Instance Segmentation. arXiv, 2018.
[4] Shangbang Long, et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. ECCV, 2018.

SegLink++: Detecting dense and arbitrary shaped scene text by instance-aware component grouping. Pattern Recognition, 2019.

Irregular Text Recognition

Multi-directional feature-based method

[1] Zhanzhan Cheng, et al. AON: Towards Arbitrarily-Oriented Text Recognition. CVPR, 2018.

Segmentation-based method

[2] Minghui Liao et al. Scene Text Recognition from Two-Dimensional Perspective. AAAI, 2019

Rectification-based method

[3] Baoguang Shi, et al. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. TPAMI, 2018.

Symmetry-constrained Rectification Network for Scene Text Recognition. [Yang et al., ICCV 2019]

2D-attention based method

[4] Hui Li, et al. Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition. AAAI, 2019.

Irregular Text Spotting

Instance segmentation

[1] Pengyuan Lyu, et al. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. ECCV, 2018.

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. [Liao et al., TPAMI 2019]

Detection & 2D-attention

[2] Siyang Qin, et al. Towards Unconstrained End-to-End Text Spotting. ICCV, 2019.

Future

End-to-end recognition and its evaluation protocols should become the mainstream direction. A very large benchmark dataset, like ImageNet, covering diverse scenarios should be considered. OCR and NLP should be deeply fused in many real applications.