Bryce1010 / DeepLearning-Project


OCR #9

Closed Bryce1010 closed 3 years ago

Bryce1010 commented 4 years ago

Resources

Surveys

Bryce1010 commented 4 years ago

Background

Difficulties in text detection & recognition, and the problems that have already been solved

For Chinese characters, related work has proposed using long convolutions.

Bryce1010 commented 4 years ago

Scene Text Recognition

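Scene text recognition can be split into character recognition and text recognition. Character recognition can use a character classifier, whereas text recognition first extracts sequence features and then uses an RNN to generate the output sequence.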

Depending on the text's background and curvature, the task is divided into two kinds:

Regular Text Recognition

[1] CRNN: Shi B et al. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. TPAMI, 2017.

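The CRNN network is actually quite simple. A CNN first extracts features. Because the CNN's receptive field is limited and cannot capture context along long text, the feature sequence is fed into an RNN, which predicts a text label for each frame. Finally, post-processing (the transcription layer, which uses CTC) merges the RNN's per-frame labels into a text line.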
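To make the "merge per-frame labels into a text line" step concrete, here is a minimal sketch of lexicon-free best-path CTC decoding. It assumes label 0 is the CTC blank and labels 1..N map to the characters of a hypothetical `alphabet` string; `ctc_greedy_decode` and `one_hot` are illustrative names, not CRNN's code.

```python
from typing import List, Sequence

BLANK = 0  # assumption: label 0 is the CTC blank; labels 1..N map to alphabet[0..N-1]


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], alphabet: str) -> str:
    """Best-path (lexicon-free) CTC decoding of per-frame label distributions."""
    # Step 1: take the most probable label at every frame.
    best_path = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    chars: List[str] = []
    prev = None
    for label in best_path:
        # Step 2: merge consecutive repeats; step 3: drop blanks.
        if label != prev and label != BLANK:
            chars.append(alphabet[label - 1])
        prev = label
    return "".join(chars)


def one_hot(label: int, size: int) -> List[float]:
    return [1.0 if k == label else 0.0 for k in range(size)]


# Hypothetical RNN output for 8 frames over {blank, 'a', 'b', 'c'}.
path = [1, 1, 0, 1, 2, 2, 0, 3]
frames = [one_hot(k, 4) for k in path]
print(ctc_greedy_decode(frames, "abc"))  # aabc
```

The blank between the two runs of label 1 is what lets CTC emit a doubled letter ("aa"); without it the repeats would collapse into a single "a".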

The advantages of this paper are:

Irregular Text Recognition

[2] RARE: Shi B et al. Robust scene text recognition with automatic rectification. CVPR, 2016.

The first question is: what exactly counts as irregular text?

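The RARE network consists of two parts.
The first part is an STN-style rectification network, which transforms curved or perspective-distorted text into normal horizontal text; the idea follows the Spatial Transformer Network (STN).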
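A minimal sketch of the STN grid-generator-plus-sampler idea follows. RARE itself uses a thin-plate-spline (TPS) transform driven by predicted fiducial points; this sketch swaps in the simpler affine STN, where a 2x3 matrix `theta` (normally predicted by a localization network, here set by hand) maps each output pixel to a source location that is then sampled bilinearly. `affine_sample` and `bilinear` are hypothetical helper names.

```python
import math
from typing import List

Image = List[List[float]]


def bilinear(img: Image, x: float, y: float) -> float:
    """Bilinearly sample img at continuous pixel coordinates (x, y); zero padding outside."""
    h, w = len(img), len(img[0])
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0
    total = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                total += wy * wx * img[yy][xx]
    return total


def affine_sample(img: Image, theta: List[List[float]], out_h: int, out_w: int) -> Image:
    """Grid generator + sampler: map each output pixel through the 2x3 matrix theta."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        v = -1.0 + 2.0 * i / (out_h - 1) if out_h > 1 else 0.0  # normalized y in [-1, 1]
        row = []
        for j in range(out_w):
            u = -1.0 + 2.0 * j / (out_w - 1) if out_w > 1 else 0.0  # normalized x in [-1, 1]
            xs = theta[0][0] * u + theta[0][1] * v + theta[0][2]
            ys = theta[1][0] * u + theta[1][1] * v + theta[1][2]
            # Convert normalized source coordinates back to pixel coordinates.
            row.append(bilinear(img, (xs + 1.0) * (w - 1) / 2.0, (ys + 1.0) * (h - 1) / 2.0))
        out.append(row)
    return out


img = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(affine_sample(img, [[-1, 0, 0], [0, 1, 0]], 2, 3))  # horizontal flip
```

Because the sampler is differentiable with respect to the sampling coordinates, the rectification can be trained end to end with the recognizer, which is the property RARE relies on.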

The second part is the SRN, an encoder-decoder network: the encoder is a CNN + Bi-LSTM that produces sequence features, and the decoder is attention + GRU.
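One decoding step of an attention decoder can be sketched as below. This assumes an additive (Bahdanau-style) score; the weight names `W`, `V`, `w` and the toy values are hypothetical. The resulting `glimpse` would then be fed, together with the previous character, into the GRU that emits the next character; the GRU itself is omitted.

```python
import math
from typing import List, Tuple

Vec = List[float]
Mat = List[Vec]


def matvec(m: Mat, x: Vec) -> Vec:
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def softmax(xs: Vec) -> Vec:
    mx = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(prev_state: Vec, feats: List[Vec], W: Mat, V: Mat, w: Vec) -> Tuple[Vec, Vec]:
    """Additive attention: e_i = w . tanh(W s + V h_i), alpha = softmax(e), glimpse = sum_i alpha_i h_i."""
    ws = matvec(W, prev_state)
    scores = [sum(wk * math.tanh(a + b) for wk, a, b in zip(w, ws, matvec(V, h))) for h in feats]
    alpha = softmax(scores)
    glimpse = [sum(a * h[d] for a, h in zip(alpha, feats)) for d in range(len(feats[0]))]
    return alpha, glimpse


# Toy encoder output: 3 time steps of 2-d Bi-LSTM features (hypothetical values).
feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
alpha, glimpse = attention_step([0.0, 0.0], feats, [[0.0, 0.0], [0.0, 0.0]],
                                [[1.0, 0.0], [0.0, 1.0]], [5.0, 0.0])
print(alpha, glimpse)  # most of the weight lands on the first time step
```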

Bryce1010 commented 4 years ago

Classic Method

Bryce1010 commented 4 years ago

Scene Text Detection

We divide scene text detection methods into two eras:

before 2016

Before 2016, text detection generally followed the common detection pipeline: proposals -> filtering -> regression.
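The proposals -> filtering -> regression pipeline can be sketched as follows. The `text_score` classifier and `regress` function are hypothetical stand-ins for whatever each paper uses, and the center/size offset parameterization in `apply_regression` is borrowed from R-CNN-style detectors as an assumption.

```python
import math
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]     # (x1, y1, x2, y2)
Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


def apply_regression(box: Box, d: Deltas) -> Box:
    """Refine a box with center/size offsets (R-CNN-style parameterization, an assumption here)."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + w / 2 + d[0] * w, y1 + h / 2 + d[1] * h
    w, h = w * math.exp(d[2]), h * math.exp(d[3])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def detect(proposals: List[Box],
           text_score: Callable[[Box], float],
           regress: Callable[[Box], Deltas],
           threshold: float = 0.5) -> List[Box]:
    # Filtering: keep proposals the classifier believes contain text.
    kept = [p for p in proposals if text_score(p) >= threshold]
    # Regression: refine each surviving proposal's coordinates.
    return [apply_regression(p, regress(p)) for p in kept]


proposals = [(0, 0, 10, 10), (20, 20, 30, 30)]
result = detect(proposals,
                text_score=lambda p: 0.9 if p[0] == 0 else 0.1,  # hypothetical classifier
                regress=lambda p: (0.1, 0.0, 0.0, 0.0))          # hypothetical regressor
print(result)  # one refined box, shifted right by 10% of its width
```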


Works that follow this approach include:
[1] Jaderberg et al. Deep features for text spotting. ECCV, 2014.
[2] Jaderberg et al. Reading text in the wild with convolutional neural networks. IJCV, 2016.
[3] Huang et al. Robust scene text detection with convolution neural network induced MSER trees. ECCV, 2014.
[4] Zhang et al. Symmetry-based text line detection in natural scenes. CVPR, 2015.
[5] L. Gómez, D. Karatzas. TextProposals: A text-specific selective search algorithm for word spotting in the wild. Pattern Recognition, 70: 60-74.

after 2016

After 2016, driven by attempts to handle irregular text, three mainstream approaches emerged:

segmentation-based

[2] B. Shi et al. Detecting Oriented Text in Natural Images by Linking Segments. IEEE CVPR, 2017.


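Different network layers detect at different scales, and the boxes (segments) from all layers are finally combined. Segments can therefore be linked in two ways: within the same network layer and across different network layers.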
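Combining linked segments amounts to finding connected components in a graph whose nodes are segments and whose edges are the (already thresholded) within-layer and cross-layer links. The sketch below uses union-find as one way to compute those components; the global segment indices and the link lists are hypothetical, and the final step of fitting one oriented box per group is omitted.

```python
from typing import Dict, List, Tuple


def combine_segments(num_segments: int, links: List[Tuple[int, int]]) -> List[List[int]]:
    """Group segments connected by links (within-layer or cross-layer) into words."""
    parent = list(range(num_segments))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra  # merge the two components

    groups: Dict[int, List[int]] = {}
    for i in range(num_segments):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


within_layer = [(0, 1), (1, 2), (4, 5)]  # hypothetical links on the same feature map
cross_layer = [(2, 3)]                   # hypothetical link between adjacent feature maps
print(combine_segments(6, within_layer + cross_layer))  # [[0, 1, 2, 3], [4, 5]]
```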

Results on long text.

It can also detect curved text.

proposal-based

[1] M. Liao et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI, 2017. [paper] [caffe]

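A 28-layer fully convolutional network: 13 layers come from VGG16, 9 extra convolutional layers are appended after VGG16, and text-box layers are attached to 6 of the convolutional layers. At every map location, a text-box layer outputs a 72-d vector: for each of 12 default boxes, a text presence score (2-d) and offsets (4-d), i.e. 12 × (2 + 4) = 72. Finally, NMS aggregates all the output boxes.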
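The 72-d output size and the final NMS step can be checked with a short sketch. The greedy NMS below is the standard textbook version and the 0.45 IoU threshold is an assumed value, not necessarily the one used in TextBoxes.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

NUM_DEFAULT_BOXES = 12
SCORE_DIM, OFFSET_DIM = 2, 4
OUT_DIM = NUM_DEFAULT_BOXES * (SCORE_DIM + OFFSET_DIM)  # channels per map location


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float = 0.45) -> List[int]:
    """Greedy NMS: visit boxes by descending score, drop any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
print(OUT_DIM, nms(boxes, [0.9, 0.8, 0.7]))  # 72 [0, 2]
```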



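I think the text convolution proposed by my senior labmate Dr. Liao (Minghui Liao) is very simple, yet it is an idea that takes a solid OCR background to notice. A possible weakness is that the paper only tries 1x5 convolutions without explaining why this kernel shape is effective; what about 1x7 or 1x8? In addition, the gain on vertical text is not obvious, and the method cannot handle irregular text.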
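One rough way to see the intuition behind the 1x5 "text convolution" is to compare the footprint of a 1x5 kernel with that of a 3x3 kernel: the former covers a wide, short strip that matches the elongated shape of words. The sketch below convolves a single impulse with an all-ones kernel (cross-correlation with zero "same" padding, as deep learning frameworks do) and returns the set of positions that respond; `conv2d_same` and `footprint` are hypothetical helper names.

```python
from typing import List, Set, Tuple

Grid = List[List[float]]


def conv2d_same(x: Grid, k: Grid) -> Grid:
    """2-D cross-correlation with zero padding that keeps the input size (odd kernel sizes)."""
    h, w = len(x), len(x[0])
    kh, kw = len(k), len(k[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for a in range(kh):
                for b in range(kw):
                    y, xx = i + a - ph, j + b - pw
                    if 0 <= y < h and 0 <= xx < w:
                        s += k[a][b] * x[y][xx]
            out[i][j] = s
    return out


def footprint(kernel_h: int, kernel_w: int, size: int = 7) -> Set[Tuple[int, int]]:
    """Positions whose output depends on the center input pixel."""
    impulse = [[0.0] * size for _ in range(size)]
    impulse[size // 2][size // 2] = 1.0
    resp = conv2d_same(impulse, [[1.0] * kernel_w for _ in range(kernel_h)])
    return {(i, j) for i in range(size) for j in range(size) if resp[i][j] != 0.0}


print(sorted(footprint(1, 5)))  # 5 positions, all in the middle row
print(len(footprint(3, 3)))     # 9 positions in a square
```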

Hybrid method

EAST: An Efficient and Accurate Scene Text Detector [Zhou et al., CVPR 2017]



Bryce1010 commented 4 years ago

End-to-End Scene Text Detection & Recognition

Bryce1010 commented 4 years ago

Datasets and Evaluation

ICDAR2015 - Incidental Scene Text dataset
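Detection on ICDAR2015 is usually scored by matching each detection to a ground-truth box with IoU above 0.5 and reporting precision, recall and F-measure. The sketch below is a simplified version under stated assumptions: axis-aligned boxes instead of the dataset's quadrilaterals, greedy one-to-one matching, and no handling of "do not care" regions. It is not the official evaluation script.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned simplification


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def evaluate(gts: List[Box], dets: List[Box], thresh: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching at IoU > thresh; returns (precision, recall, F-measure)."""
    matched = set()
    tp = 0
    for d in dets:
        best, best_iou = None, thresh
        for gi, g in enumerate(gts):
            if gi not in matched and iou(d, g) > best_iou:
                best, best_iou = gi, iou(d, g)
        if best is not None:
            matched.add(best)
            tp += 1
    p = tp / len(dets) if dets else 0.0
    r = tp / len(gts) if gts else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


gts = [(0, 0, 10, 10), (20, 0, 30, 10)]
dets = [(0, 0, 10, 9), (50, 50, 60, 60)]
print(evaluate(gts, dets))  # (0.5, 0.5, 0.5)
```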

Bryce1010 commented 4 years ago

Irregular Text Detection

Proposal-based methods fail to accurately delimit irregular text.

Segmentation-based methods find it hard to extract text instances from the predicted text areas.

Flexible representations can precisely describe irregular text.

TextField: Learning a deep direction field for irregular scene text detection [Xu et al., TIP 2019]
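Why it is hard to extract instances from a plain segmentation mask can be shown with connected-component labeling: two words that are separate become two instances, but as soon as their predicted regions touch they merge into one. This is one motivation for methods that predict extra geometry (for example TextField's direction field) rather than only a text/non-text mask. The masks below are hypothetical toy inputs.

```python
from typing import List, Tuple

Mask = List[List[int]]


def connected_components(mask: Mask) -> Tuple[int, List[List[int]]]:
    """4-connected component labeling of a binary mask; returns (count, label map)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for i in range(h):
        for j in range(w):
            if mask[i][j] and not labels[i][j]:
                n += 1
                labels[i][j] = n
                stack = [(i, j)]
                while stack:  # iterative flood fill
                    y, x = stack.pop()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = n
                            stack.append((ny, nx))
    return n, labels


separate = [[1, 1, 0, 1, 1]]  # two words with a gap between them
touching = [[1, 1, 1, 1, 1]]  # the gap is filled in by the predicted mask
print(connected_components(separate)[0], connected_components(touching)[0])  # 2 1
```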

Character-based method


[1] Baek Y, et al. Character Region Awareness for Text Detection. CVPR, 2019.

Polygon-based method


[2] Wang X, et al. Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation. CVPR, 2019.

Segmentation-based methods


[3] Yixing Zhu, et al. TextMountain: Accurate Scene Text Detection via Instance Segmentation. arXiv, 2018.
[4] Shangbang Long, et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. ECCV, 2018.

Bryce1010 commented 4 years ago

Irregular Text Recognition

Multi-directional feature-based method


[1] Zhanzhan Cheng, et al. AON: Towards Arbitrarily-Oriented Text Recognition. CVPR, 2018.

Segmentation-based method

[2] Minghui Liao, et al. Scene Text Recognition from Two-Dimensional Perspective. AAAI, 2019.

Rectification-based method

[3] Baoguang Shi, et al. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. TPAMI, 2018.

2D-attention based method


[4] Hui Li, et al. Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition. AAAI, 2019.

Bryce1010 commented 4 years ago

Irregular Text Spotting

Instance segmentation

[1] Pengyuan Lyu, et al. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. ECCV, 2018.

Detection & 2d-attention

[2] Siyang Qin, et al. Towards Unconstrained End-to-End Text Spotting. ICCV, 2019.

Bryce1010 commented 4 years ago

Future

End-to-end recognition and its evaluation protocol should be the mainstream directions. A very large benchmark dataset like ImageNet including plentiful scenarios should be considered. OCR and NLP should be deeply fused in many real applications.