Closed by Bryce1010 3 years ago
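Difficulties in text detection & recognition, and the points that have already been solved
For Chinese characters, related work proposes using long convolutions:
Scene text recognition can be divided into character recognition and text recognition. Character recognition can use a character classifier, while text recognition first extracts a sequence feature and then uses an RNN to generate the sequence result:
Two tasks are distinguished according to the text background and curvature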
[1] CRNN: Shi B et al. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. TPAMI, 2017.
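The CRNN network is actually quite simple. A CNN first extracts features; because the CNN's receptive field is limited and cannot attend to long-range text information, the features are fed into an RNN, which outputs text labels. A final post-processing step joins the RNN's output labels into a text line.
The strengths of this paper are: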
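The post-processing step that joins per-frame labels into a text line can be sketched as a greedy CTC decode (CTC is the transcription scheme CRNN uses): take the best label per frame, merge consecutive repeats, then drop blanks. The function name `ctc_greedy_decode`, the blank index 0, and the toy alphabet and scores are assumptions for illustration, not the paper's code.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Collapse per-frame predictions into a string (greedy CTC decoding).

    frame_scores: list of per-frame score lists, one score per class.
    alphabet: maps class index -> character; index `blank` is the CTC blank.
    """
    # Step 1: pick the highest-scoring class for every frame.
    best_path = [max(range(len(s)), key=s.__getitem__) for s in frame_scores]

    chars = []
    previous = None
    for label in best_path:
        # Step 2: merge consecutive repeats; step 3: drop blanks.
        if label != previous and label != blank:
            chars.append(alphabet[label])
        previous = label
    return "".join(chars)


# Toy alphabet (hypothetical): index 0 is the blank symbol.
ALPHABET = ["-", "a", "b", "c"]

# Frames whose best labels are: a a - a b b - c
FRAMES = [
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.6, 0.2, 0.1],
    [0.8, 0.1, 0.05, 0.05],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.2, 0.1, 0.6, 0.1],
    [0.9, 0.05, 0.03, 0.02],
    [0.1, 0.1, 0.1, 0.7],
]

decoded = ctc_greedy_decode(FRAMES, ALPHABET)
print(decoded)  # aabc
```

The blank between the second and third `a` frames is what lets the decoder keep a genuine double letter while still merging the repeated `b` frames.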
[2] RARE: Shi B et al. Robust scene text recognition with automatic rectification. CVPR, 2016.
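The first question is: what counts as irregular text?
The RARE network consists of two parts:
The first part is an STN, which transforms curved or perspective text into regular horizontal text;
This idea is similar to STN;
The second part is the SRN, an encoder-decoder network. The encoder is a CNN + Bi-LSTM that produces the sequence feature, and the decoder is attention + GRU.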
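A single step of an attention decoder of the kind described for the SRN can be sketched as follows: score every encoder position against the current decoder state, normalize the scores with a softmax, and take the weighted sum of encoder features as the context vector. Dot-product scoring is a simplification chosen here for brevity; it is not claimed to be the scoring function the paper uses, and all names and values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_step(encoder_features, decoder_state):
    """Return (weights, context) for one decoding step.

    encoder_features: list of T feature vectors (the sequence feature).
    decoder_state: current decoder hidden state, same dimension.
    Dot-product scoring is an assumption made for brevity.
    """
    scores = [sum(h * s for h, s in zip(feat, decoder_state))
              for feat in encoder_features]
    weights = softmax(scores)
    dim = len(decoder_state)
    # Context vector: attention-weighted sum of the encoder features.
    context = [sum(w * feat[d] for w, feat in zip(weights, encoder_features))
               for d in range(dim)]
    return weights, context


# Toy values (hypothetical): 3 encoder positions with 2-d features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [0.0, 2.0]
weights, context = attention_step(features, state)
print(weights, context)
```

In a full decoder the context vector would be fed, together with the previous output, into the GRU to predict the next character.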
Detection: MSER
Detection: SWT
Recognition: Top-Down and Bottom-Up Cues
Recognition: Tree-Structured Model
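We divide scene text detection into two eras according to method:
Before 2016, text detection generally followed the common detection pipeline: proposals -> filtering -> regression
Works that take this approach include: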
[1] Jaderberg et al. Deep features for text spotting. ECCV, 2014.
[2] Jaderberg et al. Reading text in the wild with convolutional neural networks. IJCV, 2016.
[3] Huang et al. Robust scene text detection with convolution neural network induced mser trees. ECCV, 2014.
[4] Zhang et al. Symmetry-based text line detection in natural scenes. CVPR, 2015.
[5] L. Gómez, D. Karatzas. TextProposals: a text-specific selective search algorithm for word spotting in the wild. Pattern Recognition 70, 60-74
After 2016, attempts to handle irregular text led to three mainstream methods:
[2] B. Shi et al. Detecting Oriented Text in Natural Images by Linking Segments. IEEE CVPR, 2017.
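Different network layers detect at different scales. The boxes from the different layers are finally combined, which gives two kinds of links: within the same layer and across layers;
Long text detection results
Curved text can also be detected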
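Combining linked segments into text lines can be sketched as connected-component grouping over a graph whose edges are the predicted links, both within-layer and cross-layer. The union-find helper, the segment ids, and the example links below are hypothetical; fitting an oriented box to each group is omitted from this sketch.

```python
def group_segments(num_segments, links):
    """Group segment ids into connected components given (i, j) links."""
    parent = list(range(num_segments))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in links:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_j] = root_i

    groups = {}
    for seg in range(num_segments):
        groups.setdefault(find(seg), []).append(seg)
    return sorted(groups.values())


# Hypothetical example: segments 0-2 come from one feature layer, 3-5 from another.
within_layer_links = [(0, 1), (3, 4)]
cross_layer_links = [(1, 3)]
text_lines = group_segments(6, within_layer_links + cross_layer_links)
print(text_lines)  # [[0, 1, 3, 4], [2], [5]]
```

Segments 2 and 5 have no links, so each stays as its own group; the cross-layer link (1, 3) is what merges the two within-layer pairs into one text line.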
[1] M. Liao et al. TextBoxes: A Fast Text Detector with a Single Deep Neural Network. AAAI, 2017. [paper] [caffe]
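A 28-layer fully convolutional network: 13 layers come from VGG16, 9 extra convolutional layers are appended after VGG16, and the text-box layers are connected to 6 of the convolutional layers. Each map location outputs a 72-d vector, made up of a text presence score (2-d) and offsets (4-d) for each of 12 such boxes. NMS is then applied to aggregate all the output boxes.
I think the text convolution proposed by Dr. Liao, a senior labmate, is very simple, yet it is the kind of idea you only notice with some accumulated OCR background. A possible weakness is that the paper only tries 1x5 convolutions and does not explain why this kernel shape is effective. What about 1x7 or 1x8? Detection of vertical text is not very effective, and the method does not apply to irregular text scenes.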
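The 72-d output follows from 12 boxes per location, each carrying a 2-d text presence score and 4-d offsets: 12 × (2 + 4) = 72. The final aggregation step can be sketched with a plain greedy NMS over axis-aligned boxes. The function names, the `(x1, y1, x2, y2)` box format, and the 0.5 IoU threshold are assumptions for illustration, not the released Caffe implementation.

```python
BOXES_PER_LOCATION = 12
SCORE_DIMS = 2    # text presence score
OFFSET_DIMS = 4   # box offsets
OUTPUT_DIMS = BOXES_PER_LOCATION * (SCORE_DIMS + OFFSET_DIMS)  # 72


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for idx in order:
        if all(iou(boxes[idx], boxes[k]) <= iou_threshold for k in keep):
            keep.append(idx)
    return keep


# Hypothetical boxes: the first two overlap heavily, the third is separate.
boxes = [(0, 0, 100, 20), (5, 0, 105, 20), (0, 50, 100, 70)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(OUTPUT_DIMS, kept)  # 72 [0, 2]
```

The first two boxes have an IoU of about 0.90, so the lower-scoring one is suppressed, while the third box does not overlap anything and is kept.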
EAST: An Efficient and Accurate Scene Text Detector [Zhou et al., CVPR 2017]
Focuses on incidental scenes, where text may appear in any orientation and at any location, with small size or low resolution.
Includes 1000 training images containing about 4500 readable words and 500 testing images.
Contains 500 natural images taken indoors and outdoors.
Texts in different languages (Chinese, English, or a mixture of both), fonts, sizes, colors and orientations.
Annotated with text line bounding box.
Ref. Detecting Texts of Arbitrary Orientations in Natural Images, CVPR 2012
Chinese Text in the Wild (12,034 images: 8,034 for training and 4,000 for testing)
The text annotated in RCTW-17 consists of Chinese characters, digits, and English characters, with Chinese characters taking the largest portion.
ICDAR2017 Competition on Reading Chinese Scene Text in the Wild (RCTW-17)
Proposal-based methods: fail to accurately delimit irregular text.
Segmentation-based methods: hard to extract text instances from the predicted text areas.
Flexible representation: precisely describes irregular text.
TextField: Learning a deep direction field for irregular scene text detection [Xu et al., TIP 2019]
[1] Baek Y, et al. Character Region Awareness for Text Detection. CVPR, 2019.
[2] Wang X, et al. Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation. CVPR, 2019.
[3] Yixing Zhu, et al. TextMountain: Accurate Scene Text Detection via Instance Segmentation. arXiv, 2018.
[4] Shangbang Long, et al. TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes. ECCV, 2018.
[1] Zhanzhan Cheng, et al. AON: Towards Arbitrarily-Oriented Text Recognition. CVPR, 2018.
[2] Minghui Liao et al. Scene Text Recognition from Two-Dimensional Perspective. AAAI, 2019
[3] Baoguang Shi, et al. ASTER: An Attentional Scene Text Recognizer with Flexible Rectification. TPAMI, 2018.
[4] Hui Li, et al. Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition. AAAI, 2019.
[1] Pengyuan Lyu, et al. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes. ECCV, 2018.
[2] Siyang Qin, et al. Towards Unconstrained End-to-End Text Spotting. ICCV, 2019.
End-to-end recognition and its evaluation protocol should be the mainstream directions. A very large benchmark dataset like ImageNet including plentiful scenarios should be considered. OCR and NLP should be deeply fused in many real applications.
Resources
OCR dataset [github]
awesome-deep-text-detection-recognition [github]
SceneTextPapers [github]
[ ] Scene Text Detection and Recognition: The Deep Learning Era [paper]
Scene Text Recognition [papers with code]
Scene Text Detection [papers with code]
Multi-Oriented Scene Text Detection [papers with code]
Curved Text Detection [papers with code]
Survey