This repository is not being actively maintained due to lack of time and interest. My sincerest apologies to the open source community for allowing this project to stagnate. I hope it was useful for some of you as a jumping-off point.
A note on the original paper, SSD: Single Shot MultiBox Detector, can be found here.
This work is inspired by ssd-plate_detection.
Details of the above code are in my blog post, which is written in Chinese: http://blog.csdn.net/u010167269/article/details/52851667
I have also uploaded my trained caffemodel to BaiduYun, Google Drive, and Dropbox.
Some examples of scene text detection:
Currently, I mainly focus on image/video captioning.