How to train my own dataset to recognize text

balancap / SSD-Tensorflow

Single Shot MultiBox Detector in TensorFlow


Hi,

I want to use SSD to recognize text in a particular scene. I built the dataset in PASCAL VOC format, but I do not know what to write in the object's `<name>` field: the generic label 'text', or the exact text value.

E.g.:

    <object>
        <name>text</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>92</xmin>
            <ymin>72</ymin>
            <xmax>305</xmax>
            <ymax>473</ymax>
        </bndbox>
    </object>

Or:

    <object>
        <name>DEMO123456789</name>
        <pose>Unspecified</pose>
        <truncated>0</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>92</xmin>
            <ymin>72</ymin>
            <xmax>305</xmax>
            <ymax>473</ymax>
        </bndbox>
    </object>

End-to-end recognition could output the exact text values directly, so I do not know which of the two to choose. Building the dataset is very time-consuming, so I would like to ask before I start labeling.

Looking forward to your reply. Thanks

    <annotation>
        <folder>STDATA</folder>
        <filename>0630selection1.JPG</filename>
        <source>
            <database>Sense Text Database</database>
            <annotation>sense text 2017</annotation>
            <image>flickr</image>
            <flickrid>001</flickrid>
        </source>
        <size>
            <width>500</width>
            <height>375</height>
            <depth>3</depth>
        </size>
        <segmented>0</segmented>
        <object>
            <name>text</name>
            <pose>Unspecified</pose>
            <truncated>0</truncated>
            <difficult>0</difficult>
            <bndbox>
                <xmin>167</xmin>
                <ymin>21</ymin>
                <xmax>325</xmax>
                <ymax>59</ymax>
            </bndbox>
        </object>
    </annotation>
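For reference, here is a minimal sketch of how an annotation like the one above could be read with Python's standard library. It is only an illustration, not part of SSD-Tensorflow: the function name `read_voc_objects` and its `single_class` parameter are hypothetical, and the sample XML is a shortened copy of my file. It shows that, if the exact string is written in `<name>`, it might still be collapsed to one detection class such as 'text' at load time while the string is kept alongside the box.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the annotation above (only the fields used below).
SAMPLE = """<annotation>
  <filename>0630selection1.JPG</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>text</name>
    <bndbox><xmin>167</xmin><ymin>21</ymin><xmax>325</xmax><ymax>59</ymax></bndbox>
  </object>
</annotation>"""


def read_voc_objects(xml_text, single_class=None):
    """Return one dict per <object> in a PASCAL VOC style annotation.

    Hypothetical helper: if single_class is given, every object gets that
    class label, and the original <name> content is kept under "text".
    """
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.iter("object"):
        # Strip whitespace so "<name> text </name>" and "<name>text</name>" match.
        name = obj.findtext("name", default="").strip()
        box = obj.find("bndbox")
        coords = tuple(
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        label = single_class if single_class is not None else name
        objects.append({"label": label, "text": name, "bbox": coords})
    return objects


if __name__ == "__main__":
    for item in read_voc_objects(SAMPLE, single_class="text"):
        print(item)
```

With this kind of loader, labeling each box with its exact string would not rule out training a single-class detector later, though whether that is the right setup for SSD here is exactly what I am asking.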

How to train my own dataset to recognize text #55