clovaai / deep-text-recognition-benchmark

Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Apache License 2.0
3.77k stars, 1.11k forks

Questions on datasets "guideline" #327

Open: GMXela opened this issue 2 years ago

GMXela commented 2 years ago

Hello guys,

I would like to extract several words on several lines from simple images of approximately the same size.
So I need to create my own dataset. Here are the questions I would like your help with:

Q. In the ground-truth file (gt.txt), how do I write a line break ("\n") inside a label to build a training dataset? The structure of the file does not seem to allow labels that span several lines.
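For context, the repository README describes each gt.txt record as `{imagepath}\t{label}\n`, one image per line, so a raw "\n" inside a label would simply end the record. As far as I know, the recognizers in this repo read a single cropped line of text, so a common workaround is to crop each text line into its own image and give every crop its own record. The minimal sketch below assumes those crops already exist; the helper names and the `_line<i>.png` naming scheme are hypothetical, not part of the repo.

```python
import os
import tempfile


def write_gt_file(entries, gt_path):
    """Write (image_path, label) pairs as 'image_path<TAB>label' lines, one per image."""
    with open(gt_path, "w", encoding="utf-8") as f:
        for image_path, label in entries:
            # A raw newline would end the record early and a tab would create an
            # extra field, so such labels are rejected instead of written.
            if "\n" in label or "\t" in label:
                raise ValueError(f"label for {image_path!r} contains a newline or tab")
            f.write(f"{image_path}\t{label}\n")


def read_gt_file(gt_path):
    """Read the file back: strip the trailing newline, then split on the tab."""
    with open(gt_path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]


def split_multiline_sample(image_stem, transcription):
    """Turn one multi-line transcription into one (image, label) entry per text line.

    Hypothetical convention: each text line has already been cropped into its
    own image named '<image_stem>_line<i>.png'.
    """
    lines = [text for text in transcription.split("\n") if text]
    return [(f"{image_stem}_line{i}.png", text) for i, text in enumerate(lines)]


if __name__ == "__main__":
    entries = split_multiline_sample("test/image_1", "ORDER #42\nDATE: 01/02")
    with tempfile.TemporaryDirectory() as tmp:
        gt_path = os.path.join(tmp, "gt.txt")
        write_gt_file(entries, gt_path)
        print(read_gt_file(gt_path))
```

Running it prints `[('test/image_1_line0.png', 'ORDER #42'), ('test/image_1_line1.png', 'DATE: 01/02')]`: the two-line transcription becomes two single-line records.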

Q. The model only seems to recognize alphanumeric characters, but I also need special characters (such as :, /, *, #, ...). Do I have to train the model on these characters too?
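In `train.py` the recognizable characters come from the `--character` option (default `0123456789abcdefghijklmnopqrstuvwxyz`), and the size of the prediction layer is derived from that set, so a model trained on the default set has no output at all for `/` or `#`. If I read `dataset.py` correctly, the loader also skips labels containing characters outside the set unless filtering is turned off. The sketch below builds an extended character string and checks whether a label is fully covered by it, in the spirit of that filter; the helper names are hypothetical. `PRINTABLE_94` is `string.printable` without whitespace, which I believe is what `--sensitive` switches to, but that is an assumption to confirm in `train.py`.

```python
import re
import string

# Default of train.py's --character option (lowercase alphanumerics).
DEFAULT_CHARACTER = "0123456789abcdefghijklmnopqrstuvwxyz"
# Digits, letters and punctuation without whitespace (94 characters); assumed
# to match what --sensitive selects, to be confirmed in train.py.
PRINTABLE_94 = string.printable[:-6]


def build_character_set(extra, base=DEFAULT_CHARACTER):
    """Append extra symbols to the base set, keeping order and dropping duplicates."""
    result = []
    for ch in base + extra:
        if ch not in result:
            result.append(ch)
    return "".join(result)


def is_label_supported(label, character, sensitive=False):
    """Return True if every character of the label belongs to the character set.

    Labels are lowercased unless sensitive=True. re.escape keeps symbols such as
    ']' or '-' from breaking the regex character class.
    """
    if not sensitive:
        label = label.lower()
    out_of_char = f"[^{re.escape(character)}]"
    return re.search(out_of_char, label) is None


if __name__ == "__main__":
    custom = build_character_set(":/*#")
    print(custom)
    print(is_label_supported("12/05", DEFAULT_CHARACTER))  # False: '/' is missing
    print(is_label_supported("12/05", custom))  # True
```

The resulting string would be what you pass to `--character`, and the training labels then need to actually contain those symbols for the model to learn them.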

Q. Can your script for creating an LMDB training dataset also be used to create the test and validation datasets?
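The docstring of `createDataset` in `create_lmdb_dataset.py` says it creates an LMDB dataset "for training and evaluation", so the same script should work for every split as long as each split has its own gt file. A minimal sketch, assuming a single gt.txt that you want to split randomly into training and validation records; the function names, the 10% ratio, the seed, and the output file names are arbitrary choices for illustration.

```python
import random


def split_gt_lines(lines, valid_ratio=0.1, seed=0):
    """Shuffle gt.txt records reproducibly and split them into (train, valid) lists."""
    records = [line for line in lines if line.strip()]
    random.Random(seed).shuffle(records)
    # At least one validation record whenever there is any data at all.
    n_valid = max(1, int(len(records) * valid_ratio)) if records else 0
    return records[n_valid:], records[:n_valid]


def write_split(gt_path, train_path, valid_path, valid_ratio=0.1, seed=0):
    """Read gt_path and write both splits in the same 'image<TAB>label' format."""
    with open(gt_path, encoding="utf-8") as f:
        train, valid = split_gt_lines(f.read().splitlines(), valid_ratio, seed)
    for path, part in ((train_path, train), (valid_path, valid)):
        with open(path, "w", encoding="utf-8") as out:
            out.writelines(line + "\n" for line in part)
    return len(train), len(valid)
```

After `write_split("data/gt.txt", "data/gt_train.txt", "data/gt_valid.txt")`, each file can be passed to the script separately, e.g. `python3 create_lmdb_dataset.py --inputPath data/ --gtFile data/gt_valid.txt --outputPath result/valid`, and the resulting folders given to `train.py` through `--train_data` and `--valid_data`. For a custom dataset, `--select_data` and `--batch_ratio` probably need adjusting too; the README's notes on training with your own data are the place to check.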

Thank you very much guys for your help!!!