githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.
https://towardsdatascience.com/2326a3487cd5
MIT License
1.99k stars, 894 forks

preparing own dataset based on IAM dataset #34

Closed (androuino closed this issue 5 years ago)

androuino commented 5 years ago

Hi,

I have read this article, but I'm still kind of lost on how to prepare my own dataset. Perhaps an example would be nice, or please guide me on how to prepare it. For example, I have labeled some of my dataset and the XML files are available. I would like to know how to make the Python code work, or how to edit the getNext() function. Thank you so much.

githubharald commented 5 years ago

Example code is already provided in the article; it returns dummy images containing machine-printed text together with the ground-truth texts. All you have to do is return the samples from your own dataset instead. This is really nothing more than reading your data sample by sample (e.g. ground truth from an XML file and the image from a PNG file) and returning each sample from the getNext() method as a tuple. I really don't have the time to give individual help on general coding questions; maybe ask on Stack Overflow instead.
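
For illustration, here is a rough sketch of what such a loader might look like. It is not the code from the article, and several things in it are assumptions: the class name XmlDataProvider, the directory layout (word1.xml sitting next to word1.png), the XML schema (a root element with a text attribute) and the (text, image) order of the returned tuple. Adjust all of these to match your own files and the interface used in the article.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


class XmlDataProvider:
    """Hypothetical loader: yields (ground-truth text, image) one sample at a time."""

    def __init__(self, data_dir, load_image=None):
        # Assumed layout: data_dir holds word1.xml next to word1.png, and so on.
        self.xml_files = sorted(Path(data_dir).glob("*.xml"))
        # The default loader only returns the raw PNG bytes.
        # Pass your own function to decode the file into a pixel array.
        self.load_image = load_image or (lambda path: Path(path).read_bytes())
        self.idx = 0

    def hasNext(self):
        # True while there are unread samples left.
        return self.idx < len(self.xml_files)

    def getNext(self):
        xml_path = self.xml_files[self.idx]
        self.idx += 1
        # Assumed schema: the root element looks like <sample text="hello"/>.
        text = ET.parse(xml_path).getroot().get("text")
        # The image is assumed to share the XML file's name, with a .png suffix.
        img = self.load_image(xml_path.with_suffix(".png"))
        return (text, img)
```

To get a grayscale image array instead of raw bytes, you could pass something like `load_image=lambda p: cv2.imread(str(p), cv2.IMREAD_GRAYSCALE)`, assuming OpenCV is installed.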