githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.
https://towardsdatascience.com/2326a3487cd5
MIT License
1.99k stars, 893 forks

Just asking: are non-English chars allowed? #123

Closed. muhammedcanpirincci-sudo closed this issue 2 years ago

muhammedcanpirincci-sudo commented 2 years ago

Can I add my language's characters to charList.txt and train with a dataset in my language? Thanks.

githubharald commented 2 years ago

You have to rewrite the data loader for your dataset and then train the model. The chars file is created automatically. For some hints on how to do this, see: https://towardsdatascience.com/faq-build-a-handwritten-text-recognition-system-using-tensorflow-27648fb18519
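To illustrate why the chars file does not need manual editing, here is a minimal sketch of how a character list can be derived from the ground-truth texts of a dataset. The function name `build_char_list` and the sample labels are made up for illustration; this is not the repository's code, only the general idea that every character appearing in the labels (including non-English ones) ends up in the list.

```python
def build_char_list(gt_texts):
    """Collect every distinct character found in the labels.

    Hypothetical helper: sorting gives a stable order, so the same
    dataset always maps each character to the same index.
    """
    chars = set()
    for text in gt_texts:
        chars.update(text)  # a str is iterated character by character
    return sorted(chars)


# Made-up Turkish labels standing in for a real dataset.
labels = ["çay", "şeker", "göz"]
char_list = build_char_list(labels)
print("".join(char_list))
```

Python strings are Unicode, so characters such as ç, ş, and ö need no special handling here, as long as the label file is read with the right encoding (for example UTF-8).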

githubharald commented 2 years ago

So you basically have to rewrite this class so that it loads your dataset but still has the same interface as it has now: https://github.com/githubharald/SimpleHTR/blob/master/src/dataloader_iam.py
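A rough sketch of what such a replacement loader could look like is given below. The method names (`train_set`, `validation_set`, `has_next`, `get_next`), the `char_list` attribute, and the `Sample` and `Batch` tuples are assumptions modelled on a reading of dataloader_iam.py, so check them against the linked file before relying on them. The class name `CustomDataLoader` and the `load_image` parameter are hypothetical; the default `load_image` just returns the path so that the sketch runs without an image library.

```python
import random
from collections import namedtuple

# Assumed shapes, modelled on the linked loader; verify against the real file.
Sample = namedtuple("Sample", "gt_text, file_path")
Batch = namedtuple("Batch", "imgs, gt_texts, batch_size")


class CustomDataLoader:
    """Hypothetical loader for a list of (image_path, text) pairs."""

    def __init__(self, labelled_files, batch_size, data_split=0.95,
                 load_image=lambda path: path):
        self.samples = [Sample(text, path) for path, text in labelled_files]
        self.batch_size = batch_size
        self.load_image = load_image  # swap in a real image reader here

        # First part of the data is used for training, the rest for validation.
        split_idx = int(data_split * len(self.samples))
        self.train_samples = self.samples[:split_idx]
        self.validation_samples = self.samples[split_idx:]

        # Character list derived from the labels, so non-English chars are included.
        self.char_list = sorted(set("".join(s.gt_text for s in self.samples)))
        self.train_set()

    def train_set(self):
        """Switch to a freshly shuffled copy of the training samples."""
        self.curr_idx = 0
        self.curr_set = "train"
        self.curr_samples = list(self.train_samples)
        random.shuffle(self.curr_samples)

    def validation_set(self):
        """Switch to the validation samples, in their original order."""
        self.curr_idx = 0
        self.curr_set = "val"
        self.curr_samples = self.validation_samples

    def has_next(self):
        # Training only yields full batches; validation also yields the last partial one.
        if self.curr_set == "train":
            return self.curr_idx + self.batch_size <= len(self.curr_samples)
        return self.curr_idx < len(self.curr_samples)

    def get_next(self):
        chunk = self.curr_samples[self.curr_idx:self.curr_idx + self.batch_size]
        self.curr_idx += len(chunk)
        imgs = [self.load_image(s.file_path) for s in chunk]
        return Batch(imgs, [s.gt_text for s in chunk], len(chunk))


# Made-up file names and labels for illustration only.
files = [("img/0.png", "çay"), ("img/1.png", "şeker"),
         ("img/2.png", "göz"), ("img/3.png", "yağmur")]
loader = CustomDataLoader(files, batch_size=2, data_split=0.5)
```

The training code would then call `has_next()` and `get_next()` in a loop, exactly as it does with the original loader, which is the point of keeping the interface unchanged.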

muhammedcanpirincci-sudo commented 2 years ago

So this part: "1.2 Create IAM-compatible dataset and train model". I saw it, but I just wanted to be sure. Thank you so much for your time, sir.

githubharald commented 2 years ago

Yes, either make your dataset look like the IAM dataset and use the original data loader, or write a new data loader that loads your dataset and looks like the original one from the outside.

Whatever you prefer.
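For the first option, a small sketch of generating IAM-style entries for your own word images might look like the following. The layout is an assumption based on how the IAM word data is commonly organised (image under img/<part1>/<part1>-<part2>/<id>.png, and a ground-truth line with the transcription as the ninth space-separated field), so compare it with the FAQ and the original loader before using it. The function name `iam_style_entry`, the example id, and the filler field values are hypothetical placeholders.

```python
from pathlib import PurePosixPath


def iam_style_entry(word_id, text):
    """Return (relative image path, ground-truth line) for one word image.

    Assumed layout, not confirmed by this thread:
      image: img/<part1>/<part1>-<part2>/<word_id>.png
      line:  <word_id> ok <gray> <x> <y> <w> <h> <tag> <transcription>
    """
    parts = word_id.split("-")
    if len(parts) < 2:
        raise ValueError("word_id needs at least two '-'-separated parts")

    img_path = (PurePosixPath("img") / parts[0]
                / f"{parts[0]}-{parts[1]}" / f"{word_id}.png")

    # The numeric fields and the tag are placeholders; only the id and the
    # transcription carry information in this sketch.
    gt_line = f"{word_id} ok 0 0 0 0 0 X {text}"
    return str(img_path), gt_line


# Made-up id and label for illustration.
path, line = iam_style_entry("tr01-000-00-00", "şeker")
print(path)
print(line)
```

Writing one such line per image into the ground-truth file (saved as UTF-8) and placing the images at the matching paths is the general idea behind making a dataset "look like" IAM.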