githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.
https://towardsdatascience.com/2326a3487cd5
MIT License
1.99k stars 893 forks

Recognize Arabic Words using SimpleHTR project #97

Closed: Tailor2019 closed this issue 3 years ago

Tailor2019 commented 3 years ago

Hello @githubharald! Thanks a lot for your very helpful project! I added the Arabic alphabet to charList.txt, but when I run the project it returns this error:

"""
model = Model(open(FilePaths.fnCharList).read(), decoderType, mustRestore=True, dump=args.dump)
File "/usr/lib/python3.7/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc7 in position 81: invalid continuation byte
"""

Can you please explain how I can recognize Arabic words using this project? Thank you so much!

githubharald commented 3 years ago

You can't add new characters to the model just by adding them to the text file. You will have to get a dataset containing Arabic characters and train the model on that dataset so that it learns those characters. The error you see is caused by an encoding problem: the file is being decoded as UTF-8 but contains bytes that are not valid UTF-8. Please see the Python documentation on how to handle Unicode and UTF-8.
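As a rough illustration of the encoding point, here is a minimal sketch of reading the character list with an explicit encoding and re-saving it as UTF-8. It assumes the file was saved in a legacy Arabic code page such as Windows-1256 (`cp1256`), which is only a guess based on the byte `0xc7` in the error message; the helper names `read_char_list` and `save_as_utf8` are hypothetical and not part of SimpleHTR.

```python
from pathlib import Path


def read_char_list(path, fallback_encoding="cp1256"):
    """Read a character list, trying UTF-8 first and a fallback encoding second.

    The fallback encoding is an assumption: replace it with whatever
    encoding your editor actually used when saving the file.
    """
    raw = Path(path).read_bytes()
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # The bytes are not valid UTF-8, so decode them with the assumed
        # legacy code page instead.
        return raw.decode(fallback_encoding)


def save_as_utf8(text, path):
    """Write the text back to disk as UTF-8 so later reads decode cleanly."""
    Path(path).write_text(text, encoding="utf-8")
```

Once the file has been re-saved as UTF-8, passing `encoding="utf-8"` explicitly to `open(...)` avoids depending on the platform default. This only fixes the decoding error; the model still has to be trained on Arabic data.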

githubharald commented 3 years ago

The character list is created automatically, depending on which characters are in your dataset.
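To make the idea concrete, the following is a minimal sketch of how a character list could be derived from the ground-truth texts of a dataset. It is not the project's actual loader code; `build_char_list` and `write_char_list` are hypothetical helpers, and the assumption is that each sample comes with its transcription as a Python string.

```python
def build_char_list(texts):
    """Return every distinct character found in the texts, sorted by code point."""
    chars = set()
    for text in texts:
        # Updating a set with a string adds each character individually.
        chars.update(text)
    return "".join(sorted(chars))


def write_char_list(char_list, path):
    """Save the character list as UTF-8 so non-ASCII characters survive."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(char_list)
```

For example, `build_char_list(["ab", "ca"])` returns `"abc"`. With an Arabic dataset the same function would collect the Arabic letters that actually occur in the labels, which is why editing the text file by hand is not needed.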