githubharald / SimpleHTR

Handwritten Text Recognition (HTR) system implemented with TensorFlow.
https://towardsdatascience.com/2326a3487cd5
MIT License
1.99k stars, 894 forks

Train model for Hindi language (devanagari script) #70

Closed: PartheshSoni closed this issue 5 years ago

PartheshSoni commented 5 years ago

Hello, you did good work on this model. I would like to know how to train your network for a language other than English, such as Hindi (Devanagari script). Is the process the same as the normal one for English, or do I need to make changes to the model?

githubharald commented 5 years ago

Should not be a problem, even though I have never tried it.
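One detail worth checking when switching to Devanagari is how labels are split into output classes. A visible syllable such as "स्ते" is several Unicode code points (consonant, virama, consonant, vowel sign), so a character-level CTC model ends up predicting code points, not glyphs. The sketch below assumes the labels are plain Python strings, one word each, and that the CTC blank is one extra class after the real characters. The helper names `build_char_list` and `encode` are hypothetical; SimpleHTR's own data loader may build its character list differently, so check its code before relying on this.

```python
import unicodedata


def build_char_list(labels):
    """Return the sorted distinct Unicode code points used in the labels.

    Assumption: the model's output classes are these code points plus one
    extra CTC blank class (index len(char_list)).
    """
    chars = set()
    for label in labels:
        # NFC normalization so identical-looking words share one code-point sequence
        chars.update(unicodedata.normalize("NFC", label))
    return sorted(chars)


def encode(label, char_list):
    """Map a label to integer class indices in char_list."""
    index = {c: i for i, c in enumerate(char_list)}
    return [index[c] for c in unicodedata.normalize("NFC", label)]


labels = ["नमस्ते", "भारत", "हिन्दी"]
char_list = build_char_list(labels)
print(len(char_list), "character classes + 1 blank")
print(len("नमस्ते"), "code points in नमस्ते")
print(encode("भारत", char_list))
```

Normalizing before building the list matters because the same word can sometimes be typed with different code-point sequences; without it, the model would see two different targets for one image.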

PartheshSoni commented 5 years ago

Sir, as you suggested, I have trained the network on a Hindi dataset for about 12 hours. Even though Hindi is a more complex language than English, I got an accuracy of 66-67%. What should I do now to improve the accuracy? Should I train it longer? Will increasing the number of CNN or RNN layers make a significant improvement in accuracy? I wanted your opinion before investing my time in it. I am attaching some images of Hindi words for your reference (to give an idea of the featural complexity of the script, if it helps).
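The thread does not say whether the 66-67% figure is word accuracy or one minus the character error rate (CER), and the two can differ a lot. The minimal sketch below computes both from paired prediction and ground-truth strings so that results before and after a model change can be compared on the same footing. It assumes non-empty lists and measures characters as Unicode code points, so a missing virama counts as one character error. The function names are hypothetical, not part of SimpleHTR.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert/delete/substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        prev = curr
    return prev[-1]


def error_rates(predictions, ground_truths):
    """Return (character error rate, word accuracy) over paired, non-empty lists."""
    char_errors = sum(edit_distance(p, g) for p, g in zip(predictions, ground_truths))
    total_chars = sum(len(g) for g in ground_truths)
    word_hits = sum(p == g for p, g in zip(predictions, ground_truths))
    return char_errors / total_chars, word_hits / len(ground_truths)


# The second prediction drops the conjunct "न्" (two code points)
cer, word_acc = error_rates(["भारत", "हिदी"], ["भारत", "हिन्दी"])
print(f"CER = {cer:.2f}, word accuracy = {word_acc:.2f}")
```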

Thanks for reading.


githubharald commented 5 years ago

Hi,

thanks for sharing your results :+1:

The SimpleHTR model is a very small neural network, so it is easy to increase the accuracy just by making it larger. To get some ideas, you can have a look at the model I used in my thesis (link, see section 3.4). In particular, adding more CNN layers (e.g., 7 or more in total) and switching to a 2D-LSTM increased the accuracy for me.
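When adding CNN layers, one constraint to keep in mind is that CTC needs at least as many output time steps as the label has symbols, plus one blank between every pair of identical consecutive symbols. Devanagari words tend to have more code points than they look, so extra horizontal pooling can make long words impossible to decode. The sketch below checks this arithmetic. The input width of 128 and the per-layer pooling widths are illustrative assumptions, not SimpleHTR's or the thesis model's exact settings, and it assumes "same" padding so only pooling (with stride equal to pool width) shrinks the width.

```python
def output_time_steps(input_width, pool_widths):
    """Width of the feature sequence after a stack of horizontal poolings.

    Assumes 'same' conv padding, so only pooling shrinks the width, and each
    pooling uses stride == pool width.
    """
    width = input_width
    for pw in pool_widths:
        width //= pw
    return width


def min_ctc_steps(label):
    """Minimum time steps CTC needs: one per symbol plus one blank between repeats."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


# Hypothetical pooling widths per CNN layer (1 = no horizontal downsampling)
five_layers = [2, 2, 1, 1, 1]
seven_layers = [2, 2, 1, 2, 1, 1, 1]
for name, pools in [("5 layers", five_layers), ("7 layers", seven_layers)]:
    steps = output_time_steps(128, pools)
    print(name, "->", steps, "time steps; हिन्दी needs", min_ctc_steps("हिन्दी"))
```

The design takeaway is to add depth mostly with layers that pool vertically (or not at all), so the sequence stays long enough for the longest labels in the dataset.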

Best, Harald

PartheshSoni commented 5 years ago

Ok, I'll check that out. Thanks a lot for your quick reply!