Open juparrrr opened 6 years ago
It can't right now, because I don't have a dataset for it. But there are a lot of projects that use the MNIST dataset for identifying handwritten digits.
Thank you for your reply. I'm building a handwriting recognition system for test papers. Previously I trained a CNN on the data, but I ran into difficulties with text segmentation. So does this project need text segmentation?
It depends on the recognition method. In one approach, I tested segmenting the text with a bidirectional RNN and then classifying the individual characters with a CNN. I also tested classification using CTC, which processes images of whole words (it could be adapted to process whole lines of words).
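To illustrate the CTC approach mentioned above, here is a minimal sketch of greedy CTC decoding. It is not the project's code. It assumes the model outputs a `(T, C)` array of per-timestep class scores, that class `0` is the CTC blank, and that classes `1..C-1` map to the characters of a hypothetical `alphabet` string.

```python
import numpy as np

BLANK = 0  # assumption: class 0 is the CTC blank symbol


def ctc_greedy_decode(scores, alphabet):
    """Decode a (T, C) score matrix by taking the best class per timestep,
    collapsing consecutive repeats, and dropping blanks."""
    best_path = np.argmax(scores, axis=1)
    chars = []
    prev = BLANK
    for idx in best_path:
        # Emit a character only when the class changes and is not blank.
        if idx != prev and idx != BLANK:
            chars.append(alphabet[idx - 1])
        prev = idx
    return "".join(chars)


# Usage: a fake one-hot output for the path a a _ a b b _ c
path = [1, 1, 0, 1, 2, 2, 0, 3]
scores = np.eye(4)[path]
print(ctc_greedy_decode(scores, "abc"))
```

Because the blank separates the two runs of `a`, both are kept, which is why this approach needs no explicit character segmentation.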
What are the training set requirements for the Bi-RNN and the CNN? Do I need a single-letter dataset?
Yes, for the CNN you need a single-letter dataset. For the Bi-RNN you need a dataset containing images of whole words, along with text files containing the positions of the lines that separate individual letters. If you already have the words, you can use WordClassDM.py to create the letter-separating lines manually.
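As a rough sketch of how the two datasets relate, a word image plus the positions of its separating lines is enough to cut out single-letter images. The function below is hypothetical and not taken from the repository. It assumes the word image is a 2D NumPy array and the gap lines are given as x-coordinates; the actual file format used in the project may differ.

```python
import numpy as np


def cut_letters(word_img, gap_xs):
    """Split an (H, W) word image into letter images at the given
    x-positions of the separating lines (assumed format)."""
    width = word_img.shape[1]
    # Keep only interior positions, then add the image borders back,
    # so the input may or may not include x=0 and x=width.
    inner = sorted({int(x) for x in gap_xs if 0 < int(x) < width})
    bounds = [0] + inner + [width]
    return [word_img[:, a:b] for a, b in zip(bounds[:-1], bounds[1:])]


# Usage: a dummy 3x10 "word" split at x=4 and x=7
word = np.arange(30).reshape(3, 10)
letters = cut_letters(word, [4, 7])
print([letter.shape for letter in letters])
```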
About the project I mentioned earlier: if I use OCR.ipynb for the recognition, do I need two models, the one trained by CharClassifier.ipynb to identify characters and the one trained by GapClassifier-BiRNN.ipynb to segment them? And are both of these trained on data read from data/words2/? I now have some handwritten letter training sets and word training sets. Can I process them with WordClassDM.py to build training sets? Can you also explain what GapClassifier-BiRNN-Attention.ipynb, GapClassifier-Attention-RNN.ipynb, GapClassification.ipynb, and GapClassification-CharClass.ipynb do? I'm sorry for taking up your time, but this is my first recognition project and there is a lot I don't understand. @Breta01
Is the gap classifier used in OCR.ipynb the Classifier-BiRNN.ipynb model?
OK, yes, you need two models, and OCR.ipynb uses Classifier-BiRNN.ipynb.
I train both models from data/words2/ because it contains images of words along with files that contain the positions of the gap lines. For CharClassifier I just cut out the separated letters. If you have letter and word training sets, you can process the word set with WordClassDM.py (this needs manual work) and then train the two models. (If you have individual letters, you could possibly create artificial words for training with already known gap-line positions, but I don't have code for that.)
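The reply above notes that artificial words could be built from individual letters and that no code exists for it. The following is one possible sketch of that idea, not part of the project. It assumes grayscale letter images as 2D NumPy arrays on a light background; the names `make_word`, `pad`, and `background` are hypothetical.

```python
import numpy as np


def make_word(letter_imgs, pad=2, background=255):
    """Concatenate letter images into one word image and return the
    x-positions of the gap lines between neighbouring letters."""
    height = max(img.shape[0] for img in letter_imgs)
    pieces, gaps, x = [], [], 0
    for i, img in enumerate(letter_imgs):
        h, w = img.shape
        # Place each letter on a background canvas, bottom-aligned,
        # with `pad` columns of background on both sides.
        canvas = np.full((height, w + 2 * pad), background, dtype=img.dtype)
        canvas[height - h:, pad:pad + w] = img
        pieces.append(canvas)
        x += canvas.shape[1]
        if i < len(letter_imgs) - 1:
            gaps.append(x)  # gap line sits where two canvases meet
    return np.hstack(pieces), gaps


# Usage: two dummy black "letters" of different sizes
a = np.zeros((5, 3), dtype=np.uint8)
b = np.zeros((4, 2), dtype=np.uint8)
word, gaps = make_word([a, b], pad=1)
print(word.shape, gaps)
```

Real handwriting has uneven spacing and connected strokes, so words generated this way would likely be easier than real data; mixing them with manually labelled words seems safer than training on them alone.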
GapClassifier-BiRNN-Attention.ipynb, GapClassifier-Attention-RNN.ipynb, GapClassification-CharClass.ipynb, and GapClassifier.ipynb are only experimental models that don't perform very well. Just skip those files.
GapClassification.ipynb demonstrates the process of separating characters, but the final code for separation is in ocr/charSeg.py.
I ran into this problem when running WordClassDM.py:
Traceback (most recent call last):
File "E:/Jupar/handwriting-ocr-master/WordClassDM.py", line 218, in
Process finished with exit code 1
No changes to the words_raw data.
The printProgressBar() is just for visualisation of loading, you can remove it.
It looks like the -1 shouldn't be there.
Thanks for your help, your program is very good
Hello, can this project identify numbers in handwritten texts?