Unable to extract text data from smaller images.

Austinn15 commented 5 years ago

[Attached image: test-uppercase1]

Hi,

Nice work. I ran into an issue: what changes do I have to make to extract text from this kind of image?

Breta01 commented 5 years ago

Hi, it really depends on the data you have. If all of your images look like this (a line of clearly separated uppercase letters), I would try character segmentation using an RNN, followed by a character classifier. Both of these machine learning models will require retraining, but once you can separate the letters (which shouldn't be that hard in your case), you can easily find data for training the character classifier. You could even separate the letters with a script built on OpenCV functions, but I don't have one.

Austinn15 commented 5 years ago

Hi,

Thanks for the reply. I have almost sorted things out, and the results are good. One question I still have: word detection succeeds, but the detected word is incorrect. How can I improve the detection accuracy?

Thanks and regards, Naveen


Breta01 commented 5 years ago

Great. Can you describe in a little more detail which part of the code you are using so far? You will have to retrain some of the ML models on a larger dataset to get better results. Take a look at the data/ section and at the training scripts for the models you are using.

Breta01 / handwriting-ocr

Unable to extract text data from smaller images. #68