Not able to read the images using tesseract 4.0 please Help

anandpawara commented 5 years ago

I am trying to read text from images (attached as a zip file), but it returns wrong results and, in many cases, blank output. Please suggest what has gone wrong with the code. I have also attached the code.

I have installed tesseract.net -Version 4.0.0.16

Thanks in advance

doxakis commented 5 years ago

Hi,

The code doesn't seem to use anything from this repo (e.g. TessBaseAPI).

Anyway, here are my recommendations:

From the Tesseract wiki: By default Tesseract expects a page of text when it segments an image. If you're just seeking to OCR a small region, try a different segmentation mode (https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#page-segmentation-method).
From the Tesseract wiki: Tesseract works best on images which have a DPI of at least 300 dpi, so it may be beneficial to resize images (https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality#rescaling). In your case, the image may be too small.
The text on a page is generally dark on a light background. If your images have light text on a dark background, maybe you can invert the pixels?
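
As a rough illustration of the three suggestions above, here is a minimal Python sketch. It is not C#, it does not use this repo's API, and all function names are hypothetical. It only uses the standard library: it computes how much to enlarge an image to reach 300 DPI, guesses whether a grayscale image has a dark background and inverts it, and builds a `tesseract` command line with a page segmentation mode. The 128 brightness threshold and the default of single-line mode are assumptions you would tune for your own images.

```python
# Illustrative helpers only; the names are hypothetical, not part of tesseract.net.

# A few of Tesseract's page segmentation modes (values for the --psm option).
PSM_SINGLE_BLOCK = 6  # assume a single uniform block of text
PSM_SINGLE_LINE = 7   # treat the image as a single text line
PSM_SINGLE_WORD = 8   # treat the image as a single word


def scale_factor(current_dpi, target_dpi=300):
    """Return the factor to enlarge an image by so it reaches target_dpi.

    Images already at or above the target are left at their size (factor 1.0).
    """
    if current_dpi <= 0:
        raise ValueError("current_dpi must be positive")
    return max(1.0, target_dpi / current_dpi)


def has_dark_background(pixels):
    """Guess whether 8-bit grayscale rows are light text on a dark background.

    Assumption: a mean brightness below 128 means the background is dark.
    """
    values = [v for row in pixels for v in row]
    return sum(values) / len(values) < 128


def invert(pixels):
    """Invert 8-bit grayscale rows so dark becomes light and vice versa."""
    return [[255 - v for v in row] for row in pixels]


def tesseract_args(image_path, psm=PSM_SINGLE_LINE):
    """Build (but do not run) a tesseract command line that prints to stdout."""
    return ["tesseract", image_path, "stdout", "--psm", str(psm)]


# Usage: a tiny 2x3 "image" that is mostly black with one white pixel.
tiny = [[0, 0, 255], [0, 0, 0]]
if has_dark_background(tiny):
    tiny = invert(tiny)

print(scale_factor(96))            # enlargement factor for a 96 DPI screenshot
print(tiny)                        # pixel rows after inversion
print(tesseract_args("crop.png"))  # "crop.png" is a placeholder file name
```

In a real program the resize and inversion would be done with an imaging library on the actual bitmap before handing it to Tesseract; the list-of-rows representation here is only to keep the sketch self-contained.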

You may also want to read: https://github.com/tesseract-ocr/tesseract/wiki/ImproveQuality

Thanks!

doxakis commented 5 years ago

Hi, I will close the issue for now. Feel free to open it again.

doxakis / How-to-use-tesseract-ocr-4.0-with-csharp
