Open ChillarAnand opened 7 years ago
I think this is from the image having too many broken components. Can you send me the image?
Image quality looks bad. https://www.dropbox.com/s/xdg2w0o6kn4t4hp/page.png?dl=0
The image quality is OK. The program is unable to estimate the line height properly; it is being thrown off by the huge empty space around the text. It works on the zealously cropped image. I thought there was code to detect this; maybe it is in Chamanti OCR. When you get such an error, use the following command to see how segmentation is working.
python3 tests/page_test.py sample_images/purugulu_crop.png
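For anyone who wants to reproduce the zealous crop mentioned above, here is a minimal sketch of trimming the empty margin around the text. It assumes the page has already been loaded as a 2-D NumPy array in which ink pixels are 1 and background is 0; the function name `crop_to_ink` and the `ink_value` parameter are made up for illustration and are not part of banti.

```python
import numpy as np


def crop_to_ink(img, ink_value=1):
    """Return the tightest sub-array that contains every ink pixel.

    Hypothetical helper, not part of banti. Assumes `img` is a 2-D
    array where ink pixels equal `ink_value`.
    """
    ink = (img == ink_value)
    rows = np.flatnonzero(ink.any(axis=1))  # indices of rows containing ink
    cols = np.flatnonzero(ink.any(axis=0))  # indices of columns containing ink
    if rows.size == 0:
        return img  # blank page: nothing to crop
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


if __name__ == "__main__":
    page = np.zeros((10, 12), dtype=np.uint8)
    page[3:6, 4:9] = 1  # a small block of "text" surrounded by empty space
    print(crop_to_ink(page).shape)
```

If the image uses the opposite convention (black ink stored as 0), pass `ink_value=0` or invert the array first.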
There is a function called sanity_check() in the class Line in banti/page.py which checks for such bad cases. It is not currently used to throw an error; I can fix that. But it remains to be seen why we are not able to segment this properly.
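To illustrate what turning such a check into a hard error could look like, here is a rough sketch. It is not the actual sanity_check() in banti/page.py; the function name `check_line_height` and the thresholds `min_px` and `max_fraction` are assumptions chosen only for the example.

```python
def check_line_height(line_height, page_height, min_px=8, max_fraction=0.5):
    """Raise ValueError when an estimated line height looks implausible.

    Hypothetical sketch, not banti's sanity_check(). The thresholds are
    arbitrary illustrative defaults.
    """
    if line_height < min_px:
        raise ValueError(
            f"Estimated line height {line_height}px is below {min_px}px"
        )
    if line_height > max_fraction * page_height:
        raise ValueError(
            f"Estimated line height {line_height}px exceeds "
            f"{max_fraction:.0%} of the page height ({page_height}px); "
            "the page may have large empty margins"
        )


if __name__ == "__main__":
    check_line_height(40, 1000)  # plausible estimate: passes silently
    try:
        check_line_height(900, 1000)  # implausible estimate
    except ValueError as err:
        print("segmentation rejected:", err)
```

Failing loudly at this point would surface a bad line-height estimate before the rest of segmentation runs on it.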
After reviewing the Fourier transforms etc., here is what I found.