gitanat / simple-ocr-opencv

A simple python OCR engine using opencv
GNU Affero General Public License v3.0

get the zone detection in the right order #35

Closed Jambon1510 closed 5 years ago

Jambon1510 commented 5 years ago

Current behavior: some of the images that need to be detected have their zones detected in a different order than reading order (left to right, top to bottom). Let's take the example below:

p0

The number 3 arrives in the results just before the 7, whereas it should appear before the 6.

OCRed text: q85920q61437

When I try grounding this particular picture, we can indeed see that the 3 is detected in the same order as in the result above.

Expected behavior: the result below.

OCRed text: q85920q36147
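In reading order, segments are sorted top-to-bottom into lines and then left-to-right within each line; a minimal standalone sketch of such an ordering (a hypothetical helper, not the project's SegmentOrderer):

```python
def reading_order(segments, max_line_height=20):
    """Sort (x, y, w, h) bounding boxes into reading order.

    Boxes are bucketed into lines by their y coordinate (buckets of
    max_line_height pixels), then sorted left to right within a line.
    """
    return sorted(segments, key=lambda s: (s[1] // max_line_height, s[0]))
```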

gitanat commented 5 years ago

I'm not sure where the "q"s are coming from, but in terms of the order of segments you might have to tune the parameters in SegmentOrderer ({"max_line_height": 20, "max_line_width": 10000}).
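A minimal sketch of what that tuning might look like, assuming the keyword arguments propagate from ContourSegmenter down to the contained SegmentOrderer the same way the blur and threshold parameters do in example.py (values here are illustrative, not tested):

```python
from simpleocr.segmentation import ContourSegmenter

# Assumed parameter propagation: a larger max_line_height makes segments
# whose tops differ by less than that value count as the same line
segmenter = ContourSegmenter(blur_y=5, blur_x=5, block_size=11, c=10,
                             max_line_height=40, max_line_width=10000)
```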

Can you please attach the code you're using for the OCR and grounding and the commit you're at?

Jambon1510 commented 5 years ago

I tried tuning max_line_height without success

The code I am using is the latest commit (5e29b25), and I am using both example_grounding.py and example.py (I have only changed the image filename inside them).

PS: I have just put a 'q' for the unknown char; I realized later that I should have put '<' (I can't see the instructions, because I get heaps of the message below, which I still haven't managed to fix):

```
QObject::moveToThread: Current thread (0x1066ec0) is not the object's thread (0x124c8e0).
Cannot move to target thread (0x1066ec0)
```
gitanat commented 5 years ago

I've used the image you attached in this thread and the code at 5e29b25a22732c146155ee1f49c255b5e0b5e6ee, with example_grounding.py and example.py (unchanged except for the image filename), and I'm getting different results:

image

In this case, it mostly works. The 5 and the 2 are getting vertically split, but that is likely fixable by increasing the blur, or by doing a dilate operation.
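A dilate step along these lines might merge the split halves (a sketch, assuming the glyphs are white on a black background after thresholding, since cv2.dilate grows the bright regions; 'p0.png' stands in for the attached image):

```python
import cv2
import numpy as np

img = cv2.imread('p0.png', cv2.IMREAD_GRAYSCALE)
# Threshold so glyphs are white, then dilate to fuse vertically split strokes
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
kernel = np.ones((3, 3), np.uint8)
merged = cv2.dilate(binary, kernel, iterations=1)
```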

I'm guessing this is not similar to your case though? What environment are you running this on? (versions for OS, python, opencv and numpy) Do you mind attaching screenshots of your segmentation steps?

Jambon1510 commented 5 years ago

I'm guessing this is not similar to your case though?

Indeed (cf. the screenshots below)

What environment are you running this on? (versions for OS, python, opencv and numpy)

```
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.1 LTS
Release:        18.04
Codename:       bionic

$ python3 --version
Python 3.6.7

$ pip3 freeze | grep -E 'opencv|numpy'
numpy==1.15.4
numpydoc==0.7.0
opencv-python==3.4.4.19
```

Do you mind attaching screenshots of your segmentation steps?

Sure, here you go: screenshot from 2018-12-27 21-37-52 screenshot from 2018-12-27 21-37-49 screenshot from 2018-12-27 21-37-46 screenshot from 2018-12-27 21-37-43 screenshot from 2018-12-27 21-37-41 screenshot from 2018-12-27 21-37-39 screenshot from 2018-12-27 21-37-36 screenshot from 2018-12-27 21-37-33

gitanat commented 5 years ago

Thanks! I'm going to install your Ubuntu version and give it a try. In the meantime, if you want to check whether everything is working as intended, you might want to run the unit tests with python setup.py test, and with python3 as well.

Jambon1510 commented 5 years ago

Thanks a lot for your help; let me know what your results are with a similar configuration.

I have one failure, as below, when running the unit tests:

```
$ python3 setup.py test
running test
running egg_info
writing simpleocr.egg-info/PKG-INFO
writing dependency_links to simpleocr.egg-info/dependency_links.txt
writing requirements to simpleocr.egg-info/requires.txt
writing top-level names to simpleocr.egg-info/top_level.txt
reading manifest file 'simpleocr.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'data' under directory 'simpleocr'
writing manifest file 'simpleocr.egg-info/SOURCES.txt'
running build_ext
test_ground (tests.test_files.TestImageFile) ... ok
test_ground_unicode (tests.test_files.TestImageFile) ... ok
test_open_image (tests.test_files.TestImageFile) ... ok
test_open_image_nonexistent (tests.test_files.TestImageFile) ... ok
test_terminal_grounder (tests.test_grounding.TestGrounding) ... Found 125 segments to ground.
Type 'exit' to stop grounding the file.
Type ' ' for anything that is not a character.
Grounding will exit automatically after all segments.
Going back to a previous segment is not possible at this time.
ok
test_textgrounder (tests.test_grounding.TestGrounding) ... ok
test_textgrounder_wrong_len (tests.test_grounding.TestGrounding) ... ok
test_usergrounder (tests.test_grounding.TestGrounding) ... For each shown segment, please write the character that it represents, or spacebar if it's not a character. To undo a classification, press backspace. Press ESC when completed, arrow keys to move
showing segment 0 (waiting for input)
showing segment 1 (waiting for input)
...
showing segment 124 (waiting for input)
showing segment 0 (waiting for input)
classified  125 characters out of 125
ok
test_ocr_digits (tests.test_ocr.TestOCR) ... 314159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847
314159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847
ok
test_ocr_unicode (tests.test_ocr.TestOCR) ... ᚠᛇðþპηγსλσαлльԿմгրяვرეնიىඑயաçයæනγɐດɜຍ我下而身əᕆœ€ᔭ
ᚠᛇðþηγλσαдльгяვეპისயçæɐɜԿրնամرىඑයනດຍ我下而身ᕆᔭəœ€
FAIL
test_opencv_brightness (tests.test_opencv_utils.TestOpenCVUtils) ... ok
test_opencv_brightness_raise (tests.test_opencv_utils.TestOpenCVUtils) ... ok
test_opencv_imageprocesser (tests.test_opencv_utils.TestOpenCVUtils) ... ok

======================================================================
FAIL: test_ocr_unicode (tests.test_ocr.TestOCR)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/mnt/simple-ocr-opencv-master/tests/test_ocr.py", line 31, in test_ocr_unicode
    self._test_ocr(open_image('unicode1'), open_image('unicode1'))
  File "/mnt/simple-ocr-opencv-master/tests/test_ocr.py", line 24, in _test_ocr
    self.assertEqual(chars, reconstruct_chars(ground_truth))
AssertionError: 'ᚠᛇðþპηγსλσαлльԿմгրяვرეնიىඑயաçයæනγɐດɜຍ我下而身əᕆœ€ᔭ' != 'ᚠᛇðþηγλσαдльгяვეპისயçæɐɜԿրնամرىඑයනດຍ我下而身ᕆᔭəœ€'
- ᚠᛇðþპηγსλσαлльԿմгրяვرეնიىඑயաçයæනγɐດɜຍ我下而身əᕆœ€ᔭ
+ ᚠᛇðþηγλσαдльгяვეპისயçæɐɜԿրնամرىඑයනດຍ我下而身ᕆᔭəœ€

----------------------------------------------------------------------
Ran 13 tests in 3.614s

FAILED (failures=1)
Test failed: <unittest.runner.TextTestResult run=13 errors=0 failures=1>
error: Test failed: <unittest.runner.TextTestResult run=13 errors=0 failures=1>
```

Jambon1510 commented 5 years ago

Hi, any chance you've managed to reproduce the issue I am encountering?

gitanat commented 5 years ago

Hi @Jambon1510, sorry for the delay, I'm a bit short on time lately. I've reproduced the issue on Ubuntu 18.04 using python2. I get the exact same results as you for the segmentation and the tests. I haven't had time to dig into the cause yet.

Funnily enough, the provided example image (for pi) works well, and the unicode one seems to only have trivial errors. What's the copyright status of your image? Is it something that we could include in this repository to develop more tests and prevent future regressions?

Jambon1510 commented 5 years ago

Thanks a lot for your time. Weirdly, on my side I can't reproduce the result I had in the first place. I now get the output below, which is still incorrect:

OCRed text:
 <<3865194207

The same happens if I ground again with 'q' as the unknown character, as I did before (I just get a 'q' instead of '<' in the output). I also checked the versions again, and they are the same as in my comment of 27 December 2018.

On my Pi3 I get the same result as above, but the versions seem to be a bit outdated (I will update and try again):


```
pi@raspberrypi3 ~ $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description:    Raspbian GNU/Linux 9.4 (stretch)
Release:        9.4
Codename:       stretch

pi@raspberrypi3 ~ $ python3 --version
Python 3.5.3

pi@raspberrypi3 ~ $ sudo pip3 freeze | grep -E 'opencv|numpy'
numpy==1.12.1
opencv-python==3.4.4.19
```

As for the picture, to be honest it is taken from this website, so I'm not sure about the copyright.

gitanat commented 5 years ago

So, I was experimenting with this a bit more, and I think the reason the order and line finder are getting confused is simply that they're detecting the edges of the background. The algorithm doesn't do any form of background removal, so you'll have to script that yourself if you're interested in processing this kind of image.

Using simple thresholding, I get this:

new_image

This works as you would expect, in terms of both character detection and line detection. I've done this in an image processor, but you can, of course, just do the thresholding in OpenCV. Take a look at this code: https://github.com/goncalopp/simple-ocr-opencv/blob/master/simpleocr/segmentation.py#L57
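The exact thresholding used above isn't stated; a minimal OpenCV sketch of the idea, with assumed values ('p0.png' again standing in for the attached image):

```python
import cv2

img = cv2.imread('p0.png', cv2.IMREAD_GRAYSCALE)
# A fixed global threshold: anything lighter than the digits (the grey
# background and the box edges) becomes pure white, leaving only the glyphs
_, clean = cv2.threshold(img, 100, 255, cv2.THRESH_BINARY)
cv2.imwrite('new_image.png', clean)
```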

I hope that works for you, let me know how it goes!

Jambon1510 commented 5 years ago

OK, thanks a lot. Do you mind helping me understand how to use your code to apply the image processing? I am trying to use your code by changing, in ocr.py,

this:

```python
class OCR(object):
    def __init__(self, segmenter=None, extractor=None, classifier=None, grounder=None):
        self.segmenter = get_instance_from(segmenter, SEGMENTERS, "contour")
```

to this:

```python
class OCR(object):
    def __init__(self, segmenter=None, extractor=None, classifier=None, grounder=None):
        self.segmenter = get_instance_from(segmenter, SEGMENTERS, "rawcontour")
```

but I get the same result, so I guess I am doing something wrong.

gitanat commented 5 years ago

Are you sure the result is the same?

If you check here: https://github.com/goncalopp/simple-ocr-opencv/blob/8611fc9e1054033b53478186f072497647b6be63/simpleocr/segmentation.py#L82

ContourSegmenter does a few more things than the raw one: it blurs the original image and orders the output segments.

In any case, you should do the image processing outside of this project's code, in your own program (something similar to example.py), using cv2.adaptiveThreshold. The harder part is extracting the image; I'd advise you to drop into the REPL (you can use import pdb; pdb.set_trace()) and inspect the contents of the image object (test_image, for example).
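A minimal sketch of that workflow, assuming open_image from simpleocr.files as used in the tests (the .image attribute is an assumption; check files.py for the actual field name):

```python
from simpleocr.files import open_image

test_image = open_image('p0')  # hypothetical image name
import pdb; pdb.set_trace()    # drops into the debugger at this point
# At the (Pdb) prompt, inspect the object, e.g.:
#   (Pdb) type(test_image)
#   (Pdb) test_image.image.shape   # assumed attribute holding the numpy array
```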

Jambon1510 commented 5 years ago

OK, I have only managed to make it black and white, but the order is still not respected. nocontour

I then decided to horizontalize the image, as below: trifecta

When grounding I had an issue with line numbers that I am no longer facing, for an unknown reason. Anyway, I am now able to get the digits in order when I try with the example below (even if I am not sure that removing the contour or applying a contrast filter is still necessary with the horizontalization; I need to give that a try).

hnocontour_p17

Result: OCRed text: 3051792684

Woohoo! Thanks a lot for your help!

Pre-processing code for the horizontalization, for those interested (inspired by a post by dermen):

```python
import cv2
import numpy as np
from PIL import Image

# Binarize: adaptive thresholding removes the grey background boxes
image = cv2.imread('p19.png')
image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
image = cv2.adaptiveThreshold(image, maxValue=255,
                              adaptiveMethod=cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                              thresholdType=cv2.THRESH_BINARY,
                              blockSize=11, C=10)
cv2.imwrite("nocontour.png", image)

# Crop the two text lines so they can be laid out on a single line
image = Image.open("nocontour.png")
cropped_img1 = image.crop((0, 0, 173, 32))   # top line
cropped_img2 = image.crop((0, 33, 173, 65))  # bottom line
imgs = [cropped_img1, cropped_img2]

# Pick the smallest image and resize the others to match it
min_shape = sorted([(np.sum(i.size), i.size) for i in imgs])[0][1]
imgs_comb = np.hstack([np.asarray(i.resize(min_shape)) for i in imgs])

# Save that beautiful picture
Image.fromarray(imgs_comb).save('Trifecta.png')
```

gitanat commented 5 years ago

I think the reason you're getting the wrong order without horizontalization is that your adaptiveThreshold is too conservative: you can still see the squares around the numbers, and ideally those should be gone before you pass the image to the OCR. I'd try changing C until they are gone.
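For instance, the call from the snippet above with a larger C (the value 25 is illustrative only):

```python
import cv2

image = cv2.imread('p19.png', cv2.IMREAD_GRAYSCALE)
# A larger C subtracts more from the local mean, so the faint box edges
# fall below the threshold and binarize to white (i.e. they disappear)
image = cv2.adaptiveThreshold(image, maxValue=255,
                              adaptiveMethod=cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                              thresholdType=cv2.THRESH_BINARY,
                              blockSize=11, C=25)
```
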

In any case, I'm glad you got it to work! :) I'm closing this ticket for now, but feel free to comment if you need any further help