goncalopp / simple-ocr-opencv

A simple python OCR engine using opencv
GNU Affero General Public License v3.0

OpenCV 4.0 #36

Closed Jambon1510 closed 5 years ago

Jambon1510 commented 5 years ago

Hi Goncalo, are there any plans to make it compatible with OpenCV 4.0? I was integrating your code in a new virtualenv, which pulled in the latest OpenCV version (4.0), and after tweaking the version check I ran into the error below:

  File "/mnt/Database/moneygator/simpleocr/ocr.py", line 65, in train
    self.classifier.train(features, image_file.ground.classes)
  File "/mnt/Database/moneygator/simpleocr/classification.py", line 68, in train
    self.knn.train(features, classes)
TypeError: only size-1 arrays can be converted to Python scalars

goncalopp commented 5 years ago

Hi Jambon, is cv4 the default on your distro at this point, or are you just experimenting? I'm not sure I'll have the time at the moment to set up a new environment, but I'm happy to help you debug the issue. Can you post your OS version, and any changed or manually installed packages?

Running the unit tests would be a good start to pinning down what changed (python setup.py test). From there, if you have a specific failing test, but are still not sure what is causing the problem, you can attach a debugger on the failing line (import pdb; pdb.set_trace()), inspect variables, and keep following the code backwards to see where the difference is compared to a working environment.

In this case it looks like either the interface to knn.train changed and you'll have to do something like _.reshape(len(_), -1) (_ being the offending variable/s), or some other code should be returning a one-dimensional array and no longer is.
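If the interface did change, a small compatibility wrapper is one way to handle it. The sketch below assumes the OpenCV 2.4 API (cv2.KNearest, train(samples, responses), find_nearest) and the OpenCV 3.x/4.x API (cv2.ml.KNearest_create, train(samples, layout, responses), findNearest). Under the newer signature the second positional argument is a layout flag, so passing the classes array there is a likely source of the "only size-1 arrays" TypeError. The helper name make_knn is hypothetical and is not taken from classification.py.

```python
def make_knn(cv2_module):
    """Return (train, nearest) callables for whichever KNN API cv2 exposes.

    Hypothetical helper: the cv2 module is passed in explicitly so the
    version-dependent choice is made in one place.
    """
    if hasattr(cv2_module, "ml"):
        # OpenCV 3.x / 4.x: models live in the cv2.ml submodule.
        knn = cv2_module.ml.KNearest_create()
        layout = cv2_module.ml.ROW_SAMPLE  # one sample per row

        def train(features, classes):
            # The newer signature takes the layout flag between the arrays.
            return knn.train(features, layout, classes)

        def nearest(features, k):
            return knn.findNearest(features, k)
    else:
        # OpenCV 2.4: the class is exposed directly on cv2.
        knn = cv2_module.KNearest()

        def train(features, classes):
            return knn.train(features, classes)

        def nearest(features, k):
            return knn.find_nearest(features, k)

    return train, nearest
```

In real use this would be called as train, nearest = make_knn(cv2) after import cv2; the features and classes arrays would still need to be float32 NumPy arrays with one sample per row.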

Jambon1510 commented 5 years ago

Hi Goncalo, that's the default when creating a new virtualenv. I guess I can downgrade, but I have not tried it yet.

First run of tests (no modifications)

python setup.py test

running test
running egg_info
writing simpleocr.egg-info/PKG-INFO
writing dependency_links to simpleocr.egg-info/dependency_links.txt
writing requirements to simpleocr.egg-info/requires.txt
writing top-level names to simpleocr.egg-info/top_level.txt
reading manifest file 'simpleocr.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'data' under directory 'simpleocr'
writing manifest file 'simpleocr.egg-info/SOURCES.txt'
running build_ext
test_ocr_digits (tests.test_ocr.TestOCR) ... ERROR
test_ocr_unicode (tests.test_ocr.TestOCR) ... ERROR
test_ground (tests.test_files.TestImageFile) ... ok
test_ground_unicode (tests.test_files.TestImageFile) ... ok
test_open_image (tests.test_files.TestImageFile) ... ok
test_open_image_nonexistent (tests.test_files.TestImageFile) ... ok
test_grounding (unittest.loader._FailedTest) ... ERROR
test_opencv_brightness (tests.test_opencv_utils.TestOpenCVUtils) ... ok
test_opencv_brightness_raise (tests.test_opencv_utils.TestOpenCVUtils) ... ok
test_opencv_imageprocesser (tests.test_opencv_utils.TestOpenCVUtils) ... ok
tests.test_grounding (unittest.loader._FailedTest) ... ERROR

======================================================================
ERROR: test_ocr_digits (tests.test_ocr.TestOCR)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_ocr.py", line 28, in test_ocr_digits
    self._test_ocr(open_image('digits1'), open_image('digits2'))
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_ocr.py", line 17, in _test_ocr
    classifier = KNNClassifier()
  File "/home/jb/Downloads/simple-ocr-opencv-master/simpleocr/classification.py", line 55, in __init__
    self.knn = cv2.KNearest()
AttributeError: module 'cv2.cv2' has no attribute 'KNearest'

======================================================================
ERROR: test_ocr_unicode (tests.test_ocr.TestOCR)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_ocr.py", line 31, in test_ocr_unicode
    self._test_ocr(open_image('unicode1'), open_image('unicode1'))
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_ocr.py", line 17, in _test_ocr
    classifier = KNNClassifier()
  File "/home/jb/Downloads/simple-ocr-opencv-master/simpleocr/classification.py", line 55, in __init__
    self.knn = cv2.KNearest()
AttributeError: module 'cv2.cv2' has no attribute 'KNearest'

======================================================================
ERROR: test_grounding (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_grounding
Traceback (most recent call last):
  File "/usr/lib/python3.6/unittest/loader.py", line 153, in loadTestsFromName
    module = __import__(module_name)
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_grounding.py", line 2, in <module>
    import mock
ModuleNotFoundError: No module named 'mock'

======================================================================
ERROR: tests.test_grounding (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: tests.test_grounding
Traceback (most recent call last):
  File "/usr/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/usr/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_grounding.py", line 2, in <module>
    import mock
ModuleNotFoundError: No module named 'mock'

----------------------------------------------------------------------
Ran 11 tests in 0.409s

FAILED (errors=4)
Test failed: <unittest.runner.TextTestResult run=11 errors=4 failures=0>
error: Test failed: <unittest.runner.TextTestResult run=11 errors=4 failures=0>

Second run of tests (classification.py modified to use the same code for version 3 and all other versions)

python setup.py test

running test
running egg_info
writing simpleocr.egg-info/PKG-INFO
writing dependency_links to simpleocr.egg-info/dependency_links.txt
writing requirements to simpleocr.egg-info/requires.txt
writing top-level names to simpleocr.egg-info/top_level.txt
reading manifest file 'simpleocr.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'data' under directory 'simpleocr'
writing manifest file 'simpleocr.egg-info/SOURCES.txt'
running build_ext
test_ocr_digits (tests.test_ocr.TestOCR) ... 314159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847
314159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847
ok
test_ocr_unicode (tests.test_ocr.TestOCR) ... ᚠᛇðþηγλσαлльгяვეპისயçæɐɜԿրնամرىඑයγනດຍ我下而身ᕆᔭəœ€
ᚠᛇðþηγλσαдльгяვეპისயçæɐɜԿրնամرىඑයනດຍ我下而身ᕆᔭəœ€
FAIL
test_ground (tests.test_files.TestImageFile) ... ok
test_ground_unicode (tests.test_files.TestImageFile) ... ok
test_open_image (tests.test_files.TestImageFile) ... ok
test_open_image_nonexistent (tests.test_files.TestImageFile) ... ok
test_grounding (unittest.loader._FailedTest) ... ERROR
test_opencv_brightness (tests.test_opencv_utils.TestOpenCVUtils) ... ok
test_opencv_brightness_raise (tests.test_opencv_utils.TestOpenCVUtils) ... ok
test_opencv_imageprocesser (tests.test_opencv_utils.TestOpenCVUtils) ... ok
tests.test_grounding (unittest.loader._FailedTest) ... ERROR

======================================================================
ERROR: test_grounding (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: test_grounding
Traceback (most recent call last):
  File "/usr/lib/python3.6/unittest/loader.py", line 153, in loadTestsFromName
    module = __import__(module_name)
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_grounding.py", line 2, in <module>
    import mock
ModuleNotFoundError: No module named 'mock'

======================================================================
ERROR: tests.test_grounding (unittest.loader._FailedTest)
----------------------------------------------------------------------
ImportError: Failed to import test module: tests.test_grounding
Traceback (most recent call last):
  File "/usr/lib/python3.6/unittest/loader.py", line 428, in _find_test_path
    module = self._get_module_from_name(name)
  File "/usr/lib/python3.6/unittest/loader.py", line 369, in _get_module_from_name
    __import__(name)
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_grounding.py", line 2, in <module>
    import mock
ModuleNotFoundError: No module named 'mock'

======================================================================
FAIL: test_ocr_unicode (tests.test_ocr.TestOCR)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_ocr.py", line 31, in test_ocr_unicode
    self._test_ocr(open_image('unicode1'), open_image('unicode1'))
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_ocr.py", line 24, in _test_ocr
    self.assertEqual(chars, reconstruct_chars(ground_truth))
AssertionError: 'ᚠᛇðþηγλσαлльгяვეპისயçæɐɜԿրնամرىඑයγනດຍ我下而身ᕆᔭəœ€' != 'ᚠᛇðþηγλσαдльгяვეპისயçæɐɜԿրնամرىඑයනດຍ我下而身ᕆᔭəœ€'
- ᚠᛇðþηγλσαлльгяვეპისயçæɐɜԿրնամرىඑයγනດຍ我下而身ᕆᔭəœ€
?          ^                       -
+ ᚠᛇðþηγλσαдльгяვეპისயçæɐɜԿրնամرىඑයනດຍ我下而身ᕆᔭəœ€
?          ^

----------------------------------------------------------------------
Ran 11 tests in 0.487s

FAILED (failures=1, errors=2)
Test failed: <unittest.runner.TextTestResult run=11 errors=2 failures=1>
error: Test failed: <unittest.runner.TextTestResult run=11 errors=2 failures=1>

goncalopp commented 5 years ago

Thanks Jambon,

Ideally a virtualenv should be hermetic, so specifying a version is probably the right thing to do. Which package are you using, opencv-python from PyPI? If so, please note that it's an unofficial precompiled release (see this thread for previous discussion on it).

With that being said, if you want to work on cv4 support, I'd be happy to accept PRs.

You'll have to install the mock package for many of the tests to work. The test_ocr_unicode failure is the same one we saw on #35. It seems to be only trivial errors; we should fix the test to be more robust.
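One way to make such a test more robust is to tolerate a small number of character errors instead of requiring exact equality. The sketch below is only an illustration of that idea: it computes the Levenshtein edit distance between the recognized and expected strings and accepts the result if it stays under a threshold. The names edit_distance and close_enough, and the default max_errors=2, are assumptions and not part of the existing test suite.

```python
def edit_distance(a, b):
    """Levenshtein distance: insert, delete and substitute each cost 1."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        previous = current
    return previous[-1]


def close_enough(recognized, expected, max_errors=2):
    """True if the OCR output differs from the ground truth by few edits.

    max_errors is a hypothetical tolerance; pick it per test image.
    """
    return edit_distance(recognized, expected) <= max_errors
```

In _test_ocr, the assertEqual call could then become something like self.assertTrue(close_enough(chars, reconstruct_chars(ground_truth))), so a single substituted or extra character no longer fails the whole test.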

Jambon1510 commented 5 years ago

cv4 seems OK with python setup.py test, as well as with example.py, once classification.py is modified:

running test
running egg_info
writing simpleocr.egg-info/PKG-INFO
writing dependency_links to simpleocr.egg-info/dependency_links.txt
writing requirements to simpleocr.egg-info/requires.txt
writing top-level names to simpleocr.egg-info/top_level.txt
reading manifest file 'simpleocr.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
warning: no files found matching 'data' under directory 'simpleocr'
writing manifest file 'simpleocr.egg-info/SOURCES.txt'
running build_ext
test_ocr_digits (tests.test_ocr.TestOCR) ... 314159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847
314159265358979323846264338327950288419716939937510582097494459230781640628620899862803482534211706798214808651328230664709384460955058223172535940812848111745028410270193852110555964462294895493038196442881097566593344612847
ok
test_ocr_unicode (tests.test_ocr.TestOCR) ... ᚠᛇðþηγλσαлльгяვეპისயçæɐɜԿրնամرىඑයγනດຍ我下而身ᕆᔭəœ€
ᚠᛇðþηγλσαдльгяვეპისயçæɐɜԿրնամرىඑයනດຍ我下而身ᕆᔭəœ€
FAIL
test_ground (tests.test_files.TestImageFile) ... ok
test_ground_unicode (tests.test_files.TestImageFile) ... ok
test_open_image (tests.test_files.TestImageFile) ... ok
test_open_image_nonexistent (tests.test_files.TestImageFile) ... ok
test_terminal_grounder (tests.test_grounding.TestGrounding) ... Found 125 segments to ground.
Type 'exit' to stop grounding the file.
Type ' ' for anything that is not a character.
Grounding will exit automatically after all segments.
Going back to a previous segment is not possible at this time.
ok
test_textgrounder (tests.test_grounding.TestGrounding) ... ok
test_textgrounder_wrong_len (tests.test_grounding.TestGrounding) ... ok
test_usergrounder (tests.test_grounding.TestGrounding) ... For each shown segment, please write the character that it represents, or spacebar if it's not a character. To undo a classification, press backspace. Press ESC when completed, arrow keys to move
showing segment 0 (waiting for input)
showing segment 1 (waiting for input)
showing segment 2 (waiting for input)
showing segment 3 (waiting for input)
showing segment 4 (waiting for input)
showing segment 5 (waiting for input)
showing segment 6 (waiting for input)
showing segment 7 (waiting for input)
showing segment 8 (waiting for input)
showing segment 9 (waiting for input)
showing segment 10 (waiting for input)
showing segment 11 (waiting for input)
showing segment 12 (waiting for input)
showing segment 13 (waiting for input)
showing segment 14 (waiting for input)
showing segment 15 (waiting for input)
showing segment 16 (waiting for input)
showing segment 17 (waiting for input)
showing segment 18 (waiting for input)
showing segment 19 (waiting for input)
showing segment 20 (waiting for input)
showing segment 21 (waiting for input)
showing segment 22 (waiting for input)
showing segment 23 (waiting for input)
showing segment 24 (waiting for input)
showing segment 25 (waiting for input)
showing segment 26 (waiting for input)
showing segment 27 (waiting for input)
showing segment 28 (waiting for input)
showing segment 29 (waiting for input)
showing segment 30 (waiting for input)
showing segment 31 (waiting for input)
showing segment 32 (waiting for input)
showing segment 33 (waiting for input)
showing segment 34 (waiting for input)
showing segment 35 (waiting for input)
showing segment 36 (waiting for input)
showing segment 37 (waiting for input)
showing segment 38 (waiting for input)
showing segment 39 (waiting for input)
showing segment 40 (waiting for input)
showing segment 41 (waiting for input)
showing segment 42 (waiting for input)
showing segment 43 (waiting for input)
showing segment 44 (waiting for input)
showing segment 45 (waiting for input)
showing segment 46 (waiting for input)
showing segment 47 (waiting for input)
showing segment 48 (waiting for input)
showing segment 49 (waiting for input)
showing segment 50 (waiting for input)
showing segment 51 (waiting for input)
showing segment 52 (waiting for input)
showing segment 53 (waiting for input)
showing segment 54 (waiting for input)
showing segment 55 (waiting for input)
showing segment 56 (waiting for input)
showing segment 57 (waiting for input)
showing segment 58 (waiting for input)
showing segment 59 (waiting for input)
showing segment 60 (waiting for input)
showing segment 61 (waiting for input)
showing segment 62 (waiting for input)
showing segment 63 (waiting for input)
showing segment 64 (waiting for input)
showing segment 65 (waiting for input)
showing segment 66 (waiting for input)
showing segment 67 (waiting for input)
showing segment 68 (waiting for input)
showing segment 69 (waiting for input)
showing segment 70 (waiting for input)
showing segment 71 (waiting for input)
showing segment 72 (waiting for input)
showing segment 73 (waiting for input)
showing segment 74 (waiting for input)
showing segment 75 (waiting for input)
showing segment 76 (waiting for input)
showing segment 77 (waiting for input)
showing segment 78 (waiting for input)
showing segment 79 (waiting for input)
showing segment 80 (waiting for input)
showing segment 81 (waiting for input)
showing segment 82 (waiting for input)
showing segment 83 (waiting for input)
showing segment 84 (waiting for input)
showing segment 85 (waiting for input)
showing segment 86 (waiting for input)
showing segment 87 (waiting for input)
showing segment 88 (waiting for input)
showing segment 89 (waiting for input)
showing segment 90 (waiting for input)
showing segment 91 (waiting for input)
showing segment 92 (waiting for input)
showing segment 93 (waiting for input)
showing segment 94 (waiting for input)
showing segment 95 (waiting for input)
showing segment 96 (waiting for input)
showing segment 97 (waiting for input)
showing segment 98 (waiting for input)
showing segment 99 (waiting for input)
showing segment 100 (waiting for input)
showing segment 101 (waiting for input)
showing segment 102 (waiting for input)
showing segment 103 (waiting for input)
showing segment 104 (waiting for input)
showing segment 105 (waiting for input)
showing segment 106 (waiting for input)
showing segment 107 (waiting for input)
showing segment 108 (waiting for input)
showing segment 109 (waiting for input)
showing segment 110 (waiting for input)
showing segment 111 (waiting for input)
showing segment 112 (waiting for input)
showing segment 113 (waiting for input)
showing segment 114 (waiting for input)
showing segment 115 (waiting for input)
showing segment 116 (waiting for input)
showing segment 117 (waiting for input)
showing segment 118 (waiting for input)
showing segment 119 (waiting for input)
showing segment 120 (waiting for input)
showing segment 121 (waiting for input)
showing segment 122 (waiting for input)
showing segment 123 (waiting for input)
showing segment 124 (waiting for input)
showing segment 0 (waiting for input)
classified  125 characters out of 125
ok
test_opencv_brightness (tests.test_opencv_utils.TestOpenCVUtils) ... ok
test_opencv_brightness_raise (tests.test_opencv_utils.TestOpenCVUtils) ... ok
test_opencv_imageprocesser (tests.test_opencv_utils.TestOpenCVUtils) ... ok

======================================================================
FAIL: test_ocr_unicode (tests.test_ocr.TestOCR)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_ocr.py", line 31, in test_ocr_unicode
    self._test_ocr(open_image('unicode1'), open_image('unicode1'))
  File "/home/jb/Downloads/simple-ocr-opencv-master/tests/test_ocr.py", line 24, in _test_ocr
    self.assertEqual(chars, reconstruct_chars(ground_truth))
AssertionError: 'ᚠᛇðþηγλσαлльгяვეპისயçæɐɜԿրնամرىඑයγනດຍ我下而身ᕆᔭəœ€' != 'ᚠᛇðþηγλσαдльгяვეპისயçæɐɜԿրնամرىඑයනດຍ我下而身ᕆᔭəœ€'
- ᚠᛇðþηγλσαлльгяვეპისயçæɐɜԿրնամرىඑයγනດຍ我下而身ᕆᔭəœ€
?          ^                       -
+ ᚠᛇðþηγλσαдльгяვეპისயçæɐɜԿրնամرىඑයනດຍ我下而身ᕆᔭəœ€
?          ^

----------------------------------------------------------------------
Ran 13 tests in 2.668s

FAILED (failures=1)
Test failed: <unittest.runner.TextTestResult run=13 errors=0 failures=1>
error: Test failed: <unittest.runner.TextTestResult run=13 errors=0 failures=1>

I will try to do my first PR on classification.py if I find a bit of time. Regarding the code integration, I am now facing another issue I need to tackle: Exception: 0 segments after filter SmallFilter (I will open another ticket if I don't find the solution).

I used pip install, so I guess it is the unofficial one, is that correct? If so, I didn't understand from the referenced post how to install the official one. By compiling?

goncalopp commented 5 years ago

Great! We still have the unicode error, but that seems unrelated to cv4.

0 segments after filter SmallFilter probably means that you're trying to classify text that is smaller (in pixel size) than the filter is configured to process. You'll have to change the parameters.

If you did pip install opencv-python, yes, it's an unofficial binary. If you want a pre-built binary, using your distribution's is most likely safer (apt install python-opencv). As far as I know, there are no official binary releases for Linux from OpenCV, only source releases.

Jambon1510 commented 5 years ago

OK, thanks Goncalopp. I am trying my first commit and pull request now.

goncalopp commented 5 years ago

After the PR, I guess we can close this?

Jambon1510 commented 5 years ago

Yes, thanks!