ghost opened this issue 9 years ago
Oh. Hm. That's not so good. I'll have a look.
Can you tell me what version of OpenCV and Python you are using? And, if possible, could you point me to the test image you are using?
It happens with every image I've tried (JPG and PNG formats), with both Python 3.4 and Python 2.7.
I'm using Ubuntu 14.04. The thing is, the bug only appears once the program has finished executing.
Also, recognition takes up to 5 seconds, and I'm running it on more than decent hardware.
Thanks for the module by the way, I really appreciate it!
I installed Ubuntu 14.04 on a VM. I'm using the system packages for OpenCV, NumPy, and Tesseract. (No Python 3 bindings for OpenCV, though.)
I ran against a large set of images using the same code, but couldn't reproduce the bug.
Are you using the system's OpenCV? Did you compile your own?
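Because the reported crash happens only after the program finishes, running a large image set inside one process can hide which image triggers it. One option is to run the test script once per image in a fresh interpreter and record any non-zero exit. This is a minimal sketch using only the standard library; `find_failing_inputs` is a hypothetical helper, and it assumes the test script takes the image path as its first argument, as the script below does with `sys.argv[1]`.

```python
import subprocess
import sys


def find_failing_inputs(script, inputs, timeout=120):
    """Run `script` once per input in a fresh interpreter and collect failures.

    Each input gets its own process, so a crash during interpreter shutdown
    (such as a glibc "free(): invalid next size" abort) is tied to that input.
    Returns a list of (input, returncode, stderr_tail) tuples.
    """
    failures = []
    for path in inputs:
        proc = subprocess.run(
            # -X faulthandler asks Python to dump a traceback on fatal signals.
            [sys.executable, "-X", "faulthandler", str(script), str(path)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        # On POSIX, a negative return code -N means the child was killed by
        # signal N; an abort from glibc's heap checks shows up as SIGABRT.
        if proc.returncode != 0:
            failures.append((str(path), proc.returncode, proc.stderr[-2000:]))
    return failures
```

For example, `find_failing_inputs("test_ocr.py", glob.glob("imgs/*.jpg"))`, where `test_ocr.py` is a hypothetical file name for the test script, would list every image whose run did not exit cleanly.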
I compiled my own OpenCV 3.0.0 beta with Python 2.7 and 3.4 bindings :-)
Sincerely Me!
So strange. I built OpenCV-3.0.0-beta on the same system and again couldn't replicate the bug.
[Image: imgs/Lets-eat-grandma.jpg]
Want to attach a single sample image? I'm otherwise at a loss -- I'm running the same OS, have the same Python versions, compiled OpenCV and TesserPy fresh, and ran against a set of random images I downloaded...
Sure:
I still can't reproduce the bug, even with that sample image.
The code I'm using to test:
import cv2
import tesserpy
import sys
tess = tesserpy.Tesseract("/usr/share/tesseract-ocr/tessdata", language="eng")
tess.tessedit_char_whitelist = "-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
image = cv2.imread(sys.argv[1])
tess.set_image(image)
print(tess.get_utf8_text())
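The whitelist literal in the test script is a hyphen followed by the ASCII letters and digits, so it can also be built from Python's `string` module, which avoids typos in a long hand-typed literal. A small sketch; the `WHITELIST` name is only for illustration.

```python
import string

# A hyphen, then a-z, A-Z (string.ascii_letters), then 0-9 (string.digits):
# the same characters, in the same order, as the literal whitelist.
WHITELIST = "-" + string.ascii_letters + string.digits

# With the tesserpy handle from the script, this would be assigned as:
# tess.tessedit_char_whitelist = WHITELIST
```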
Oh, I've also changed the default OCR engine mode (OEM) to OEM_DEFAULT, which shouldn't really change anything but seems like a more sensible default.
If you're comparing times to the command line tool, you may want to set your page segmentation mode to match its default:
tess.tessedit_pageseg_mode = tesserpy.PSM_AUTO
Otherwise, I'm not sure why it's slower, but the slowdown seems to be inside Tesseract itself; reading and decoding the image take negligible time. See Issue #10.
Edit: To confirm, Python is v2.7.6, Tesseract is v3.03, NumPy is 1.8.2, and gcc is 4.8.2. These are all system packages. OpenCV is v3.0.0-beta.
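To check where the time goes (image loading versus recognition inside Tesseract), each stage can be timed separately. This is a minimal sketch using only the standard library; `timed` is a hypothetical helper, and the `time.sleep` calls merely stand in for `cv2.imread(...)` and the `tess.set_image(...)` / `tess.get_utf8_text()` calls from the test script.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the `with` block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so a failing stage is still timed.
        results[label] = time.perf_counter() - start


if __name__ == "__main__":
    timings = {}
    with timed("read", timings):
        time.sleep(0.01)  # stand-in for: image = cv2.imread(path)
    with timed("ocr", timings):
        time.sleep(0.05)  # stand-in for: tess.set_image(image); tess.get_utf8_text()
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.3f}s")
```

Using `time.perf_counter` rather than `time.time` gives a monotonic, high-resolution clock, which suits short intervals like these.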
This repository has been archived. I recommend you work with a forked version of the project at https://github.com/rigorgt/tesserpy to continue this discussion.
Hey there! Thanks for the module; I've been looking for something like this for my OpenCV project for quite a while!
But whenever I run the following code:
import cv2
import tesserpy

tess = tesserpy.Tesseract("/usr/share/tesseract-ocr/tessdata", language="eng")
# Anything exposed by SetVariable / GetVariableAsString is an attribute
tess.tessedit_char_whitelist = "-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
image = cv2.imread("meme.jpg")
tess.set_image(image)
page_info = tess.orientation()
print(tess.get_utf8_text())
I get the following output:
MEWESEXIDUWE WBEB 1 a m
Lg j w
*** Error in `python3': free(): invalid next size (normal): 0x0000000001f4ee80 ***
The first two lines ("MEWESEXIDUWE WBEB 1 a m" and "Lg j w") are the recognized text.
But I can't seem to figure out what causes the error...
Help would be much appreciated!