amazon-archives / tesserpy

ARCHIVED: A Python API for Tesseract
https://github.com/rigorgt/tesserpy
GNU Lesser General Public License v2.1

Error in `python3': free(): invalid next size (normal): 0x0000000001f4ee80 #9

Status: Open. ghost opened this issue 9 years ago

ghost commented 9 years ago

Hey there! Thanks for the module; I've been looking for something like this for my OpenCV project for quite a while!

But whenever I run the following code:

import cv2
import tesserpy

tess = tesserpy.Tesseract("/usr/share/tesseract-ocr/tessdata", language="eng")
# Anything exposed by SetVariable / GetVariableAsString is an attribute
tess.tessedit_char_whitelist = "-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
image = cv2.imread("meme.jpg")
tess.set_image(image)
page_info = tess.orientation()
print(tess.get_utf8_text())

I get the following output:

MEWESEXIDUWE WBEB 1 a m
Lg j w

*** Error in `python3': free(): invalid next size (normal): 0x0000000001f4ee80 ***

The first part ("MEWESEXIDUWE WBEB 1 a m
Lg j w") is the recognized text.

But I can't seem to figure out what causes the error...

Help would be much appreciated!

squidpickles commented 9 years ago

Oh. Hm. That's not so good. I'll have a look.

squidpickles commented 9 years ago

Can you tell me what version of OpenCV and Python you are using? And, if possible, could you point me to the test image you are using?

Younderboy commented 9 years ago

It happens with every image I've tried (JPG and PNG formats). I've tried it with both Python 3.4 and Python 2.7.

I'm using Ubuntu 14.04. The thing is, the error only appears once the program has finished executing.

Also, recognition takes up to 5 seconds (and I'm running it on more than decent hardware).

Thanks for the module by the way, I really appreciate it!
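Since the abort only shows up after the script has printed its result, one way to see how far the interpreter got is to run the script in a child process with the standard-library `faulthandler` enabled. glibc calls `abort()` (raising `SIGABRT`) after printing this kind of heap-corruption message, and `faulthandler` dumps the Python stack to stderr when it sees that signal. This is a minimal sketch, assuming Python 3.7+ on Linux; `run_with_faulthandler` is a hypothetical helper, and the demo child uses `os.abort()` as a stand-in for the real crash. If the crash happens during interpreter shutdown, the dumped stack may be short or empty.

```python
import signal
import subprocess
import sys


def run_with_faulthandler(script_args, timeout=120):
    """Run a Python script in a child interpreter with faulthandler enabled.

    -u makes the child's stdout unbuffered, so text printed before a crash
    is not lost; -X faulthandler makes the child dump its Python stack to
    stderr when it receives a fatal signal such as SIGABRT.
    Returns (returncode, stdout, stderr). On POSIX a negative return code
    means the child was killed by that signal number.
    """
    cmd = [sys.executable, "-u", "-X", "faulthandler", *script_args]
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr


# Hypothetical stand-in for the OCR script: print some "recognized text",
# then abort the way glibc does after reporting heap corruption.
demo = "import os; print('recognized text'); os.abort()"
code, out, err = run_with_faulthandler(["-c", demo])

print("killed by SIGABRT:", code == -signal.SIGABRT)
print("child stdout:", out.strip())
print("faulthandler report:")
print(err)
```

For the real script, the call would be something like `run_with_faulthandler(["ocr_test.py", "meme.jpg"])`, where both file names are placeholders.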

squidpickles commented 9 years ago

I installed Ubuntu 14.04 on a VM. I'm using the system packages for OpenCV, NumPy, and Tesseract. (No Python 3 bindings for OpenCV, though.)

I ran against a large set of images using the same code, but couldn't reproduce the bug.

Are you using the system's OpenCV, or did you compile your own?
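Running a test script over a large set of images, as described above, is easier to interpret when each image gets its own interpreter: a glibc abort in one run cannot kill the whole batch, and the exit status shows exactly which inputs crashed. This is a minimal sketch, assuming Python 3.7+ and a test script that takes the image path as its first argument; `find_crashing_inputs` and the stand-in script are hypothetical.

```python
import os
import signal
import subprocess
import sys
import tempfile


def find_crashing_inputs(script, inputs, timeout=120):
    """Run `python script <input>` in a fresh interpreter for each input.

    Each run is isolated, so a glibc abort in one child cannot take the
    harness down with it. Returns {input: returncode or "timeout"} for every
    run that did not exit cleanly; on POSIX, negative codes are signals.
    """
    failures = {}
    for item in inputs:
        try:
            proc = subprocess.run(
                [sys.executable, script, item],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            failures[item] = "timeout"
            continue
        if proc.returncode != 0:
            failures[item] = proc.returncode
    return failures


# Demo with a hypothetical stand-in script that aborts on one input only.
with tempfile.TemporaryDirectory() as tmp:
    fake_script = os.path.join(tmp, "fake_ocr.py")
    with open(fake_script, "w") as f:
        f.write(
            "import os, sys\n"
            "print('text for', sys.argv[1])\n"
            "if 'bad' in sys.argv[1]:\n"
            "    os.abort()\n"
        )
    result = find_crashing_inputs(fake_script, ["a.jpg", "bad.jpg", "c.jpg"])

print(result)
print(result == {"bad.jpg": -signal.SIGABRT})  # True on Linux
```

Against real data this might look like `find_crashing_inputs("ocr_test.py", sorted(glob.glob("imgs/*.jpg")))`, with `ocr_test.py` standing in for whatever the test script is called.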

Younderboy commented 9 years ago

I compiled my own OpenCV 3.0.0 beta with Python 2.7 and 3.4 bindings :-)

Sincerely Me!

squidpickles commented 9 years ago

So strange. I built OpenCV-3.0.0-beta on the same system, and again couldn't replicate the bug.

Want to attach a single sample image? I'm otherwise at a loss -- I'm running the same OS, have the same Python versions, compiled OpenCV and TesserPy fresh, and ran against a set of random images I downloaded...

Younderboy commented 9 years ago

Sure: [attached image: "ingredients"]

squidpickles commented 9 years ago

I still can't reproduce the bug, even with that sample image.

The code I'm using to test:

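import cv2
import tesserpy
import sys

# Point Tesseract at the system tessdata directory and load English.
tess = tesserpy.Tesseract("/usr/share/tesseract-ocr/tessdata", language="eng")
# Restrict recognition to "-", ASCII letters, and digits.
tess.tessedit_char_whitelist = "-abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
# The image to recognize is passed as the first command-line argument.
image = cv2.imread(sys.argv[1])
tess.set_image(image)
print(tess.get_utf8_text())
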
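The whitelist in the test script is just a hyphen followed by the ASCII letters and digits. A small sketch of building the same 63-character string from Python's standard `string` constants, which is easier to check by eye than typing the literal out; `WHITELIST` is a hypothetical name.

```python
import string

# "-" followed by a-z, A-Z and 0-9: the same characters, in the same order,
# as the whitelist literal in the test script.
WHITELIST = "-" + string.ascii_letters + string.digits

# In the thread this value is set as an attribute on the Tesseract object:
#   tess.tessedit_char_whitelist = WHITELIST
print(WHITELIST)
```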
squidpickles commented 9 years ago

Oh, I've also changed the default OCR engine mode (OEM) to OEM_DEFAULT, which shouldn't really change anything, but seems like a more sensible default.

If you're comparing times with the command-line tool, you may want to set your page segmentation mode to match its default:

tess.tessedit_pageseg_mode = tesserpy.PSM_AUTO

Otherwise I'm not sure why it's slower, but the time seems to be spent inside Tesseract itself; reading and decoding the image takes negligible time. See Issue #10.
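To check where the time goes, each stage can be timed separately with `time.perf_counter()`. This is a minimal sketch with hypothetical names (`time_stages`, the stand-in stages); the commented tesserpy stages assume `path` and `tess` are set up as in the test script, and the `recognize` stage wraps `tess.get_utf8_text()`, which is where the thread suggests the time is spent.

```python
import time


def time_stages(stages):
    """Run (name, fn) pairs in order, timing each with time.perf_counter().

    Each fn receives the previous stage's return value (None for the first),
    so read -> set_image -> recognize can be timed step by step.
    Returns ({name: seconds}, final_value).
    """
    timings = {}
    value = None
    for name, fn in stages:
        start = time.perf_counter()
        value = fn(value)
        timings[name] = time.perf_counter() - start
    return timings, value


# With tesserpy the stages might look like this (not run here):
#   stages = [
#       ("read", lambda _: cv2.imread(path)),
#       ("set_image", lambda img: tess.set_image(img)),
#       ("recognize", lambda _: tess.get_utf8_text()),
#   ]


def fake_recognize(img):
    time.sleep(0.05)  # stand-in for the slow Tesseract call
    return img.lower()


# Self-contained demo with stand-in stages.
timings, text = time_stages([
    ("load", lambda _: "raw image bytes"),
    ("decode", lambda raw: raw.upper()),
    ("recognize", fake_recognize),
])
for name, seconds in timings.items():
    print(f"{name:>10}: {seconds:.3f} s")
```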

Edit: To confirm, Python is v2.7.6; Tesseract is v3.03; NumPy is 1.8.2; gcc is 4.8.2. These are all system packages. OpenCV is v3.0.0-beta.
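When comparing environments like this, it can help to collect the versions programmatically instead of by hand. This is a minimal sketch, assuming Python 3.7+; `environment_report` is a hypothetical name. It reports the Python, NumPy and OpenCV versions, the compiler Python was built with, and the first line of `tesseract --version` (assuming the CLI accepts that flag). That last entry describes the `tesseract` binary on `PATH`, which is not necessarily the libtesseract build tesserpy links against.

```python
import importlib
import platform
import shutil
import subprocess


def environment_report():
    """Collect the version details compared in this thread, where available."""
    report = {
        "python": platform.python_version(),
        # Compiler this Python was built with, e.g. "GCC 4.8.2".
        "python_compiler": platform.python_compiler(),
        "platform": platform.platform(),
    }
    for name in ("numpy", "cv2"):
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            report[name] = "not installed"
    exe = shutil.which("tesseract")
    if exe is None:
        report["tesseract"] = "not found on PATH"
    else:
        # Assumes the CLI accepts --version; some releases print it to stderr.
        proc = subprocess.run([exe, "--version"], capture_output=True, text=True)
        lines = (proc.stdout or proc.stderr).strip().splitlines()
        report["tesseract"] = lines[0] if lines else "unknown"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```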

hyandell commented 7 years ago

This repository has been archived. I recommend you work with a forked version of the project at https://github.com/rigorgt/tesserpy to continue this discussion.