BackupGGCode / python-tesseract

python wrapper class for tesseract OCR (Linux & Mac & Windows)

No way to work with Boxa * return values? #52

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
This is probably just user error, but I can't find another reasonable place to 
put this so I'll put it in the form of a bug report :) 

API calls including TessBaseAPI.GetWords and TessBaseAPI.GetRegions return a 
Boxa *, which, when printed, looks like so: 

<Swig Object of type 'Boxa *' at 0x2e38a20>

OK, so that's a SwigPyObject subclass.  Somewhere in there is the bounding box 
information I care about.  How am I supposed to get it?  There's no Python 
class defined (that I can find) that implements access to the Boxa on the other 
end of that pointer, or the array of Box that make up the Boxa.  

If there's no way to work with the return value of GetWords, and there are no 
interesting side effects, I would submit that it's a bug to have this 
call in the API at all.  However, the only other way to get bounding box 
information is to call GetHOCRText() and parse the HTML that it returns, which 
is icky. 
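
For reference, the hOCR route mentioned above can be sketched roughly as follows. This is only a minimal illustration: it assumes each word is emitted as a `<span class='ocrx_word' ... title='bbox x0 y0 x1 y1 ...'>` element, which is the usual hOCR convention, and the sample string and the helper name `hocr_word_boxes` are made up for the example rather than taken from real GetHOCRText() output.

```python
import re

# Matches a word span and captures its bbox and inner markup. The attribute
# order and quoting are assumptions about the hOCR that GetHOCRText() emits.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"]bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>(.*?)</span>",
    re.S)
TAG_RE = re.compile(r"<[^>]+>")

def hocr_word_boxes(hocr):
    """Return a list of (text, (x0, y0, x1, y1)) tuples, one per word."""
    boxes = []
    for x0, y0, x1, y1, inner in WORD_RE.findall(hocr):
        text = TAG_RE.sub("", inner).strip()  # drop nested <strong>/<em> tags
        boxes.append((text, (int(x0), int(y0), int(x1), int(y1))))
    return boxes

# Hypothetical sample, invented for illustration only.
sample = ("<span class='ocr_line' id='line_1' title='bbox 30 30 150 60'>"
          "<span class='ocrx_word' id='word_1' title='bbox 36 34 144 58'>"
          "<strong>FULTON</strong></span></span>")
word_boxes = hocr_word_boxes(sample)
```

A regular expression is enough for a quick check like this; a real HTML parser would be more robust if the markup varies between Tesseract versions.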

What am I missing?  

Original issue reported on code.google.com by wsgrib...@gmail.com on 5 Dec 2013 at 7:21

GoogleCodeExporter commented 9 years ago
Could you show me a sample script that demonstrates your problem?

Original comment by FreeT...@gmail.com on 7 Dec 2013 at 2:43

GoogleCodeExporter commented 9 years ago
OK, here's the wrapper I use around Tesseract's API: 

-----8<-----

import cv2 as cv
import tesseract

# This is the bare wrapper around Tesseract recognition.  It may be used
# with pages, blocks, lines, or words, depending on the mode.
def tess_read_text(image, mode=tesseract.PSM_SINGLE_BLOCK):

    numbers = "0123456789"
    punct = "/.,#$@()*<>:-=%&"
    upper = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lower = "abcdefghijklmnopqrstuvwxyz"
    api = tesseract.TessBaseAPI()

    api.Init(".", "eng", tesseract.OEM_DEFAULT)
    api.SetVariable("tessedit_char_whitelist", upper + lower + numbers + punct)
    api.SetVariable("assume_fixed_pitch_char_segment", "1")
    api.SetPageSegMode(mode)
    api.ClearAdaptiveClassifier()

    work = cv.copyMakeBorder(image, 30, 30, 30, 30, cv.BORDER_CONSTANT, value=(255,255,255))
    height1, width1 = work.shape
    new_image = cv.cv.CreateImageHeader((width1,height1), cv.IPL_DEPTH_8U, 1)
    cv.cv.SetData(new_image, work.tostring(), work.dtype.itemsize *  (width1))

    tesseract.SetCvImage(new_image, api)

    text = api.GetUTF8Text().strip()
    conf = get_word_confidences(api, text)    # wrapper around AllWordConfidences()

    wordinfo = api.GetWords(None)
    print "text:", text
    print "word confidences:", conf 
    print "word info:", type(wordinfo), wordinfo 

    return (text, zip(text.split(' '), conf))

-----8<-----

And here's some output from that bit of code: 

    text: FULTON
    word confidences: [('FULTON', 83)]
    word info: <type 'SwigPyObject'> <Swig Object of type 'Boxa *' at 0x2357900>

So what am I supposed to do with that SWIG-wrapped Boxa *?  It doesn't have any 
useful methods defined, nor any SWIG helpers I can find.  I'd like to get the 
box coordinates out of the array. 
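
One possible workaround, sketched here under several assumptions, is to read the structure directly with `ctypes`. The field layouts below are assumed from the Leptonica headers of that era (`Box` as x, y, w, h, refcount; `Boxa` as n, nalloc, refcount, box) and may not match the Leptonica build in use. The helper name `boxa_to_tuples` is hypothetical, and the demonstration runs against a stand-in Boxa built in Python memory, not a real return value from GetWords.

```python
import ctypes

class Box(ctypes.Structure):
    # Assumed Leptonica layout: struct Box { x, y, w, h, refcount }
    _fields_ = [("x", ctypes.c_int32), ("y", ctypes.c_int32),
                ("w", ctypes.c_int32), ("h", ctypes.c_int32),
                ("refcount", ctypes.c_uint32)]

class Boxa(ctypes.Structure):
    # Assumed Leptonica layout: struct Boxa { n, nalloc, refcount, box }
    _fields_ = [("n", ctypes.c_int32), ("nalloc", ctypes.c_int32),
                ("refcount", ctypes.c_uint32),
                ("box", ctypes.POINTER(ctypes.POINTER(Box)))]

def boxa_to_tuples(address):
    """Read (x, y, w, h) tuples from the Boxa stored at a raw address."""
    boxa = ctypes.cast(address, ctypes.POINTER(Boxa)).contents
    return [(boxa.box[i].contents.x, boxa.box[i].contents.y,
             boxa.box[i].contents.w, boxa.box[i].contents.h)
            for i in range(boxa.n)]

# Stand-in Boxa built in Python memory so the sketch runs without Tesseract.
fake_boxes = [Box(36, 34, 108, 24, 1), Box(150, 34, 60, 24, 1)]
ptrs = (ctypes.POINTER(Box) * 2)(*[ctypes.pointer(b) for b in fake_boxes])
fake = Boxa(2, 2, 1, ctypes.cast(ptrs, ctypes.POINTER(ctypes.POINTER(Box))))
boxes = boxa_to_tuples(ctypes.addressof(fake))
```

With a real result, the address would presumably come from something like `int(wordinfo)`, since SWIG pointer objects generally convert to their address, but that is an assumption that has not been checked against this wrapper. Reading a wrong layout this way can crash the interpreter, so it is a last resort at best.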

Original comment by wsgrib...@gmail.com on 10 Dec 2013 at 4:19

GoogleCodeExporter commented 9 years ago
??? conf = get_word_confidences(api, text)    # wrapper around AllWordConfidences()

should be

conf = api.MeanTextConf()

Please refer to example 3.

Original comment by FreeT...@gmail.com on 3 May 2014 at 6:22

GoogleCodeExporter commented 9 years ago
MeanTextConf is not a replacement for AllWordConfidences.  In my case it is 
important to know the confidence for each recognized word, not for the entire 
block as an aggregate.
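
To make the distinction concrete, here is a rough sketch of pairing each recognized word with its own confidence. It relies on the documented behaviour that Tesseract's AllWordConfidences() returns an int array terminated by -1. The helper name `pair_words_with_confidences` and the plain Python list standing in for the SWIG array are illustrative assumptions, not part of python-tesseract.

```python
def pair_words_with_confidences(text, confidences):
    """Pair each whitespace-separated word with its confidence.

    `confidences` is any indexable sequence of ints that ends with -1.
    """
    pairs = []
    for i, word in enumerate(text.split()):
        value = confidences[i]
        if value == -1:  # sentinel reached before the words ran out
            break
        pairs.append((word, value))
    return pairs

# A plain list stands in for the array wrapped by the SWIG bindings.
fake_confidences = [83, 91, -1]
pairs = pair_words_with_confidences("FULTON STREET", fake_confidences)
```

Stopping at the sentinel keeps the loop from reading past the end of the C array when the word count from splitting the text does not match the number of confidences.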

Original comment by wsgrib...@gmail.com on 4 May 2014 at 1:20

GoogleCodeExporter commented 9 years ago
Try this:

import cv2.cv as cv
import tesseract

api = tesseract.TessBaseAPI()
api.Init(".", "eng", tesseract.OEM_DEFAULT)
api.SetPageSegMode(tesseract.PSM_AUTO)

image = cv.LoadImage("eurotext.jpg", cv.CV_LOAD_IMAGE_GRAYSCALE)
tesseract.SetCvImage(image, api)
text = api.GetUTF8Text()
conf = api.MeanTextConf()
print text, len(text)
print "Confidence Level: %d %%" % conf
print "Confidences of all words"
confOfText = api.AllWordConfidences()
confOfText = tesseract.intArray_frompointer(confOfText)
# AllWordConfidences() returns one entry per word, terminated by -1,
# so walk the array up to the sentinel rather than over len(text) characters.
i = 0
while confOfText[i] != -1:
    print i, confOfText[i]
    i += 1

Original comment by FreeT...@gmail.com on 4 May 2014 at 7:22

GoogleCodeExporter commented 9 years ago
You might need to compile the SVN version of python-tesseract.

Original comment by FreeT...@gmail.com on 4 May 2014 at 7:26

GoogleCodeExporter commented 9 years ago
Done!

Original comment by FreeT...@gmail.com on 4 May 2014 at 11:49

GoogleCodeExporter commented 9 years ago

Original comment by FreeT...@gmail.com on 9 May 2014 at 7:49