goncalopp / simple-ocr-opencv

A simple python OCR engine using opencv
GNU Affero General Public License v3.0

how to #2

Closed timprepscius closed 11 years ago

timprepscius commented 11 years ago

This is a really interesting project.

What's the best way to create the grounding for an image file?

I did this:


git diff
diff --git a/classification.py b/classification.py
index a87a802..08585fc 100644
--- a/classification.py
+++ b/classification.py
@@ -14,7 +14,7 @@ def classes_to_numpy( classes ):
     #utf-32 starts with constant ''\xff\xfe\x00\x00', then has little endian 32 bits chars
     #this assumes little endian architecture!
     assert unichr(15).encode('utf-32')=='\xff\xfe\x00\x00\x0f\x00\x00\x00'
-    int_classes= array.array( "L", "".join(classes).encode('utf-32')[4:])
+    int_classes= array.array( "I", "".join(classes).encode('utf-32')[4:])
     assert len(int_classes) == len(classes)
     classes=  numpy.array( int_classes,  dtype=CLASS_DATATYPE, ndmin=2) #each class in a column. numpy is strange :(
     classes= classes if CLASSES_DIRECTION==1 else numpy.transpose(classes)
diff --git a/example.py b/example.py
index bafffab..5c537e4 100644
--- a/example.py
+++ b/example.py
@@ -1,4 +1,5 @@
 from files import ImageFile
+from grounding import UserGrounder
 from segmentation import ContourSegmenter, draw_segments
 from feature_extraction import SimpleFeatureExtractor
 from classification import KNNClassifier
@@ -11,9 +12,12 @@ ocr= OCR( segmenter, extractor, classifier )

 ocr.train( ImageFile('digits1') )

-test_image= ImageFile('digits2')
+test_image= ImageFile('train')
 test_classes, test_segments= ocr.ocr( test_image, show_steps=True )

+grounder= UserGrounder()
+grounder.ground(test_image, test_segments);
+
 print "accuracy:", accuracy( test_image.ground.classes, test_classes )
 print "OCRed text:\n", reconstruct_chars( test_classes )
 show_differences( test_image.image, test_segments, test_image.ground.classes, test_classes)

But somehow I don't think I should have :-)

How did you create the groundings?
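(As an aside, the "L" → "I" typecode change in the diff above matters because UTF-32 code units are exactly 4 bytes, while array.array("L") (unsigned long) is 8 bytes on 64-bit Unix, so "L" would pack two characters into one element and break the length assertion. A quick check — Python 3 syntax shown here, but the element sizes are the same under the repo's Python 2:)

```python
import array

# UTF-32 encodes each character in exactly 4 bytes, after a 4-byte BOM.
data = "abc".encode("utf-32")[4:]          # strip the BOM
assert len(data) == 4 * 3

# "I" (unsigned int) is 4 bytes on common platforms, matching the
# 4-byte code units; "L" is 8 bytes on 64-bit Unix and would not.
codes = array.array("I", data)
assert codes.itemsize == 4
assert len(codes) == 3
assert codes[0] == ord("a")
```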

goncalopp commented 11 years ago

Actually, that's pretty much it, if you want to do interactive grounding :) Unfortunately there's no formal documentation on anything ATM, but the UserGrounder docstring should get you going. If you're sure the text segments are correctly detected, you can also supply the ground text as a string to grounding.TextGrounder.

As you may have noticed, grounding.Grounder.ground() calls files.ImageFile.set_ground . By default, that method sets the ground labeling only in memory (it doesn't write it to disk). To write it out as well, you may want to try this:

grounder.ground(test_image, test_segments)
test_image.ground.write()

I hope that clears things up a bit

jakeboydston commented 8 years ago

Hello. I do not understand how to ground a file. I have tried tweaking the code to create segments, but cannot do so. I've tried to use the UserGrounder class because I thought it would be easier. My code looks like this:

testpicture = ImageFile(r"\Users\user\Desktop\words.jpg")
testclass, testsegment = ocr.ocr(testpicture, show_steps=True)
Jake = UserGrounder()
Jake.ground(testpicture, testsegment)

I continue to get this error:

compactness, classified_points, means = cv2.kmeans(
    data=ys, K=k, bestLabels=None,
    criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_MAX_ITER, 1, 10),
    attempts=2, flags=cv2.KMEANS_PP_CENTERS)

error: ..\..\..\..\opencv\modules\core\src\matrix.cpp:2702: error: (-215) N >= K in function cv::kmeans

Do you have any advice how to go forward? I am a beginner at code but feel that I've been trying the right things.

goncalopp commented 8 years ago

@jakeboydston That's interesting. I think you're on the right track - I'm not sure ImageFile takes a path like that, but if it didn't complain, it should be OK. Did the segmentation succeed? (You should see rectangles around your characters, IIRC.) Can you post the full stack trace and the words.jpg file (or something equivalent that triggers the error)?
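For context, the (-215) assertion means cv2.kmeans was handed fewer samples (N) than clusters (K) - likely too few segments survived segmentation. A minimal numpy-only sketch of the same precondition (the names ys and k just stand in for the segmenter's internal data):

```python
import numpy as np

def can_run_kmeans(samples, k):
    """Mirror of the (-215) precondition N >= K checked inside cv::kmeans."""
    return samples.shape[0] >= k

# Two detected segments, but clustering into three groups is impossible:
ys = np.array([[12.0], [15.0]], dtype=np.float32)
assert not can_run_kmeans(ys, 3)   # N=2 < K=3 -> cv2.kmeans would abort
assert can_run_kmeans(ys, 2)       # N=2 >= K=2 -> fine
```

So the fix is usually upstream: get the segmenter to actually find characters in the image.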

JoeTurtle commented 8 years ago

Hi there, @goncalopp thanks for sharing this project. I've been playing around with it and just ran into something that's not really clear to me. Could you please explain what the arguments (passed to ContourSegmenter() and SimpleFeatureExtractor()) stand for? segmenter= ContourSegmenter( blur_y=5, blur_x=5, block_size=11, c=10) extractor= SimpleFeatureExtractor( feature_size=10, stretch=False )

Any help would be highly appreciated.

goncalopp commented 8 years ago

Hi JoeTurtle,

The pipeline architecture makes following the parameters harder than it should be, but it's quite simple, actually.

As you can see in segmentation.py:

 class ContourSegmenter( FullSegmenter ):
     def __init__(self, **args):
         filters= create_default_filter_stack()
         stack = [BlurProcessor(), RawContourSegmenter()] + filters + [SegmentOrderer()]

a segmenter is a pipeline composed of multiple steps.


blur_y=5, blur_x=5 are parameters for BlurProcessor, which applies a Gaussian blur to the image as a pre-processing step. They define the size of the Gaussian kernel (informally, the "blur amount").
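If it helps to see what that kernel size controls, here's a rough numpy-only sketch - not the repo's code, just an illustration using OpenCV's documented default-sigma heuristic:

```python
import numpy as np

def gaussian_kernel_1d(size, sigma=None):
    """1-D Gaussian kernel of odd length `size`, normalised to sum to 1."""
    if sigma is None:
        # OpenCV's default heuristic when sigma is not given
        sigma = 0.3 * ((size - 1) * 0.5 - 1) + 0.8
    xs = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(xs ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

# What blur_x=5 / blur_y=5 would use along each axis: a wider kernel
# averages over more neighbouring pixels, i.e. more blur.
k5 = gaussian_kernel_1d(5)
assert abs(k5.sum() - 1.0) < 1e-9
assert k5[2] == k5.max()   # peak at the centre
```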


block_size=11, c=10 are parameters for RawContourSegmenter, which feeds them straight into cv2.adaptiveThreshold, so you can read the documentation there - they basically control the thresholding.
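Not the repo's code, but a rough numpy-only sketch of the mean variant of adaptive thresholding, to make block_size and c concrete: each pixel is compared against the mean of its block_size × block_size neighbourhood minus the constant c.

```python
import numpy as np

def adaptive_threshold_mean(img, block_size, c):
    """Naive mean-based adaptive threshold (binary variant), for illustration."""
    pad = block_size // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    out = np.zeros_like(img, dtype=np.uint8)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            local_mean = padded[y:y + block_size, x:x + block_size].mean()
            # larger c -> only pixels well above the local mean stay white
            out[y, x] = 255 if img[y, x] > local_mean - c else 0
    return out

# A bright column next to a dark region survives thresholding:
img = np.array([[10, 10, 200],
                [10, 10, 200],
                [10, 10, 200]], dtype=np.uint8)
result = adaptive_threshold_mean(img, block_size=3, c=5)
```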


feature_size is simply the square root of the size of the feature vector used as input to the classification algorithm. In other words, each potential character is resized to a feature_size x feature_size (square) image before being fed into the actual learning mechanisms.

stretch controls whether the original character image is stretched or cropped in order to be square.
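A hypothetical numpy sketch of what the extractor conceptually does - the real SimpleFeatureExtractor will differ in details (for illustration, stretch=False here pads to a square rather than cropping):

```python
import numpy as np

def to_feature_vector(segment, feature_size, stretch):
    """Turn a character image into a flat vector of feature_size**2 values."""
    h, w = segment.shape
    if not stretch:
        # embed in a square so the glyph keeps its aspect ratio
        side = max(h, w)
        square = np.zeros((side, side), dtype=segment.dtype)
        square[:h, :w] = segment
        segment, h, w = square, side, side
    # nearest-neighbour resize to feature_size x feature_size
    ys = np.arange(feature_size) * h // feature_size
    xs = np.arange(feature_size) * w // feature_size
    return segment[np.ix_(ys, xs)].ravel()

seg = np.arange(20).reshape(4, 5)          # a fake 4x5 character image
vec = to_feature_vector(seg, feature_size=10, stretch=True)
assert vec.shape == (100,)                 # feature_size**2 values
```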

I hope that helps :) If you can contribute documentation or docstrings while going through the code, I'll gladly accept it