Closed hhimanshu closed 9 years ago
I tried dumping more data and got more confused
// You can retrieve more information about the recognized text with these methods:
NSArray *characterBoxes = [tesseract recognizedBlocksByIteratorLevel:G8PageIteratorLevelSymbol];
NSLog(@"characterBoxes:%@", characterBoxes);
NSArray *paragraphs = [tesseract recognizedBlocksByIteratorLevel:G8PageIteratorLevelParagraph];
NSLog(@"paragraphs:%@", paragraphs);
NSArray *characterChoices = tesseract.characterChoices;
NSLog(@"characterChoices:%@", characterChoices);
and the output is
2015-03-01 12:38:04.888 testImage[45600:70b] Text: 13
53 142 11
2015-03-01 12:38:04.889 testImage[45600:70b] characterBoxes:(
"(2.56%) ' '",
"(74.74%) '1'",
"(69.03%) '3'",
"(89.08%) '5'",
"(72.80%) '3'",
"(22.93%) ' '",
"(78.33%) '1'",
"(67.23%) '4'",
"(70.94%) '2'",
"(15.52%) ' '",
"(80.01%) '1'",
"(68.51%) '1'"
)
2015-03-01 12:38:04.890 testImage[45600:70b] paragraphs:(
"(13.67%) ' 13\n53 142 11\n\n'"
)
2015-03-01 12:38:04.890 testImage[45600:70b] characterChoices:(
(
"(2.56%) ' '"
),
(
"(74.74%) '1'"
),
(
"(69.03%) '3'"
),
(
"(89.08%) '5'"
),
(
"(72.80%) '3'"
),
(
"(22.93%) ' '"
),
(
"(78.33%) '1'"
),
(
"(67.23%) '4'"
),
(
"(70.94%) '2'"
),
(
"(5.45%) ' '"
),
(
"(80.01%) '1'"
),
(
"(68.51%) '1'"
)
)
But none of this gives me the data on the receipt I mentioned. Please help.
I think you would have solved this problem yourself had you actually read the comments in the example code you copied. When in doubt, please please please read the comments! They're there for a reason!
Look at these two lines:
// Optional: Limit the character set Tesseract should try to recognize from
tesseract.charWhitelist = @"0123456789";
// Optional: Limit the area of the image Tesseract should recognize on to a rectangle
tesseract.rect = CGRectMake(20, 20, 100, 100);
The first line restricts recognition to the digits 0-9 by setting a whitelist, so any character not in that whitelist will never appear in the output.
The second line restricts recognition to a small window of the Walmart receipt: the rectangle with origin (20, 20), a width of 100, and a height of 100.
If you comment out those two lines and re-run your app, you'll see the following printed out in the Xcode console:
2015-03-01 16:46:01.248 testImage[30569:2175050] Text:Walmart '
Save money. Live better. .
waImart
MANAGER DALE STEWEKT
(501) 328 - 9570
5T# 009% ovx 00001929 TE» 18 TR# 09137
MUNEHKIN CAR CLING SHADES 4.50 T
0 1928372837
INFANTINO INF S-IN-1 CARRIER 30.00 T
FIOgZEERlPRIZésgABY MIRROR 20 00 T
001928372562 '
GERBER CLOTH DIAPER lZ—PK 11.94 T
00992337253
GERBER gNESIES NEWBORN 0’3 7.94 T
GEOOKR7 827375 NEWBORN 0’3 7 94
BURP CLDTHS 4*PK 9-94 T
05347§§2910 '
GERBER gNESIES NEWBORN 0’3 PINK 7_94 T
GEROERSBATENSET 4*PIECE B 24
05716281920, ' T
GERBER SLEEPN PLAY JUMPSUITS ZPK 9_§4
This is the best that Tesseract can do in recognizing text from your image unless you preprocess your image to make it easier for Tesseract to recognize AND/OR create a custom font/language file for Tesseract that you have trained on the font used in these Walmart receipts. Both of these tasks are outside of the scope of this library, but you should be able to search Google for tutorials on training custom language files for Tesseract, and we have a section in our Wiki to assist with ideas for preprocessing images: https://github.com/gali8/Tesseract-OCR-iOS/wiki/Tips-for-Improving-OCR-Results
In fact, if you further comment out the following line:
// Optional: Limit recognition time with a few seconds
tesseract.maximumRecognitionTime = 2.0;
and re-run the app, you get this result which contains more of the text of the receipt:
2015-03-01 16:57:45.981 testImage[33045:2191003] Text:Walma rt '
Save money. Live better. .
waImart
MANAGER DALE STEWEKT
(501) 328 - 9570
5T# 009% ovx 00001929 TE» 18 TR# 09137
MUNEHKIN CAR CLING SHADES 4.50 T
0 1928372837
INFANTINO INF S-IN-1 CARRIER 30.00 T
FIOgZEERlPRIZésgABY MIRROR 20 00 T
001928372562 '
GERBER CLOTH DIAPER lZ—PK 11.94 T
00992337253
GERBER gNESIES NEWBORN 0’3 7.94 T
GEOOKR7 827375 NEWBORN 0’3 7 94
BURP CLDTHS 4*PK 9.94 T
05347§§2910 '
GERBER gNESIES NEWBORN 0’3 PINK 7_94 T
GEROERSBATENSET 4*PIECE B 24
05716281920, ' T
GERBER SLEEPN PLAY JUMPSUITS ZPK 9_§4 T
Gzo‘ééfilsigé‘éfiww auwsum m 9 94 T
“5098788108.. .
55182518§74 '
GERBER INFANT GOWNS 2-PK 8.24 T
ag‘é‘éfifiséfiflifi <30sz m a 24
00983726362 ' T
FADED GLORY NEWBORN BODYSUIT 2.00 T
FAggDZ7E7gzlewBORN BODVSUIT z oo
FAOO71§923921EWBORN BODYSUIT 2'00 T
8593828§§3§ - T
FADED GLORY NEWBORN BODYSUIT 2.00 T
FAgggzgstRglaEb/BDRN PANTS 2 00 T
00710239392 '
FADED GLORY NEWBORN PANTS 2.00 T
FAggggégzngsaEWBORN PANTS 2 00 T
00774932929 '
FADED GLORY NEWBORN PANTS 2.00 T
00719283920
GARANIMALS TURTLE VIBRATE TOY 5_OO T
angfik‘ifliizgfifinn mm 2 00 T
00183923839 '
GARANIMALS CHIME ALONG 4.00 T
INOng'TIEEEEBST RATTLES 3 00 T
00182733938 '
GRACO DIGITAL MONITOR Z UNITS 60.00 T
0500232 339220815315 1 00 T
55.3% .55.; -
OELTA 1 *PK HANGERS 1.00 T
00928392398
DELTA 10*PK HANGERS 1.00 T
00900839283
DELTA 1 *PK HANGERS 1.00 T
00928932983
CHILD g MINE SLEEP&PLAY JUMPER 7.00 T
0054 232399
(HILD g MINE SLEEP&PLAY JUMPER 7.00 T
0059 398329
(HILD O MINE SLEEP&PLAY JUMPER 7.00 T
0058983R987
CHILD g MINE SLEEP&PLAY JUMPER 7.00 T
0059 379384
CHILD 0 MINE SLEEP&PLAY JUMPER 7.00 T
CH99D27$3EIZE SLEEP&PLAY JUMPER 7 ()0
00593374928 ' T
CHILD O MINE DRESS SET 5.75 T
04002159830343 DRESS SET 5 75 T
DRESS SET 5-75
0029g734923 ' T
FADED GLORY NEWBORN DRESS SET 10.00 T
6083E29g82274P0KEY PUPPY 3 99 T
00289823738 '
GOLDEN BOOKS TURTLE SHELL 3.99 T
00919832409
SUBTOTAL 24 .47
TAX 1 8.371% 1 .84
TOTAL 25 . 1
DEBIT TEND 260.31
CHANDE DhE 0.00
EFT DEBIT PAY FROM er RY
AESOgNT : {8‘83
2 . 1 TOTAL PURCHASE
02/1252011’ 15:22:32
# ITEMS SOLD 41
TE! 2127 9170 9490 5255 INS
‘mm"mmmmmmmmmmmmmmmmmmmWWWHWmm
Iax Prup in stnv! It Jacksnn Hauitt
Ind 83 thick [lihiflfi at Ualnurt
You get more text because the maximumRecognitionTime line limits how long Tesseract can spend recognizing the image; by commenting it out, you let Tesseract take as long as it needs.
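For reference, here is a minimal sketch of what the setup might look like with all three optional restrictions left out. The helper name RecognizeWholeReceipt and the "eng" language code are assumptions for illustration, not part of the library or of the project in question; the G8Tesseract calls are the same ones used in the example code quoted above.

```objc
#import <UIKit/UIKit.h>
#import <TesseractOCR/TesseractOCR.h>

// Hypothetical helper (not part of the library): runs OCR over the whole
// image with no whitelist, no rect, and no time limit.
static NSString *RecognizeWholeReceipt(UIImage *receiptImage) {
    // Assumes eng.traineddata is bundled in the app's tessdata folder.
    G8Tesseract *tesseract = [[G8Tesseract alloc] initWithLanguage:@"eng"];
    tesseract.image = receiptImage;

    // Deliberately left out so that nothing restricts recognition:
    // tesseract.charWhitelist = @"0123456789";
    // tesseract.rect = CGRectMake(20, 20, 100, 100);
    // tesseract.maximumRecognitionTime = 2.0;

    // Blocks until recognition finishes, which can take a while on a
    // long receipt now that there is no time limit.
    [tesseract recognize];
    return tesseract.recognizedText;
}
```

Because recognition is no longer capped, it is probably worth calling something like this off the main thread so the UI does not freeze while Tesseract works through the image.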
Thanks a lot for your help, very much appreciated
Hey @dlinsin @BamX and others
I am new to this library and pretty sure I am doing something wrong, but I'm not sure what. I am trying to read the following shopping receipt
and my code for that looks like
and the output that I see is
I don't know what that means.
Can you please tell me how I can read the entire text from this image?
I have also uploaded the project if that would help you