AmitGorvadiya / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Why is the output junk.tr? #349

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. tesseract phototest.tif junk nobatch box.train

What is the expected output? What do you see instead?
phototest.tr

What version of the product are you using? On what operating system?
I am using r355, with the code modified according to issue 304, on Windows 7 Professional 32-bit.

Please provide any additional information below.
I only found junk.tr in the folder; I expected phototest.tr.
I modified base.cpp and tesseractmain.cpp according to the source provided in
http://code.google.com/p/tesseract-ocr/issues/attachmentText?id=304&aid=-6115357738997299921&name=fix_issue304.diff&token=e7e8972239ca3512b04ea1d8fc583777.
Is there any step I missed?
These are the commands I entered:
1. tesseract phototest.tif phototest batch.nochop makebox
2. Rename phototest.txt to phototest.box
3. unicharset_extractor phototest.box
4. tesseract phototest.tif junk nobatch box.train

Original issue reported on code.google.com by yushi...@yahoo.com on 17 Aug 2010 at 9:03


GoogleCodeExporter commented 9 years ago
Because there is an error on the wiki. The correct command for Tesseract 3.00 is:

tesseract phototest.tif phototest nobatch box.train.stderr
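The only change from the reporter's fourth step is the second argument: it is the output base name, and with the box.train config the training file is written as <base>.tr, which is why passing `junk` produced junk.tr. Below is a minimal Python sketch of the whole sequence from this thread that derives the base name from the image file so the names line up. The helper names (`training_commands`, `expected_tr`, `run_training`) are hypothetical, and the commands are copied from the comments above rather than checked against other Tesseract versions.

```python
import subprocess
from pathlib import Path


def training_commands(image: str) -> list[list[str]]:
    """Return the command sequence used in this thread for one training image.

    The second positional argument to tesseract is the output base name;
    with the box.train config the training data is written to <base>.tr.
    """
    base = Path(image).stem  # "phototest.tif" -> "phototest"
    return [
        ["tesseract", image, base, "batch.nochop", "makebox"],
        ["unicharset_extractor", f"{base}.box"],
        ["tesseract", image, base, "nobatch", "box.train.stderr"],
    ]


def expected_tr(image: str) -> str:
    """Name of the .tr file the box.train step should produce."""
    return f"{Path(image).stem}.tr"


def run_training(image: str) -> None:
    """Run the sequence (assumes tesseract and unicharset_extractor are on PATH)."""
    makebox, extract, train = training_commands(image)
    subprocess.run(makebox, check=True)
    base = Path(image).stem
    # In this thread makebox left phototest.txt, which had to be renamed to .box.
    txt, box = Path(f"{base}.txt"), Path(f"{base}.box")
    if txt.exists() and not box.exists():
        txt.rename(box)
    # In practice, review and correct the box file by hand before continuing.
    subprocess.run(extract, check=True)
    subprocess.run(train, check=True)
```

For example, `expected_tr("phototest.tif")` gives `"phototest.tr"`, the file the reporter was looking for.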

Original comment by zde...@gmail.com on 17 Aug 2010 at 9:41

GoogleCodeExporter commented 9 years ago
Is there any other link, besides the wiki, that provides proper steps for training tesseract-ocr? I'm new to this field and need more information to guide me.
Many thanks to everyone who helps me.

Original comment by yushi...@yahoo.com on 18 Aug 2010 at 3:25

GoogleCodeExporter commented 9 years ago
Issue 348 has been merged into this issue.

Original comment by joregan on 18 Aug 2010 at 9:59

GoogleCodeExporter commented 9 years ago

Original comment by joregan on 18 Aug 2010 at 10:01

GoogleCodeExporter commented 9 years ago
Issue 304 is still an open issue. In short, Tesseract does not work on Windows.

Original comment by joregan on 18 Aug 2010 at 10:02

GoogleCodeExporter commented 9 years ago
I don't know whether I succeeded, but I created lang.traineddata using r355 and the steps above. For the word-list part, do we have to create our own text files (number.txt, punc.txt, freq.txt, allword.txt) to create lang.number-dawg? And is the training step on the wiki only for one image at a time?

Original comment by yushi...@yahoo.com on 18 Aug 2010 at 10:15

GoogleCodeExporter commented 9 years ago
Dictionaries are per language, so you only need to build them once (if you have a lot of words).

Original comment by zde...@gmail.com on 18 Aug 2010 at 5:20

GoogleCodeExporter commented 9 years ago
How do I add data to, or further train, the XXX.traineddata that I created? Do I have to manually type every word in the image into the .txt files I mentioned above? Even then, it can't read the various styles of words (bold, italic, etc.) in other images. Do you understand what I mean? Sorry for my broken English.

Original comment by yushi...@yahoo.com on 19 Aug 2010 at 8:22

GoogleCodeExporter commented 9 years ago
No, no. 

The training step on the wiki that uses box files (which give the coordinates of
each character in an image) is only used to train the character recogniser.
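To make the box-file idea concrete: as far as I know, each line of a Tesseract 3.00 box file holds one character followed by its left, bottom, right and top pixel coordinates, measured from the bottom-left corner of the image; later versions append a page number, which this sketch tolerates as an optional sixth field. The parser below is a hypothetical helper for inspecting such files, not part of Tesseract, and `distinct_symbols` only approximates the character set that unicharset_extractor collects (the real unicharset file also carries per-character properties).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int = 0  # optional sixth column; 0 when absent


def parse_box_line(line: str) -> Box:
    """Parse '<char> <left> <bottom> <right> <top> [<page>]'."""
    parts = line.split()
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected box line: {line!r}")
    symbol, *numbers = parts
    return Box(symbol, *(int(n) for n in numbers))


def parse_box_text(text: str) -> list[Box]:
    """Parse every non-blank line of a box file's contents."""
    return [parse_box_line(line) for line in text.splitlines() if line.strip()]


def distinct_symbols(boxes: list[Box]) -> list[str]:
    """Sorted set of characters that appear in the boxes."""
    return sorted({box.symbol for box in boxes})
```

Checking a corrected box file this way (for example, that every line parses and the character set looks right) is cheaper than discovering a bad line during the box.train step.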

The dictionaries are separate: you just need lists of words.
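Since the dictionaries only need lists of words, they can be prepared from ordinary text in the target language rather than typed in word by word from the training images. A minimal Python sketch, assuming one UTF-8 word per line is the input expected by the wiki's dictionary tool (wordlist2dawg); `word_lists`, `write_list`, and the `top_n` cutoff for a frequent-word list are illustrative choices, not part of Tesseract.

```python
import re
from collections import Counter


def word_lists(text: str, top_n: int) -> tuple[list[str], list[str]]:
    """Return (all distinct words sorted, the top_n most frequent words)."""
    # Runs of letters only: digits, underscores and punctuation are dropped.
    words = re.findall(r"[^\W\d_]+", text)
    counts = Counter(words)  # case is preserved: "The" and "the" differ
    all_words = sorted(counts)
    frequent = [word for word, _ in counts.most_common(top_n)]
    return all_words, frequent


def write_list(path: str, words: list[str]) -> None:
    """Write one word per line as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(word + "\n" for word in words)
```

For instance, `write_list("freq.txt", frequent)` and `write_list("allword.txt", all_words)` would produce files named like the ones mentioned earlier in the thread; the more representative the source text, the better the lists.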

Original comment by joregan on 19 Aug 2010 at 6:54

GoogleCodeExporter commented 9 years ago
Oh, thanks for the guidance.

Original comment by yushi...@yahoo.com on 22 Aug 2010 at 6:58

GoogleCodeExporter commented 9 years ago

Original comment by zde...@gmail.com on 27 Sep 2010 at 9:06