baopham1340 / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

Devanagari - psm 3 - table of contents - page number coming after all titles #1338

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Run tesseract with -psm 3 on the attached image, using the Devanagari traineddata.
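For anyone trying to reproduce this, here is a minimal sketch that builds the three command lines being compared (psm 3, 4 and 6). The file name `toc.tif`, the output base names and the language code `hin` are placeholders and not taken from the report; substitute the attached image and whichever Devanagari traineddata is installed. The single-dash `-psm` spelling matches the report; newer tesseract releases spell the option `--psm`.

```python
import shlex


def build_commands(image, lang, modes=(3, 4, 6)):
    """Return one tesseract argument list per page segmentation mode."""
    commands = []
    for mode in modes:
        # Output base name is hypothetical; tesseract appends ".txt" itself.
        output_base = f"out_psm{mode}"
        commands.append(
            ["tesseract", image, output_base, "-l", lang, "-psm", str(mode)]
        )
    return commands


# Placeholder image name and language code (assumptions, see the note above).
for cmd in build_commands("toc.tif", "hin"):
    print(shlex.join(cmd))
```

The printed lines can be pasted into an msys2 shell, or each argument list can be passed to `subprocess.run` once tesseract is on the PATH.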

What is the expected output? What do you see instead?
I expect each page number to appear on the same line as its entry in the table of 
contents. Instead, all page numbers are listed after all of the text, as if the 
page numbers were treated as a separate column.
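To make the difference concrete, the following sketch contrasts the two reading orders. The entries are hypothetical placeholders (the real attachment is a Devanagari table of contents), and the functions only illustrate the ordering; they are not part of tesseract.

```python
# Hypothetical placeholder rows: (title, page number).
rows = [("Title A", "1"), ("Title B", "5"), ("Title C", "9")]


def row_order(rows):
    """Expected order: each title followed by its page number."""
    return [f"{title} {page}" for title, page in rows]


def column_order(rows):
    """Observed order with psm 3: all titles first, then all page numbers."""
    titles = [title for title, _ in rows]
    pages = [page for _, page in rows]
    return titles + pages


print("\n".join(row_order(rows)))
print("---")
print("\n".join(column_order(rows)))
```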

What version of the product are you using? On what operating system?
Latest version from git, on Windows 8 under MSYS2.

Please provide any additional information below.
The TIF file and the recognized output with psm 3, 4 and 6 are attached. The problem 
occurs with -psm 3, which is fully automatic page segmentation.

Original issue reported on code.google.com by shreeshrii on 12 Oct 2014 at 3:25

Attachments: