GoogleCodeExporter closed this issue 9 years ago
I have posted a patch which implements hOCR output support. See
http://code.google.com/p/tesseract-ocr/issues/detail?id=263
Original comment by amkryu...@gmail.com
on 22 Nov 2009 at 4:34
Fixed by patch in issue 263 and in 3.00.
Original comment by theraysm...@gmail.com
on 19 May 2010 at 11:10
I've made a quick and dirty patch to add djvused output to tesseract-ocr
v3.02.02. It still needs testing with CJK and right-to-left scripts and
multipage OCR, but it works for single pages in Russian. The patch adds a new
configuration option "tessedit_create_djvused". The tesseract(1) man page is left
intact since I'm (sadly) not familiar with the syntax.
Original comment by ksa...@gmail.com
on 6 Aug 2013 at 10:06
Updated the patch to emit plain UTF-8 encoded output, which djvused happily accepts.
Original comment by ksa...@gmail.com
on 7 Aug 2013 at 3:59
Is such an option really needed? Why not use Jakub Wilk's hocr2djvused,
distributed with ocrodjvu, or just ocrodjvu itself, which has also supported
tesseract 3.02 since version 0.7.15? See http://jwilk.net/software/ocrodjvu.
Original comment by jsb...@mimuw.edu.pl
on 8 Aug 2013 at 4:19
hOCR and djvused are the most widely used OCR output formats nowadays besides
plain text, so why not support both, especially if it's quite trivial?
I use Gentoo and Debian distributions, and ocrdjvu is not in the official
repositories, while tesseract is. Moreover, hocr2djvused encodes UTF-8
characters as escaped octals, which makes the non-English djvused output it
produces pretty much uneditable, even though djvused accepts UTF-8 as is.
Original comment by ksa...@gmail.com
on 8 Aug 2013 at 7:39
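To illustrate the escaped-octal point above, here is a minimal Python sketch of what the two encodings look like and how one can be turned back into the other. The function names are hypothetical and are not taken from hocr2djvused or from the patch. It assumes each non-ASCII byte is written as a three-digit `\ooo` escape, and it deliberately ignores other escapes such as `\\` or `\"`, so treat it as an illustration rather than a complete converter.

```python
import re

# Matches a three-digit octal escape for a single byte (\000 to \377).
_OCTAL_ESCAPE = re.compile(rb"\\([0-3][0-7]{2})")


def escape_octal(text: str) -> str:
    """Write every non-ASCII UTF-8 byte of `text` as a \\ooo escape."""
    parts = []
    for byte in text.encode("utf-8"):
        parts.append(chr(byte) if byte < 0x80 else "\\%03o" % byte)
    return "".join(parts)


def unescape_octal(text: str) -> str:
    """Replace \\ooo escapes with the bytes they stand for, then decode as UTF-8.

    Simplification: an escaped backslash followed by digits is not
    special-cased here.
    """
    raw = text.encode("utf-8")
    raw = _OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8)]), raw)
    return raw.decode("utf-8")


if __name__ == "__main__":
    line = '(word 10 20 30 40 "привет")'
    escaped = escape_octal(line)
    print(escaped)                  # hard to read and edit by hand
    print(unescape_octal(escaped))  # same line in plain UTF-8
```

The escaped form is pure ASCII, which is safe to pass around but hard to proofread for non-Latin scripts. The plain UTF-8 form can be corrected in any text editor.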
A correction: ocrodjvu is in the official repositories of Debian, Ubuntu and
openSUSE (but unfortunately not always the latest version).
Original comment by jsb...@mimuw.edu.pl
on 8 Aug 2013 at 6:54
My bad, I made a silly typo and didn't double-check. It is still missing from
the Gentoo repos, though (my primary distro), and the "uneditableness" issue
still stands.
Original comment by ksa...@gmail.com
on 8 Aug 2013 at 8:21
Original issue reported on code.google.com by
jong...@gmail.com
on 15 Jul 2009 at 7:32