Attached is another very trivial patch that the project may find useful.
We have found that when post-processing Tesseract's text output, it can be
very helpful to have the form feed (page break) control character present at
the end of each page.
This patch adds a configuration parameter called "include_formfeed_pagebreaks"
which enables this behavior. It applies to TessTextRenderer only: the hOCR and
box outputs appear to contain page number metadata already, and I don't know
what UNLV text is.
I'm also including a sample TIFF image and the text output produced with the
parameter disabled (the default behavior) and enabled.
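To illustrate the kind of post-processing this helps with, here is a minimal Python sketch that splits text output back into pages on the form feed character. It assumes the parameter is enabled and that each page, including the last, ends with a form feed as described above. The helper name split_pages and the sample string are made up for illustration and are not part of the patch.

```python
from typing import List

FORM_FEED = "\f"  # ASCII 0x0C, the page break control character


def split_pages(text: str) -> List[str]:
    """Split OCR text output into pages at form feed characters.

    Assumes every page ends with a form feed, so the split leaves one
    empty trailing chunk, which is dropped.
    """
    pages = text.split(FORM_FEED)
    if pages and pages[-1] == "":
        pages.pop()
    return pages


# Hypothetical two-page output, used only for illustration.
sample = "first page\n\fsecond page\n\f"
for number, page in enumerate(split_pages(sample), start=1):
    print(number, repr(page))
```

Without the form feed, the same split would need some heuristic to find page boundaries in the plain text.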
Discussion:
https://groups.google.com/d/msg/tesseract-dev/VsgJ9R-cTQ0/OMeDjYWoAdQJ
Original issue reported on code.google.com by zde...@gmail.com on 30 Jan 2015 at 9:37