What steps will reproduce the problem?
1. Pass the input image on stdin and request hOCR output.
What is the expected output? What do you see instead?
The file starts with:
<div class='ocr_page' id='page_1' title='image ""; bbox 0 0 2463 3565; ppageno 0'>
rather than the XML preamble. It seems that the Renderer's
BeginDocument/EndDocument invocations are missing in this case.
What version of the product are you using? On what operating system?
The latest SVN build on OS X.
Please provide any additional information below.
Based on a quick perusal of the code, the issue is that BeginDocument is only
called on the renderer on the code path that uses ProcessPages(), which requires
a filename as input. However, when the image is provided on stdin, the method
being called is ProcessPage(), and it is given an image that has already been
read.
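To illustrate the suspected difference between the two code paths, here is a
minimal self-contained sketch. SketchRenderer, ProcessPagesSketch and
ProcessPageSketch are hypothetical stand-ins written for this report, not
Tesseract's actual classes or signatures; they only model which calls happen
on each path as I understand it.

```cpp
#include <string>

// Hypothetical stand-in for an hOCR renderer (not the real Tesseract class).
class SketchRenderer {
 public:
  // Emits the document preamble that is missing from the stdin output.
  void BeginDocument(const std::string& title) {
    out_ += "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n";
    out_ += "<html><head><title>" + title + "</title></head>\n<body>\n";
  }
  // Emits the markup for one page.
  void AddPage(const std::string& page_markup) { out_ += page_markup + "\n"; }
  // Closes the document.
  void EndDocument() { out_ += "</body>\n</html>\n"; }
  const std::string& output() const { return out_; }

 private:
  std::string out_;
};

// Models the filename path: the page is wrapped in Begin/EndDocument.
std::string ProcessPagesSketch(const std::string& page_markup) {
  SketchRenderer renderer;
  renderer.BeginDocument("");
  renderer.AddPage(page_markup);
  renderer.EndDocument();
  return renderer.output();
}

// Models the stdin path as reported: only the page itself is rendered,
// so the output starts directly with the ocr_page div.
std::string ProcessPageSketch(const std::string& page_markup) {
  SketchRenderer renderer;
  renderer.AddPage(page_markup);
  return renderer.output();
}
```

If this reading of the code is right, one possible fix would be to have the
stdin path make the same BeginDocument/EndDocument calls around the single
page.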
In addition to this issue, I'm seeing a very bizarre HTML document, at least
sometimes, when passing a filename on the command line. The page output appears
to break off at the image name:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>
</title>
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name='ocr-system' content='tesseract 3.03' />
<meta name='ocr-capabilities' content='ocr_page ocr_carea ocr_par ocr_line ocrx_word'/>
</head>
<body>
<div class='ocr_page' id='page_1' title='image "??R8? </body>
</html>
I'm receiving this kind of output with a file called "R1VRYhtymä_Oy.tif". The
name is important, though it's probably the length of the value that matters
more than anything else. My guess is that the HOcrEscape() function returns a
pointer to memory that has already been freed, since the string() method on
STRING seems to simply return the underlying pointer, and the instance goes out
of scope at the end of the function.
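The suspected lifetime problem, and one way to avoid it, can be sketched as
follows. EscapeForAttribute is a hypothetical helper that uses std::string in
place of Tesseract's STRING; it is not the real HOcrEscape() and its escaping
rules are an assumption. The point is only that returning the string by value
keeps the buffer alive for the caller.

```cpp
#include <string>

// Hypothetical escaping helper (assumed rules, not the real HOcrEscape()).
// Returning std::string by value hands ownership of the buffer to the caller.
std::string EscapeForAttribute(const char* text) {
  std::string ret;
  for (const char* p = text; *p != '\0'; ++p) {
    switch (*p) {
      case '<':  ret += "&lt;";   break;
      case '>':  ret += "&gt;";   break;
      case '&':  ret += "&amp;";  break;
      case '"':  ret += "&quot;"; break;
      case '\'': ret += "&#39;";  break;
      default:   ret += *p;       break;  // all other bytes pass through
    }
  }
  return ret;
}

// The suspected pattern, shown only as a comment because using the result
// would be undefined behaviour:
//
//   const char* EscapeDangling(const char* text) {
//     std::string ret = EscapeForAttribute(text);
//     return ret.c_str();  // ret is destroyed on return; the pointer dangles
//   }
```

A caller would then keep the returned std::string in a local variable for as
long as it needs the character data.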
Original issue reported on code.google.com by alank...@bel.fi on 11 May 2014 at 10:56