dlareklami / tesseract-ocr

Automatically exported from code.google.com/p/tesseract-ocr

hocr: single quotes not escaped in page title #1154

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
1. Create an image file with a single quote in its name, for example "C:\temp\tesseract's_fail.png".
2. Run tesseract with hOCR output: tesseract.exe "C:\temp\tesseract's_fail.png" "C:\temp\hocr" hocr
3. The resulting file hocr.html is invalid.

The single quote is not escaped in the title attribute of the page div, so it ends the single-quoted attribute value early. For me the output is:
<div class='ocr_page' id='page_1' title='image "C:\temp\tesseract's_fail.png"; bbox 0 0 275 297; ppageno 0'>
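
For illustration, here is a minimal Python sketch of the kind of escaping that would keep the attribute well-formed. It is not Tesseract's code and not necessarily what r1098 does; the helper name `hocr_page_div` and its parameters are hypothetical. It only assumes that the title value is HTML-escaped before being placed inside the single-quoted attribute.

```python
from html import escape


def hocr_page_div(image_path, bbox, ppageno):
    """Hypothetical helper: build an hOCR page div with an escaped title."""
    # Same title layout as in the report: image "<path>"; bbox ...; ppageno N
    title = 'image "{}"; bbox {} {} {} {}; ppageno {}'.format(
        image_path, bbox[0], bbox[1], bbox[2], bbox[3], ppageno
    )
    # escape(..., quote=True) replaces &, <, > and also " and ' with
    # character references, so the value cannot close the attribute early.
    safe_title = escape(title, quote=True)
    return "<div class='ocr_page' id='page_1' title='{}'>".format(safe_title)


# The file name from the report, as a raw string so backslashes stay literal.
div = hocr_page_div(r"C:\temp\tesseract's_fail.png", (0, 0, 275, 297), 0)
print(div)
```

With this approach the only literal single quotes left in the tag are the ones delimiting the three attributes; an HTML parser decodes the character references back to the original file name.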

Windows 7
tesseract-3.02

Original issue reported on code.google.com by irodio...@biarum.com on 8 May 2014 at 9:24

GoogleCodeExporter commented 9 years ago
Thanks - fixed in r1098

Original comment by zde...@gmail.com on 9 May 2014 at 10:19