aborruso closed 7 years ago
Hi, when I run
pdftohtml -c -hidden -xml a.pdf a.pdf.xml
on this file, I get no text boxes in the output, only the information below.
Is this normal? What's wrong with my command?
Thank you
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<pdf2xml producer="poppler" version="0.41.0">
  <page number="1" position="absolute" top="0" left="0" height="892" width="1262">
    <image top="0" left="0" width="1263" height="893" src="a.pdf-1_1.jpg"/>
  </page>
  <page number="2" position="absolute" top="0" left="0" height="892" width="1262">
    <image top="0" left="0" width="1263" height="893" src="a.pdf-2_1.jpg"/>
  </page>
  <page number="3" position="absolute" top="0" left="0" height="892" width="1262">
    <image top="0" left="0" width="1263" height="893" src="a.pdf-3_1.jpg"/>
  </page>
  <page number="4" position="absolute" top="0" left="0" height="892" width="1262">
    <image top="0" left="0" width="1263" height="893" src="a.pdf-4_1.jpg"/>
  </page>
  <page number="5" position="absolute" top="0" left="0" height="892" width="1262">
    <image top="0" left="0" width="1263" height="893" src="a.pdf-5_1.jpg"/>
  </page>
</pdf2xml>
OK, I have now read this:
The page has been scanned and processed with Optical Character Recognition (OCR).
My fault.
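For anyone who runs into the same thing, here is a minimal Python sketch for checking which pages of a pdftohtml -xml output contain only <image> elements and no <text> elements, as in the output above. The function names are made up for illustration and are not part of poppler. It assumes that <text> and <image> appear as direct children of each <page>, which matches the output shown here.

```python
import xml.etree.ElementTree as ET


def count_elements_per_page(xml_source):
    """Map each page number to a (text_count, image_count) tuple.

    xml_source is the content of a pdftohtml -xml output file (str or bytes).
    Only direct children of <page> are counted (an assumption based on the
    output shown in this thread).
    """
    root = ET.fromstring(xml_source)
    counts = {}
    for page in root.iter("page"):
        number = int(page.get("number"))
        text_count = len(page.findall("text"))
        image_count = len(page.findall("image"))
        counts[number] = (text_count, image_count)
    return counts


def image_only_pages(xml_source):
    """Return the sorted page numbers that have images but no text boxes."""
    counts = count_elements_per_page(xml_source)
    return sorted(
        number
        for number, (text_count, image_count) in counts.items()
        if text_count == 0 and image_count > 0
    )
```

To use it, read the generated file (a.pdf.xml in the command above) in binary mode and pass its content to image_only_pages. If every page is listed, pdftohtml extracted no text boxes at all and each page is just a picture.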