euske / pdfminer

Python PDF Parser (Not actively maintained). Check out pdfminer.six.
https://github.com/pdfminer/pdfminer.six
MIT License
5.25k stars 1.13k forks source link

Pdf2txt xml coordinates #74

Open MiguelCanto opened 10 years ago

MiguelCanto commented 10 years ago

I can't relate the pdf2txt xml coordinates with any of the unit of mesurement i tried. Pixels, centimeters, milimeters, points... Any of them has sense... Which unit is pdf2txt using?

Thank you very much.

euske commented 10 years ago

According to PDF 32000-1, Section 8.3.2.3 "User Space", the default unit of PDF (with no scaling, etc.) is point, which is 1/72 inch. A typical size of an A4 paper is defined as 612 points by 792 points which is 8.5 x 11 inch.