frescobaldi / python-poppler-qt5

Python binding to libpoppler-qt5

Return all the text on a page not working with `page.text()` #35

Closed joelostblom closed 4 years ago

joelostblom commented 4 years ago

Thanks for maintaining this library! I have been looking for a way to extract text and annotations from PDFs and this might be it!

I have run into a problem when trying to extract the text from a page. According to the QtPoppler manual, `page.text()` should return the entire page's text when invoked without arguments (at least I believe that is what "If rect is null, all text on the page is given" means). However, when I try this I get an error:

import popplerqt5

doc = popplerqt5.Poppler.Document.load('./test1.pdf')
page = doc.page(0)
page.text()

Out:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Page.text(): arguments did not match any overloaded call:
  overload 1: not enough arguments
  overload 2: not enough arguments

This is the test file test1.pdf. How can I return all the text on a page?

joelostblom commented 4 years ago

Apparently, an empty `QRectF` is what the docs refer to with "if rect is null": the Python binding has no default for the argument, so the rectangle has to be passed explicitly. The following returns all the text on a page:

import popplerqt5
from PyQt5 import QtCore

doc = popplerqt5.Poppler.Document.load('./test1.pdf')
page = doc.page(0)
print(page.text(QtCore.QRectF()))
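
For anyone who wants the text of every page rather than just the first one, the same trick can be wrapped in a small helper. This is only a sketch: `extract_all_text` and `demo` are names I made up, not part of popplerqt5, and the `None` check on `load()` assumes the binding returns `None` where the C++ API returns a null pointer.

```python
def extract_all_text(doc, null_rect):
    """Return a list with the full text of each page in `doc`.

    `doc` is expected to behave like a Poppler.Document (numPages(), page(i)).
    `null_rect` should be a default-constructed QRectF, which Poppler
    treats as "the whole page".
    """
    return [doc.page(i).text(null_rect) for i in range(doc.numPages())]


def demo(path):
    """Hypothetical usage helper; imports are local so the function above
    can be read and reused without Qt being imported up front."""
    import popplerqt5
    from PyQt5 import QtCore

    doc = popplerqt5.Poppler.Document.load(path)
    if doc is None:
        # Assumption: a failed load comes back as None in the Python binding.
        raise IOError("could not open " + path)

    for number, text in enumerate(extract_all_text(doc, QtCore.QRectF()), start=1):
        print("--- page", number, "---")
        print(text)
```

Calling `demo('./test1.pdf')` should print each page's text in turn. Passing the rectangle in as a parameter keeps the helper usable for a sub-region as well: hand it a non-empty `QRectF` instead.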