eppye-bots / bots

Automatically exported from code.google.com/p/bots

preprocess.py extractpdf does not work with current version of pdfminer #370

Closed BikeMikeAU closed 4 years ago

BikeMikeAU commented 8 years ago

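The pdfminer API has changed in newer versions, so extractpdf in preprocess.py needs a small change:

  1. Change the imports: import PDFPageInterpreter and PDFPage instead of process_pdf.
  2. Remove the process_pdf call and replace it with a loop over PDFPage.get_pages (3 lines).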

https://github.com/euske/pdfminer#api-changes

Diff of preprocess.py:

@@ -323 +323,2 @@
-    from pdfminer.pdfinterp import PDFResourceManager, process_pdf
+    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
+    from pdfminer.pdfpage import PDFPage
@@ -402 +403,3 @@
-        process_pdf(rsrcmgr, device, pdf_stream, pagenos=set(), password=password, caching=True, check_extractable=True)
+        interpreter = PDFPageInterpreter(rsrcmgr, device)
+        for page in PDFPage.get_pages(pdf_stream, password=password, caching=True, check_extractable=True):
+            interpreter.process_page(page)
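
For anyone who wants to try the new calls outside of bots, the pattern in the diff can be sketched as a standalone function. This is only a minimal illustration, not the code in preprocess.py: the function name extract_text is made up, pdfminer's stock TextConverter and LAParams stand in for whatever device bots actually passes, and writing to an io.StringIO assumes a Python 3 build such as pdfminer.six. Only the PDFPageInterpreter and PDFPage.get_pages lines come from the diff above.

```python
import io


def extract_text(pdf_stream, password=''):
    """Return the text of every page in an open binary PDF stream.

    Hypothetical helper, not part of bots or pdfminer.
    """
    # Imported lazily so this module still loads where pdfminer is missing.
    from pdfminer.converter import TextConverter
    from pdfminer.layout import LAParams
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.pdfpage import PDFPage

    rsrcmgr = PDFResourceManager(caching=True)
    output = io.StringIO()
    # Assumption: TextConverter stands in for the device used in preprocess.py.
    device = TextConverter(rsrcmgr, output, laparams=LAParams())
    try:
        # Replacement for the removed process_pdf(): interpret page by page.
        interpreter = PDFPageInterpreter(rsrcmgr, device)
        for page in PDFPage.get_pages(pdf_stream, password=password,
                                      caching=True, check_extractable=True):
            interpreter.process_page(page)
        text = output.getvalue()
    finally:
        device.close()
    return text
```

It would be called with a file opened in binary mode, for example extract_text(open('some.pdf', 'rb')), where some.pdf is a placeholder. The pagenos=set() argument from the old call is left out, as in the diff; as far as I can tell get_pages yields all pages when pagenos is not given.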