euske / pdfminer

Python PDF Parser (Not actively maintained). Check out pdfminer.six.
https://github.com/pdfminer/pdfminer.six
MIT License

TypeError: object of type 'zip' has no len() #264

Open clach04 opened 5 years ago

clach04 commented 5 years ago

Extracting images fails when running:

py -3 pdf2txt.py -o test.html -O test_images FightingGamePrimer.pdf

The error is:

Traceback (most recent call last):
  File "pdf2txt.py", line 115, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "pdf2txt.py", line 110, in main
    interpreter.process_page(page)
  File "C:\code\py\pdf\pdfminer\pdfminer\pdfinterp.py", line 842, in process_page
    self.device.end_page(page)
  File "C:\code\py\pdf\pdfminer\pdfminer\converter.py", line 50, in end_page
    self.receive_layout(self.cur_item)
  File "C:\code\py\pdf\pdfminer\pdfminer\converter.py", line 387, in receive_layout
    render(ltpage)
  File "C:\code\py\pdf\pdfminer\pdfminer\converter.py", line 343, in render
    render(child)
  File "C:\code\py\pdf\pdfminer\pdfminer\converter.py", line 352, in render
    render(child)
  File "C:\code\py\pdf\pdfminer\pdfminer\converter.py", line 355, in render
    self.place_image(item, 1, item.x0, item.y1, item.width, item.height)
  File "C:\code\py\pdf\pdfminer\pdfminer\converter.py", line 277, in place_image
    name = self.imagewriter.export_image(item)
  File "C:\code\py\pdf\pdfminer\pdfminer\image.py", line 74, in export_image
    if len(filters) == 1 and filters[0][0] in LITERALS_DCT_DECODE:
TypeError: object of type 'zip' has no len()

Attached is a sample (CC-licensed) demo file, FightingGamePrimer.pdf, but I suspect any PDF with images will hit the same error.

clach04 commented 5 years ago

Reproduced with sample provided with pdfminer:

py -3 pdf2txt.py -o test.html -O test_images samples/nonfree/i1040nr.pdf

I also found https://github.com/pdfminer/pdfminer.six/issues/15, which includes a fix that works :)
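
For anyone hitting the same traceback, here is a minimal sketch of the Python 3 behaviour behind it. It assumes that `filters` in image.py is the result of a bare `zip(...)` call, which the error message suggests but which this thread does not show. It does not import pdfminer: `get_filters_lazy`, `get_filters_list`, `is_single_dct`, and the sample data are hypothetical stand-ins, not the library's code or the exact patch from the linked issue.

```python
# Hypothetical stand-ins for an image stream's filter names and parameters.
filter_names = ["DCTDecode"]
filter_params = [{}]


def get_filters_lazy():
    # In Python 3, zip() returns an iterator: no len(), no indexing.
    return zip(filter_names, filter_params)


def get_filters_list():
    # Wrapping the result in list() restores the Python 2 behaviour.
    return list(zip(filter_names, filter_params))


def is_single_dct(filters):
    # Same shape as the check on the failing line of the traceback.
    return len(filters) == 1 and filters[0][0] == "DCTDecode"


try:
    is_single_dct(get_filters_lazy())
    lazy_error = None
except TypeError as exc:
    lazy_error = str(exc)

print(lazy_error)                          # the TypeError message from zip
print(is_single_dct(get_filters_list()))   # the list version passes the check
```

If the assumption holds, materialising the zip with `list(...)` where it is created (or where it is consumed) should be enough to make `len(filters)` and `filters[0][0]` work again under Python 3.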

clach04 commented 4 years ago

See PR #265.