euske / pdfminer

Python PDF Parser (Not actively maintained). Check out pdfminer.six.
https://github.com/pdfminer/pdfminer.six
MIT License
5.26k stars, 1.13k forks

Extract image TypeError: object of type 'zip' has no len() #278

Open Honghe opened 4 years ago

Honghe commented 4 years ago

Env:

Command:

 pdf2txt.py -d -o Wang\ et\ al_2017_Tacotron.txt -t text -O images Wang\ et\ al_2017_Tacotron.pdf

Log:

Traceback (most recent call last):
  File "/home/jack/anaconda3/envs/py36/bin/pdf2txt.py", line 115, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/home/jack/anaconda3/envs/py36/bin/pdf2txt.py", line 110, in main
    interpreter.process_page(page)
  File "/home/jack/anaconda3/envs/py36/lib/python3.6/site-packages/pdfminer/pdfinterp.py", line 842, in process_page
    self.device.end_page(page)
  File "/home/jack/anaconda3/envs/py36/lib/python3.6/site-packages/pdfminer/converter.py", line 50, in end_page
    self.receive_layout(self.cur_item)
  File "/home/jack/anaconda3/envs/py36/lib/python3.6/site-packages/pdfminer/converter.py", line 182, in receive_layout
    render(ltpage)
  File "/home/jack/anaconda3/envs/py36/lib/python3.6/site-packages/pdfminer/converter.py", line 172, in render
    render(child)
  File "/home/jack/anaconda3/envs/py36/lib/python3.6/site-packages/pdfminer/converter.py", line 172, in render
    render(child)
  File "/home/jack/anaconda3/envs/py36/lib/python3.6/site-packages/pdfminer/converter.py", line 172, in render
    render(child)
  File "/home/jack/anaconda3/envs/py36/lib/python3.6/site-packages/pdfminer/converter.py", line 179, in render
    self.imagewriter.export_image(item)
  File "/home/jack/anaconda3/envs/py36/lib/python3.6/site-packages/pdfminer/image.py", line 74, in export_image
    if len(filters) == 1 and filters[0][0] in LITERALS_DCT_DECODE:
TypeError: object of type 'zip' has no len()
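
The same error can be reproduced outside pdfminer. The following is a minimal sketch, assuming Python 3 and using plain strings as stand-ins for pdfminer's filter objects; it only illustrates why `len()` fails on the value that `export_image` receives.

```python
# Stand-in values (assumption): real pdfminer filters are PSLiteral objects;
# plain strings are used here only to keep the sketch self-contained.
filters = zip(["DCTDecode"], [{}])

try:
    len(filters)  # zip() is a lazy iterator in Python 3, so it has no length
except TypeError as exc:
    message = str(exc)

# Prints the same message as the last line of the traceback above.
print(message)
```

In Python 2, `zip()` returned a list, so the same check worked there.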

Test file:

Wang_et_al_2017_Tacotron.zip

lpnaunau commented 3 years ago

Hi!

Maybe a bit late, but I found a solution to this (it may help someone in the future). I modified the file site-packages\pdfminer\pdftypes.py, more precisely the function "get_filters" and what it returns. In Python 3, zip() returns an iterator that has no len(), so I wrapped the return value in list(). See the code below:

    def get_filters(self):
        filters = self.get_any(('F', 'Filter'))
        params = self.get_any(('DP', 'DecodeParms', 'FDecodeParms'), {})
        if not filters:
            return []
        if not isinstance(filters, list):
            filters = [filters]
        if not isinstance(params, list):
            # Make sure the parameters list is the same as filters.
            params = [params]*len(filters)
        if STRICT and len(params) != len(filters):
            raise PDFException("Parameters len filter mismatch")
        return list(zip(filters, params))
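
The effect of the change can be checked in isolation. The sketch below is only an illustration: `pair_filters` is a hypothetical standalone copy of the logic above (without the `STRICT` check), and the strings and the tuple of names are stand-ins for pdfminer's literal objects and `LITERALS_DCT_DECODE`.

```python
def pair_filters(filters, params=None):
    """Hypothetical standalone version of the get_filters logic above."""
    if params is None:
        params = {}
    if not filters:
        return []
    if not isinstance(filters, list):
        filters = [filters]
    if not isinstance(params, list):
        # Repeat the single parameter dict once per filter.
        params = [params] * len(filters)
    # A list (unlike a bare zip object) supports len() and indexing.
    return list(zip(filters, params))


# Stand-in for LITERALS_DCT_DECODE (assumption: plain strings, not PSLiteral).
DCT_NAMES = ("DCTDecode", "DCT")

pairs = pair_filters("DCTDecode")
# Same shape as the check in image.py that raised the TypeError.
is_single_dct = len(pairs) == 1 and pairs[0][0] in DCT_NAMES
print(pairs, is_single_dct)
```

With a list returned, both `len(filters)` and `filters[0][0]` in `export_image` work as intended.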

Best regards,

moran-trullion commented 2 years ago

lpnaunau

Thanks

amscosta commented 9 months ago

Great! Many thanks! This just solved my issue as well.