claird / PyPDF4

A utility to read and write PDFs with Python
obsolete-https://pythonhosted.org/PyPDF2/

TypeError during call to getDocumentInfo() #92

Open jusher00 opened 3 years ago

jusher00 commented 3 years ago

This bug was already reported in PyPDF2 (here) and I was just able to reproduce the same behavior with PyPDF4. The corresponding lines of code are here and here.

My code to reproduce this:

from PyPDF4 import PdfFileReader

# `file` is the path to the affected PDF
pdf = PdfFileReader(open(file, 'rb'))
info = pdf.getDocumentInfo()

which throws the following error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "D:\venv\lib\site-packages\PyPDF4\pdf.py", line 1165, in getDocumentInfo
    obj = self.trailer['/Info']
  File "D:\venv\lib\site-packages\PyPDF4\generic.py", line 518, in __getitem__
    return dict.__getitem__(self, key).getObject()
  File "D:\venv\lib\site-packages\PyPDF4\generic.py", line 179, in getObject
    return self.pdf.getObject(self).getObject()
  File "D:\venv\lib\site-packages\PyPDF4\pdf.py", line 1676, in getObject
    retval = readObject(self.stream, self)
  File "D:\venv\lib\site-packages\PyPDF4\generic.py", line 66, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "D:\venv\lib\site-packages\PyPDF4\generic.py", line 582, in readFromStream
    if not data.get(key):
TypeError: unhashable type: 'ArrayObject'
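
For context, the last frame fails because `data.get(key)` has to hash `key` before it can look it up, and here the parsed key is an `ArrayObject` rather than a name. The sketch below reproduces that failure mode in isolation. It assumes `ArrayObject` is a `list` subclass (which would explain why it is unhashable) and uses a hypothetical stand-in class, `FakeArrayObject`, instead of PyPDF4's own type:

```python
class FakeArrayObject(list):
    """Hypothetical stand-in for a list-based PDF array object."""


def lookup(data, key):
    # dict.get() hashes the key first, so an unhashable key raises
    # TypeError before any lookup takes place.
    return data.get(key)


data = {}
try:
    lookup(data, FakeArrayObject([1, 2]))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If that assumption holds, the PDF in question probably has an array where the dictionary parser expects a key, which points to a malformed or unusual `/Info` dictionary rather than a problem in the calling code.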

Unfortunately, I can't share the PDF because it contains sensitive data.