colemana / PyPDF2

A utility to read and write pdfs with Python. Superseded: see https://github.com/knowah/PyPDF2

getNumPages() failed #2

Open HackAck opened 11 years ago

HackAck commented 11 years ago

The version:

    pyPdf.pdf.version_info
    sys.version_info(major=2, minor=7, micro=3, releaselevel='final', serial=0)

The error:

    return pyPdf.PdfFileReader(file(fileName, "r")).getNumPages()
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 374, in __init__
    self.read(stream)
  File "/usr/lib/pymodules/python2.7/pyPdf/pdf.py", line 778, in read
    newTrailer = readObject(stream, self)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 67, in readObject
    return DictionaryObject.readFromStream(stream, pdf)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 522, in readFromStream
    value = readObject(stream, pdf)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 58, in readObject
    return ArrayObject.readFromStream(stream, pdf)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 153, in readFromStream
    arr.append(readObject(stream, pdf))
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 69, in readObject
    return readHexStringFromStream(stream)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 273, in readHexStringFromStream
    return createStringObject(txt)
  File "/usr/lib/pymodules/python2.7/pyPdf/generic.py", line 239, in createStringObject
    retval = TextStringObject(string.decode("utf-16"))
  File "/usr/lib/python2.7/encodings/utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 4-5: illegal encoding

knowah commented 11 years ago

version_info refers to the version of Python you are running (hence sys.version_info). Have you updated to PyPDF2? It seems that you are using the original pyPdf; that name is no longer used. Additionally, your file object (file(fileName, "r")) is not open in binary mode, which is likely the main cause of the problem (use "rb").
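As a rough illustration of the "rb" suggestion, here is a minimal sketch that uses only the standard library. `SAMPLE` and `read_pdf_bytes` are hypothetical names made up for this example, and the sample bytes are not a valid PDF; they only stand in for a file that contains non-text data. The sketch shows that a stream opened in binary mode returns the file's bytes unchanged, which is what a PDF parser needs on every platform. It does not try to reproduce the traceback above.

```python
import os
import tempfile

# Hypothetical stand-in for PDF content: a header followed by bytes
# that are not plain text (a UTF-16 byte-order mark, a CR LF pair).
SAMPLE = b"%PDF-1.4\n\xfe\xff\x00H\x00i\r\n%%EOF\n"


def read_pdf_bytes(path):
    """Return the raw bytes of the file; "rb" disables newline translation and text decoding."""
    with open(path, "rb") as stream:
        return stream.read()


# Write the sample to a temporary file, then read it back in binary mode.
fd, path = tempfile.mkstemp(suffix=".pdf")
try:
    with os.fdopen(fd, "wb") as out:
        out.write(SAMPLE)
    data = read_pdf_bytes(path)
finally:
    os.remove(path)

# The bytes read back should be identical to the bytes written.
print(data == SAMPLE)
```

With a PyPDF2 release from that period installed, the call from the report would presumably become `PyPDF2.PdfFileReader(open(fileName, "rb")).getNumPages()`; that line is untested here and assumes the `PdfFileReader` class of those releases.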