claird / PyPDF4

A utility to read and write PDFs with Python
obsolete-https://pythonhosted.org/PyPDF2/

PdfReadError: EOF marker not found #33

Closed xilopaint closed 5 months ago

xilopaint commented 5 years ago

I have a Python 2 project that uses a PyPDF4 version from before the commits that moved the output stream argument from write() to __init__() in PdfFileMerger. It works fine.

After I updated PyPDF4 in the project to the current state and made the corresponding output stream changes in my code, I got an “EOF marker not found” error. Here’s the traceback:

Traceback (most recent call last):
  File "/Users/****/workflow/workflow.py", line 2067, in run
    func(self)
  File "alfred_pdf_tools.py", line 806, in main
    split_size(query, abs_path, suffix)
  File "alfred_pdf_tools.py", line 456, in split_size
    merger.append(inp_file, pages=(start, stop))
  File "/Users/****/pypdf/merger.py", line 208, in append
    self.merge(len(self._pages), fileobj, bookmark, pages, importBookmarks)
  File "/Users/****/pypdf/merger.py", line 137, in merge
    pdfr = PdfFileReader(fileobj, strict=self.strict)
  File "/Users/****/pypdf/pdf.py", line 1329, in __init__
    self._parsePdfFile(self._stream)
  File "/Users/****/pypdf/pdf.py", line 2117, in _parsePdfFile
    raise PdfReadError("EOF marker not found")
PdfReadError: EOF marker not found

kurtmckee commented 5 years ago

@xilopaint, first, note that I'm a newcomer to PyPDF4!

I'd like to confirm my interpretation of what you wrote:

  1. Your project has vendored the PyPDF4 code.
  2. You somehow updated the PyPDF4 code. <-- clarification needed
  3. You updated your project's PyPDF4 calls to match the new function/__init__ signatures.
  4. Traceback party.

Would you clarify step 2? Did you manually copy/paste changes, or did you completely overwrite the PyPDF4 files?

xilopaint commented 5 years ago

Would you clarify step 2? Did you manually copy/paste changes, or did you completely overwrite the PyPDF4 files?

Your interpretation is perfect, and I did completely overwrite the PyPDF4 folder. Attached is the "version" of PyPDF4 that works for me: pypdf.zip

BjornFJohansson commented 4 years ago

I use pdfrw to work around this problem for malformed PDFs. I never managed to get PyPDF2, 3, or 4 to handle them.

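import pdfrw
import PyPDF4

path = "malformed.pdf"

# Rewrite the file with pdfrw, which tolerates the malformed structure.
# Note: this overwrites the original file in place.
source = pdfrw.PdfReader(path)
writer = pdfrw.PdfWriter()
writer.addpages(source.pages)
writer.write(path)

# The rewritten file can now be opened with PyPDF4.
reader = PyPDF4.PdfFileReader(open(path, "rb"))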
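
Before rewriting a file with pdfrw, it can help to check whether the input actually lacks an end-of-file marker. A PDF is expected to end with a `%%EOF` line, and the error in the traceback suggests the reader did not find one. The sketch below uses only the standard library. The helper name `find_eof_marker` and the 1024-byte search window are assumptions made for illustration; this is not PyPDF4's actual parsing logic, which is not shown in this thread.

```python
import os

EOF_MARKER = b"%%EOF"


def find_eof_marker(path, tail_bytes=1024):
    """Look for the last %%EOF marker in the final `tail_bytes` of a file.

    Returns the number of bytes that follow the marker, or None if no
    marker is found in that window. (Hypothetical helper, not part of
    PyPDF4 or pdfrw.)
    """
    size = os.path.getsize(path)
    with open(path, "rb") as fh:
        # Only the end of the file matters for this check.
        fh.seek(max(0, size - tail_bytes))
        tail = fh.read()
    pos = tail.rfind(EOF_MARKER)
    if pos == -1:
        return None
    return len(tail) - (pos + len(EOF_MARKER))
```

A result of None points to a truncated or empty file, or to a stream that is not a PDF at all. A large number means there is trailing data after the marker. In the split/merge scenario from the traceback, an empty result could also mean the file object passed to append() was not the file that was expected, which may be worth ruling out before blaming the PDF itself.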