claird / PyPDF4

A utility to read and write PDFs with Python
obsolete-https://pythonhosted.org/PyPDF2/

PdfFileWriter.write causes access to non-existent attribute in pdf.py #24

Open holdenweb opened 5 years ago

holdenweb commented 5 years ago

The attached code causes an exception when it executes the output.write(outfile) statement at line 58. The program appears to work with PyPDF2.

The zip file also includes a data file (you20.pdf) that errors, and one that doesn't (CleanedUOSSSimpleSabotage_sm.pdf) in case this helps track down the bug. Here's the traceback from a failed attempt:

[2018-12-15 10:57:32,725] ERROR in app: Exception on / [POST]
Traceback (most recent call last):
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "app.py", line 58, in get_or_post
    output.write(outfile)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 482, in write
    self._sweepIndirectReferences(externalReferenceMap, self._root)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 557, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, data[i])
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 572, in _sweepIndirectReferences
    self._sweepIndirectReferences(externMap, realdata)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 589, in _sweepIndirectReferences
    newobj = self._sweepIndirectReferences(externMap, newobj)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 548, in _sweepIndirectReferences
    value = self._sweepIndirectReferences(externMap, value)
  File "/usr/local/anaconda3/envs/general/lib/python3.6/site-packages/PyPDF4/pdf.py", line 575, in _sweepIndirectReferences
    if data.pdf.stream.closed:
AttributeError: 'PdfFileWriter' object has no attribute 'stream'

Sorry about the zip file, but GitHub doesn't allow direct uploading of .py or .tar files: bug_report.zip (https://github.com/claird/PyPDF4/files/2682558/bug_report.zip)

claird commented 5 years ago
  1. Yikes.
  2. Thanks.
  3. I'm backed up. It might be a few days before I look at this.

Cameron Laird, vice president We make computers work for people.


acsor commented 5 years ago

Hi @holdenweb, thanks for the report. As you might know, PyPDF v4 is still undergoing a (slow) restructuring phase. Apparently I've been the only one working on its enhancement in the last few months.

One of my primary concerns was to make the codebase much cleaner and more maintainable. We are familiar with these kinds of errors; it just takes time (which I lack at the moment) to solve them.

Meanwhile PyPDF2 might be a more stable, albeit obsolete, choice.

holdenweb commented 5 years ago

No worries - this is just a data point about a PyPDF program I was testing before submitting a PR. Perfectly happy to continue using PyPDF2.

xupengDu commented 4 years ago

I have the same problem and hope someone can help me solve it.

tsragland commented 3 years ago

Here's a workaround that worked for at least one use case. Maybe it will work for yours.

The problem seems to be that PdfFileWriter looks for a 'stream' attribute on the PdfFileWriter instance when performing some cleanup steps (_sweepIndirectReferences), and an error occurs because the PdfFileWriter class (as of 1.27.0) has no such attribute. However, that 'stream' attribute isn't referenced again in _sweepIndirectReferences.
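To make the failure mode concrete, here is a minimal sketch that reproduces the same kind of AttributeError without PyPDF4 installed. Everything in it (FakeWriter, FakeWriterWithStream, FakeIndirectObject, owner_stream_is_closed, describe) is a hypothetical stand-in, not PyPDF4 code; only the expression data.pdf.stream.closed is taken from the traceback above.

```python
from io import BytesIO


class FakeWriter:
    """Hypothetical stand-in for a writer object that has no 'stream' attribute."""


class FakeWriterWithStream(FakeWriter):
    """The same stand-in, but carrying a 'stream' attribute."""

    def __init__(self):
        self.stream = BytesIO()


class FakeIndirectObject:
    """Hypothetical stand-in for an indirect reference that remembers its owner."""

    def __init__(self, pdf):
        self.pdf = pdf


def owner_stream_is_closed(data):
    # Mirrors the expression shown in the traceback: data.pdf.stream.closed
    return data.pdf.stream.closed


def describe(data):
    # Report the outcome of the check instead of letting the exception escape.
    try:
        return "closed" if owner_stream_is_closed(data) else "open"
    except AttributeError as exc:
        return "error: {}".format(exc)


# The owner lacks 'stream', so the check itself raises AttributeError.
print(describe(FakeIndirectObject(FakeWriter())))

# The owner has an open in-memory stream, so the check passes.
print(describe(FakeIndirectObject(FakeWriterWithStream())))
```

In this model the check only needs some object with a .closed attribute to exist on the owner; whether that holds for every code path in the real library is not something this sketch can confirm.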

A potentially viable workaround (until a fix for this issue is released) would be to create a wrapper class which extends PdfFileWriter with a 'stream' attribute, with its value set to an instance of BytesIO.

Use at your own risk. This is simply a workaround, which works in one case, but may or may not work for your case.

from io import BytesIO
from PyPDF4 import PdfFileWriter

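class PdfFileWriterWithStreamAttribute(PdfFileWriter):
    """PdfFileWriter plus the 'stream' attribute that _sweepIndirectReferences reads."""

    def __init__(self):
        super().__init__()
        # Placeholder stream: present only so that the .stream.closed check succeeds.
        self.stream = BytesIO()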

holdenweb commented 2 years ago

Hmm, sensible workaround. Seems like the bug classification is correct!