claird / PyPDF4

A utility to read and write PDFs with Python

fix PDF writer bug when encountering free object #61

Closed askerlee closed 4 years ago

askerlee commented 5 years ago

When editing an existing PDF, the file occasionally contains a "free" IndirectObject, that is, a reference that points to no object (correct me if I'm conceptually wrong). In this situation the reader emits a warning and then raises a PdfReadError:

            warnings.warn(
                "Object %d %d not defined." %
                (ref.idnum, ref.generation), PdfReadWarning
            )
            raise PdfReadError(
                "Could not find object (%d, %d)" % (ref.idnum, ref.generation)
            )

This exception propagates up to writer._sweepIndirectReferences(). However, the except clause there catches only ValueError, which is not a base class of PdfReadError, so the exception is not handled:

                    except ValueError:
                        # Unable to resolve the Object, returning NullObject
                        # instead.
                        warnings.warn(
                            "Unable to resolve [{}: {}], returning NullObject "
                            "instead".format(data.__class__.__name__, data)
                        )
                        return NullObject()

As a result, the exception goes uncaught and crashes the script.
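To illustrate why the handler misses the error, here is a minimal sketch that does not import the library. The classes PyPdfError and PdfReadError are local stand-ins that only mirror the inheritance described in this PR, and resolve(), sweep_with_value_error_only() and escapes_handler() are hypothetical helpers, not library functions.

```python
class PyPdfError(Exception):
    """Stand-in for the library's base error class (assumption)."""


class PdfReadError(PyPdfError):
    """Stand-in for the error raised when an object cannot be found."""


def resolve(idnum, generation):
    # Hypothetical helper: mimics the reader failing on a "free" object.
    raise PdfReadError("Could not find object (%d, %d)" % (idnum, generation))


def sweep_with_value_error_only(idnum, generation):
    # Mirrors the shape of the original handler: only ValueError is caught.
    try:
        return resolve(idnum, generation)
    except ValueError:
        return None


def escapes_handler(idnum, generation):
    # Returns True if PdfReadError gets past the ValueError-only handler.
    try:
        sweep_with_value_error_only(idnum, generation)
    except PdfReadError:
        return True
    return False
```

Because PdfReadError does not derive from ValueError, escapes_handler() observes the exception leaving the inner try block, which is the crash described above.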

Hence this pull request updates that except clause to handle PdfReadError as well. To be safer, it catches the base class of PdfReadError, PyPdfError:

                    except (ValueError, PyPdfError):
                        ...
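The effect of the widened except clause can be sketched in isolation. This is a simplified illustration under stated assumptions, not the actual writer code: PyPdfError, PdfReadError and NullObject are local stand-ins, and sweep() and failing_resolver() are hypothetical names introduced only for this example.

```python
import warnings


class PyPdfError(Exception):
    """Stand-in for the library's base error class (assumption)."""


class PdfReadError(PyPdfError):
    """Stand-in for the error raised when an object cannot be found."""


class NullObject:
    """Stand-in for the library's null PDF object."""


def failing_resolver(data):
    # Hypothetical resolver that always fails, like a "free" reference.
    raise PdfReadError("Could not find object (%d, %d)" % data)


def sweep(resolver, data):
    # Catching the base class covers PdfReadError and any sibling errors.
    try:
        return resolver(data)
    except (ValueError, PyPdfError):
        warnings.warn(
            "Unable to resolve [{}: {}], returning NullObject "
            "instead".format(data.__class__.__name__, data)
        )
        return NullObject()


# Usage: the failure becomes a warning plus a NullObject instead of a crash.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = sweep(failing_resolver, (12, 0))
```

Catching the base class trades some precision for robustness: any other error derived from PyPdfError that occurs during resolution is also turned into a NullObject.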