claird / PyPDF4

A utility to read and write PDFs with Python

Issue during split PDF with python3 #54

Open yanqingwang opened 5 years ago

yanqingwang commented 5 years ago

Dear Expert,

I ran the demo code to split a PDF with 43 pages. After 1 or 2 pages, an error occurs: Exception: C://temp/pdf/test_split_page_f2.pdf 'PdfFileWriter' object has no attribute 'stream' <_io.BufferedWriter name='C://temp/pdf/test_split_page_f2.pdf'>

I guess the pdf_writer can't be created several times. I also tried moving the line pdf_writer = PyPDF4.PdfFileWriter() out of the for loop. That does generate 43 files, but it doesn't meet my requirements: the 1st file has 1 page, the 2nd file has 2 pages, and so on.

The screenshot shows the results with the exception; some of the files have a size of 0.

Ross

import PyPDF4

def split(path, name_of_split):
    pdf = PyPDF4.PdfFileReader(path)
    for page in range(pdf.getNumPages()):
        try:
            # A new writer is created for every page, so each output
            # file should contain exactly one page.
            pdf_writer = PyPDF4.PdfFileWriter()
            pdf_writer.addPage(pdf.getPage(page))
            output = name_of_split + '_f' + str(page) + '.pdf'
            with open(output, 'wb') as output_pdf:
                pdf_writer.write(output_pdf)
        except Exception as e:
            print('Exception:', output, e, output_pdf)

image

ansi0 commented 5 years ago

Hello,

I had the same issue when attempting to split a PDF into multiple subsections. I have found 2 ways to handle this (both are bad, mkay?):

  1. (bad) Loop and re-read the source file each time, adding the desired pages to the PdfFileWriter object and then writing. This is terrible, since every single split re-reads the entire source.
  2. (possibly catastrophic) This is by no means an actual solution to the problem, and it may cause additional issues for more complex PDF files, but I found that the error was ultimately caused by line 575 in pdf.py:

if data.pdf.stream.closed:

I was finally able to get this going by modifying the line to:

if hasattr(data.pdf, 'stream') and data.pdf.stream.closed:
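To see in isolation why the extra hasattr check makes the AttributeError go away, here is a minimal sketch. FakeStream, FakeReader and FakeWriter are hypothetical stand-ins written for this illustration only; they are not PyPDF4's classes, and the sketch says nothing about whether the patched library produces correct output.

```python
class FakeStream:
    """Hypothetical stand-in for a file object with a 'closed' flag."""
    def __init__(self, closed):
        self.closed = closed


class FakeReader:
    """Hypothetical stand-in for an object that owns a stream."""
    def __init__(self, closed):
        self.stream = FakeStream(closed)


class FakeWriter:
    """Hypothetical stand-in for an object with no 'stream' attribute."""


def unguarded_is_closed(owner):
    # Same shape as the original check: raises AttributeError
    # when 'owner' has no 'stream' attribute.
    return owner.stream.closed


def guarded_is_closed(owner):
    # Same shape as the modified check: 'and' short-circuits, so
    # 'owner.stream' is never touched when the attribute is missing.
    return hasattr(owner, 'stream') and owner.stream.closed
```

With these stand-ins, unguarded_is_closed(FakeWriter()) raises the same kind of error as in the report, while guarded_is_closed(FakeWriter()) simply evaluates to False and the branch is skipped.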

Hope this helps some people until someone picks up maintenance again :)

Edit: Spelling, grammar and phrasing.
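For reference, the first workaround (re-reading the source for every split) might look roughly like the sketch below. It assumes the pip-installed API used in the original snippet, where PdfFileWriter() takes no arguments and write() takes the output stream. The function name split_by_rereading and the reader_cls / writer_cls parameters are hypothetical; the latter exist only so the loop can be exercised with stand-in classes.

```python
def split_by_rereading(path, name_of_split, reader_cls=None, writer_cls=None):
    """Write each page of 'path' to its own file, re-opening the source per page."""
    if reader_cls is None or writer_cls is None:
        import PyPDF4  # imported lazily so stand-in classes can replace it
        reader_cls = reader_cls or PyPDF4.PdfFileReader
        writer_cls = writer_cls or PyPDF4.PdfFileWriter

    # Read the page count once, then discard this reader.
    with open(path, 'rb') as source:
        page_count = reader_cls(source).getNumPages()

    outputs = []
    for page in range(page_count):
        # A fresh reader and a fresh writer for every output file.
        with open(path, 'rb') as source:
            reader = reader_cls(source)
            writer = writer_cls()
            writer.addPage(reader.getPage(page))
            output = name_of_split + '_f' + str(page) + '.pdf'
            with open(output, 'wb') as output_pdf:
                writer.write(output_pdf)
        outputs.append(output)
    return outputs
```

The cost is the one already mentioned: the source is opened and parsed once per output file, which gets slow for large documents.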

mgamble commented 3 years ago

I just ran into this problem as well. I would think the need to split a PDF into more than one file is a major use case. I hope this gets fixed officially. Other than this issue, this is such a great library.

I can confirm that ansi0's second fix seems to work.

I think the problem may stem from how the PdfFileWriter.__init__ and .write functions have been changing. In the master branch (version listed as 1.27.0), the arguments differ from what gets downloaded when installing with pip, even though that is also listed as 1.27.0. Here, __init__ takes a stream argument and .write does not; in the pip-installed version it is the opposite. So I imagine PdfFileWriter may previously have had a close attribute which is now gone, rendering that if statement irrelevant? Not sure.
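One way to check which variant is installed is to look at the method signatures with the standard inspect module. This is only a sketch: method_params is a hypothetical helper, and WriterA / WriterB are stand-in classes that mimic the two shapes described above. Their parameter names are assumptions, and the real signatures may carry extra parameters.

```python
import inspect


def method_params(cls, method_name):
    """Return the parameter names of cls.<method_name>, excluding 'self'."""
    sig = inspect.signature(getattr(cls, method_name))
    return [name for name in sig.parameters if name != 'self']


class WriterA:
    """Stand-in for the shape described for the master branch."""
    def __init__(self, stream):
        self.stream = stream

    def write(self):
        pass


class WriterB:
    """Stand-in for the shape described for the pip-installed version."""
    def __init__(self):
        pass

    def write(self, stream):
        pass
```

Calling method_params(PyPDF4.PdfFileWriter, '__init__') and method_params(PyPDF4.PdfFileWriter, 'write') on the installed package should then show which of the two shapes it follows.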