Open cadu-leite opened 3 years ago
Hi Cadu-Leite, your PDF has some free objects (outlines, JS) that are still referenced. I've introduced a fix in my pre-release (https://github.com/pubpub-zz/PyPDF4/releases/tag/1.27.0ppZZ). Please note that this version has been deeply rewritten. I've normally kept backward compatibility, and I've also started to upgrade the documentation. Can you tell me if it works for you?
I believe you changed the namespace from PyPDF4 to pypdf ... that has to be in BIG LETTERS in the docs.
continue...
The error has changed, but it is still on the same PDF file, a Google Sheet exported to PDF.
Traceback from trying to read a PDF exported from Google Sheets:
Traceback (most recent call last):
File "/Users/cadu/projs/merge_pdfs/tests/test_merge2pdf.py", line 66, in test_merge_pdf_output
m.merge_pdfs()
File "/Users/cadu/projs/merge_pdfs/merge2pdf.py", line 89, in merge_pdfs
merged_pdf.append(fileobj = file_name)
File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/merger.py", line 146, in append
self.merge(None, fileobj, bookmark, numpages, import_bookmarks)
File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/merger.py", line 116, in merge
self._copy_bookmarks(fileobj.root_object["/Outlines"], bkmark, srcpages)
File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/generic.py", line 430, in __getitem__
return dict.__getitem__(self, key).getObject()
File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/generic.py", line 214, in getObject
return self.pdf.getObject(self).getObject()
File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/pdfreader.py", line 488, in get_object
retval = self._get_object_by_ref(ref, self.R_XTABLE)
File "/Users/cadu/.virtualenvs/merge_pdfs_pypdf4/lib/python3.8/site-packages/pypdf/pdfreader.py", line 284, in _get_object_by_ref
raise PdfReadError("Cannot fetch a free object (id, next gen.) = (%d, %d)"
pypdf.utils.PdfReadError: Cannot fetch a free object (id, next gen.) = (2, 0)
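For context on the "free object" wording in that error: a PDF cross-reference table marks each object as in use (`n`) or free (`f`), and for a free entry the first field is the number of the next free object rather than a byte offset. Below is a minimal sketch of reading one such entry. The function name `parse_xref_entry` and the two sample lines are made up for illustration; this is not pypdf's own parser, and I have not checked what the Google Sheet file actually contains.

```python
from typing import NamedTuple


class XrefEntry(NamedTuple):
    # Byte offset for an in-use entry, next free object number for a free one.
    first_field: int
    generation: int
    in_use: bool


def parse_xref_entry(line: bytes) -> XrefEntry:
    """Parse one classic xref table entry such as b'0000000017 00000 n'.

    Hypothetical helper for illustration only.
    """
    parts = line.split()
    if len(parts) != 3 or parts[2] not in (b"n", b"f"):
        raise ValueError(f"malformed xref entry: {line!r}")
    return XrefEntry(int(parts[0]), int(parts[1]), parts[2] == b"n")


# Made-up sample entries: one free, one in use.
free_entry = parse_xref_entry(b"0000000000 00000 f \n")
used_entry = parse_xref_entry(b"0000000017 00000 n \n")
```

A reader that finds an indirect reference pointing at an entry like `free_entry` has no object to load, which is presumably the situation the `PdfReadError` above reports.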
.. then, after eliminating the Google Sheet PDF by taking it off the list of PDFs to be merged, I got another error.
It seems you changed the keyword parameters ... that's not nice. It will break a lot of scripts, and there is a Pythonic way to handle this: you could accept both names, or not change them at all.
Traceback (most recent call last):
File "/Users/cadu/projs/merge_pdfs/tests/test_merge2pdf.py", line 66, in test_merge_pdf_output
m.merge_pdfs()
File "/Users/cadu/projs/merge_pdfs/merge2pdf.py", line 87, in merge_pdfs
merged_pdf.append(fileobj = file_name, pages = page_range)
TypeError: append() got an unexpected keyword argument 'pages'
But OK, I changed the parameter name to numpages
and it works.
But the problem with PDFs that come from Google Sheets remains.
The rest seems to be OK.
Please let me know if I can help with anything else.
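The "accept both" idea mentioned above could look roughly like the sketch below. It is a standalone function, not the library's real `append()`: the name `append_compat` and the returned dict are hypothetical, and only the parameter names `fileobj`, `bookmark`, `numpages`, `import_bookmarks` and the old `pages` keyword are taken from the tracebacks in this thread.

```python
import warnings


def append_compat(fileobj, bookmark=None, numpages=None,
                  import_bookmarks=True, **kwargs):
    """Hypothetical wrapper that accepts the old 'pages' keyword as an
    alias for the new 'numpages' keyword."""
    if "pages" in kwargs:
        if numpages is not None:
            raise TypeError("pass either 'pages' or 'numpages', not both")
        warnings.warn("'pages' is deprecated, use 'numpages' instead",
                      DeprecationWarning, stacklevel=2)
        numpages = kwargs.pop("pages")
    if kwargs:
        # Keep the usual error for keywords that are really unknown.
        unknown = next(iter(kwargs))
        raise TypeError(
            f"append_compat() got an unexpected keyword argument {unknown!r}")
    # A real merger would merge here; the sketch just echoes the arguments.
    return {"fileobj": fileobj, "bookmark": bookmark,
            "numpages": numpages, "import_bookmarks": import_bookmarks}
```

With something like this, old calls such as `append_compat(fileobj=file_name, pages=page_range)` keep working and only emit a `DeprecationWarning`, while new code can use `numpages` directly.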
Yes !
Thanks for the test and the report. I had a look:
About PyPDF4 being renamed to pypdf: that was claird's choice (I don't know why).
First, for the issue with the Google Sheet PDF: I forgot to tell you to set strict to False in the merger's init in order to make the merger tolerant of 'erroneous' files:
merged_pdf = PdfFileMerger(strict=False)
Also, I've confirmed the API break you raised and fixed it.
Finally, I found an issue when a NullObject is returned for outlines. That is fixed as well.
I've run your test successfully.
Find attached the update of my library. The changes have been committed, but I would like a few beta testers to try it before tagging it.
pypdf4-1.27.0PPzz_1-py2.py3-none-any.whl.zip
Thanks for your feedback.
The PDF file is attached: pdf_sample_googlesheet_pages_02.pdf
Traceback:
ValueError: invalid literal for int() with base 10: b'F-1.4'
The code, with tests included, can be seen in this repo (merge2pdf).