Closed khasburrahman closed 5 years ago
the results: the background is copied to all of the page, but the all the content seems to merged on each page
What do you mean by "but ~the~ all the content seems to [be] merged on each page"? (Sorry for the edits, it's just for clarity :-).)
From what I can see, the "content PDF" seems to be merged twice into the output, which is slightly noticeable in the text thickness.
@newnone Thanks for the edit, apologies for my lack of clarity 😄
I mean that the generated result looks like this: each page has background.PDF plus pages 1-4 of content.PDF.
What I intended was to put the background behind each content page, so that page 1 would be background.PDF plus page 1 of content.PDF, page 2 would be background.PDF plus page 2 of content.PDF, and so on.
On a more careful inspection, I have noted that all four pages from content.pdf were merged into a single page of result.pdf.
I have edited the code and the problem appears to be solved. Hopefully, this is a problem with the script at hand and not with the library:
from io import BytesIO
from os.path import abspath, dirname, join, pardir
from sys import path
SCRIPT_ROOT = dirname(__file__)
PROJECT_ROOT = abspath(join(SCRIPT_ROOT, pardir))
path.append(PROJECT_ROOT)
from pypdf4.pdf import PdfFileReader, PdfFileWriter
from pypdf4.merger import PdfFileMerger
contentFile = open(join(SCRIPT_ROOT, 'content.pdf'), 'rb')
backgroundFile = open(join(SCRIPT_ROOT, 'background.pdf'), 'rb')
contentReader = PdfFileReader(contentFile)
bgReader = PdfFileReader(backgroundFile)
writer = PdfFileWriter()
bg = bgReader.getPage(0)
for pagenum in range(contentReader.numPages):
    page = contentReader.getPage(pagenum)
    page.mergePage(bg)
    writer.addPage(page)
output = open(join(SCRIPT_ROOT, 'result.pdf'), 'wb')
writer.write(output)
contentFile.close()
backgroundFile.close()
output.close()
This is result.pdf that is generated by the updated script.
The explanation for why this was happening is fairly simple. In the previous version of the code, in
for x in range(contentReader.getNumPages()):
    bg = bgReader.getPage(0)
getPage(0) plausibly returned a reference to the same object on every call, even though bg = bgReader.getPage(0) was invoked on each iteration; the merge effects therefore accumulated on that single page at each step. With the updated for-loop body, it is the page from content.pdf that performs the merge (taking the only page from background.pdf as its argument), and each content page is a distinct object.
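The accumulation described above can be reproduced without PyPDF. The following is only a sketch: ToyPage, ToyReader, and both merge functions are hypothetical stand-ins invented for illustration, and the only assumption carried over from the thread is that the reader returns the same page object on every call and that a merge mutates the receiving page in place.

```python
class ToyPage:
    """Hypothetical stand-in for a PDF page: a list of content labels."""

    def __init__(self, *items):
        self.items = list(items)

    def merge(self, other):
        # Assumed to behave like mergePage: mutates self in place.
        self.items.extend(other.items)


class ToyReader:
    """Hypothetical reader that returns the same page object on every call."""

    def __init__(self, pages):
        self._pages = pages

    def get_page(self, index):
        return self._pages[index]  # no copy, so callers share this object


def make_readers():
    """Build a one-page background reader and a three-page content reader."""
    bg_reader = ToyReader([ToyPage("bg")])
    content_reader = ToyReader([ToyPage("c1"), ToyPage("c2"), ToyPage("c3")])
    return bg_reader, content_reader


def merge_into_background(bg_reader, content_reader, count):
    """Original pattern: every iteration mutates the one shared background."""
    out = []
    for n in range(count):
        bg = bg_reader.get_page(0)  # same object each time
        bg.merge(content_reader.get_page(n))
        out.append(bg)
    return [list(page.items) for page in out]


def merge_into_content(bg_reader, content_reader, count):
    """Updated pattern: each distinct content page receives the background."""
    out = []
    bg = bg_reader.get_page(0)
    for n in range(count):
        page = content_reader.get_page(n)
        page.merge(bg)
        out.append(page)
    return [list(page.items) for page in out]


print(merge_into_background(*make_readers(), 3))
print(merge_into_content(*make_readers(), 3))
```

In the first call every output entry is the same object, so each "page" ends up holding the background plus all three content labels; in the second call each entry holds exactly one content label plus the background.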
getPage(0) plausibly returned a reference to the same object
Yes, if I do:
bg = bgReader.getPage(0)
for pagenum in range(contentReader.numPages):
    # The is operator checks for identity and differs from ==
    print(bgReader.getPage(0) is bg)
    page = contentReader.getPage(pagenum)
    ...
the console prints:
$ python3 ./merge.py
True
True
True
True
That says it all.
@newnone Thanks for the example 👍
That will work fine if background.PDF doesn't have any image overlapping the content. I tried with a different file, background2.pdf, that has a block of image; in the result, the background blocks the content. That's why I merged like this:
for x in range(contentReader.getNumPages()):
    bg = bgReader.getPage(0)
but it didn't work like the earlier version. Do you have any suggestions?
You just need a copy of bg such that, on each iteration, bg == bgReader.getPage(0) and bg is not bgReader.getPage(0). A common solution in other programming languages is to have a copy constructor.
PyPDF has nothing of that sort, so we need to resort to other means. I do not consider this solution safe, but since PageObject inherits from dict we can leverage dict.update(). I cannot ensure this will work in future versions of PyPDF, but it works now:
from io import BytesIO
from os.path import abspath, dirname, join, pardir
from sys import path
SCRIPT_ROOT = dirname(__file__)
PROJECT_ROOT = abspath(join(SCRIPT_ROOT, pardir))
path.append(PROJECT_ROOT)
from pypdf4.pdf import PdfFileReader, PdfFileWriter, PageObject
from pypdf4.merger import PdfFileMerger
contentFile = open(join(SCRIPT_ROOT, 'content.pdf'), 'rb')
backgroundFile = open(join(SCRIPT_ROOT, 'background2.pdf'), 'rb')
contentReader = PdfFileReader(contentFile)
bgReader = PdfFileReader(backgroundFile)
writer = PdfFileWriter()
bgTemplate = bgReader.getPage(0)
for pagenum in range(contentReader.numPages):
    bgCopy = PageObject(bgReader, bgTemplate.indirectRef)
    bgCopy.update(bgTemplate)
    # Replace with a call to assert or to "raise *Exception" if you go to production
    print(bgCopy == bgTemplate and bgCopy is not bgTemplate)
    page = contentReader.getPage(pagenum)
    bgCopy.mergePage(page)
    writer.addPage(bgCopy)
output = open(join(SCRIPT_ROOT, 'result.pdf'), 'wb')
writer.write(output)
contentFile.close()
backgroundFile.close()
output.close()
$ python3 ./Issue18/merge.py
True
True
True
True
with result.pdf as output.
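For readers wondering why bgCopy == bgTemplate and bgCopy is not bgTemplate can both hold, here is a small sketch of the same copy-by-update idea on a plain dict subclass. ToyPage and shallow_copy_page are hypothetical names made up for this illustration and are not part of PyPDF; the sketch also shows that dict.update() only makes a shallow copy, which is one reason to treat the workaround with caution. Whether that matters for PageObject depends on how mergePage handles nested values, which this sketch does not model.

```python
class ToyPage(dict):
    """Hypothetical dict-based page that needs a constructor argument."""

    def __init__(self, source=None):
        super().__init__()
        self.source = source


def shallow_copy_page(template):
    """Create a new object, then copy the template's entries into it."""
    clone = ToyPage(template.source)
    clone.update(template)  # copies keys and value references, not the values
    return clone


template = ToyPage("background2.pdf")
template["/Rotate"] = 0
template["/Contents"] = ["draw image"]

clone = shallow_copy_page(template)
equal_but_distinct = clone == template and clone is not template
print(equal_but_distinct)

# Rebinding a key on the clone leaves the template untouched...
clone["/Rotate"] = 90
print(template["/Rotate"])

# ...but nested values are still shared between the two objects.
print(clone["/Contents"] is template["/Contents"])
```

If nested values must be independent as well, copy.deepcopy from the standard library is the usual tool for plain Python objects; whether it is suitable for PageObject is not something this thread establishes.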
😄 Thank you very much! I'll close this issue as this is solved!
Well done. Feel free to follow me if you would like to stay up to date with PyPDF (or just want to return the small favor).
I recently tried to merge a background into a PDF. There are 2 PDF files, the background and the content: background.pdf content.pdf
I use this code to merge both files
The results: the background is copied to all of the pages, but all the content seems to be merged on each page. Is there something wrong in my code? result.pdf