Open rien333 opened 5 years ago
I second this!
It would be great to be able to preserve tables of contents and links within the PDF (it's been on my personal wishlist for a long time). Unfortunately, I don't think this is possible using the PDF library that krop currently uses. If anyone has an idea how to possibly implement this, I would love to learn about it!
Unfortunately, I don't think this is possible using the PDF library that krop currently uses
That would be poppler, right? Also, I still really enjoy using krop. Simple, but well executed.
Thank you for the kind words! I'm glad you find krop useful despite this shortcoming. Poppler is used for displaying the PDF, but the cropping is done using PyPDF2.
After looking a bit through PyPDF2, I was able to preserve links within a PDF with this change:
diff --git a/krop/mainwindow.py b/krop/mainwindow.py
index fd1ae32..e8adadf 100644
--- a/krop/mainwindow.py
+++ b/krop/mainwindow.py
@@ -413,6 +413,7 @@ class MainWindow(QKMainWindow):
         pdf = PdfFile()
         pdf.loadFromFile(inputFileName)
         cropper = PdfCropper()
+        cropper.copyDocumentRoot(pdf)
         for nr in pages:
             c = self.viewer.cropValues(nr)
             cropper.addPageCropped(pdf, nr, c, alwaysinclude, rotation)
diff --git a/krop/pdfcropper.py b/krop/pdfcropper.py
index 679c6fc..21a0df1 100644
--- a/krop/pdfcropper.py
+++ b/krop/pdfcropper.py
@@ -56,6 +56,9 @@ class AbstractPdfCropper:
     def addPageCropped(self, pdffile, pagenumber, croplist, rotate=0):
         pass
+    def copyDocumentRoot(self, pdffile):
+        pass
+
 class PyPdfFile(AbstractPdfFile):
     """Implementation of PdfFile using pyPdf"""
@@ -110,6 +113,15 @@ class PyPdfCropper(AbstractPdfCropper):
         if rotate != 0:
             page.rotateClockwise(rotate)
+    def copyDocumentRoot(self, pdffile):
+        # Sounds promising in PyPDF2 (see PdfFileWriter.cloneDocumentFromReader),
+        # but doesn't seem to produce a readable PDF:
+        # self.output.cloneReaderDocumentRoot(pdffile.reader)
+        # Instead, this copies at least the named destinations for links:
+        for dest in pdffile.reader.namedDestinations.values():
+            self.output.addNamedDestinationObject(dest)
+
+
 def optimizePdfGhostscript(oldfilename, newfilename):
     import subprocess
     subprocess.check_call(('gs', '-sDEVICE=pdfwrite', '-sOutputFile=' + newfilename,
It seems PyPDF2 has a special method named cloneReaderDocumentRoot that copies all such metadata at once, but that gave me a document with a lot of empty pages and only the links. So copying just the named destinations for links was the best I could come up with for now.
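For anyone who wants to try the same idea outside of krop, here is a minimal sketch of the named-destination copy as a standalone function. It assumes the legacy PyPDF2 1.x names used in the diff above (`namedDestinations` on the reader, `addNamedDestinationObject` on the writer). The function name `copy_named_destinations` is made up for illustration and is not part of krop or PyPDF2.

```python
def copy_named_destinations(reader, writer):
    """Copy every named destination from reader to writer and return the count.

    Assumption: both objects follow the PyPDF2 1.x interface, i.e.
    `reader.namedDestinations` is a dict-like mapping of name -> destination
    and `writer.addNamedDestinationObject(dest)` registers one destination.
    """
    copied = 0
    for dest in reader.namedDestinations.values():
        writer.addNamedDestinationObject(dest)
        copied += 1
    return copied
```

With PyPDF2 1.x this would presumably be called with a `PdfFileReader` and a `PdfFileWriter` before the output is written. As in the diff, it only carries over named destinations, not the whole document root.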
If you would like to experiment further, I suggest a Python debugger or using ptpython interactively in a script like this:
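#!/usr/bin/env python3
from PyPDF2 import PdfFileReader
from ptpython.repl import embed

if __name__ == "__main__":
    with open("test.pdf", "rb") as infile:
        pdf = PdfFileReader(infile)
        # Start the REPL while the file is still open, so that the reader
        # can lazily load objects from it.
        embed(globals(), locals())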
This prepares a reader and then drops you into an interactive REPL with useful autocompletion.
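Once inside the REPL, one thing worth checking is whether the input file has a table of contents at all. The helper below is a rough sketch for that. It assumes the nested-list shape that PyPDF2 1.x returns from `PdfFileReader.getOutlines()`, where child entries sit in a sub-list after their parent and each entry has a `title` attribute. The name `flatten_outline` is hypothetical.

```python
def flatten_outline(outline, depth=0):
    """Yield (depth, title) pairs from a nested outline list.

    Assumption: `outline` looks like the result of PyPDF2 1.x
    `PdfFileReader.getOutlines()`: a list whose items are either entries
    with a `title` attribute or nested lists of child entries.
    """
    for item in outline:
        if isinstance(item, list):
            # A nested list holds the children of the preceding entry.
            yield from flatten_outline(item, depth + 1)
        else:
            yield depth, item.title
```

In the REPL, something like `for depth, title in flatten_outline(pdf.getOutlines()): print("  " * depth + title)` should print an indented chapter list if the PDF has one.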
@arminstraub the previous comment seems to solve this issue. Am I wrong?
@chrthi Thank you so much for offering this solution to preserving links! It didn't work on all the PDFs that I tested it with, but it was definitely better than nothing. It should be part of the next release of krop.
Once I have some time (hard these days...), I am planning to add support for pikepdf to krop which hopefully will make these sorts of things easier to work with.
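To illustrate why pikepdf might make this easier, here is a rough sketch (not krop's code and not a planned implementation) that crops by editing each page's crop box in place and saving the same document, so the outline and link destinations are never copied between documents in the first place. It assumes pikepdf's `open`, `Pdf.pages`, `Page.mediabox`, `Page.cropbox` and `Pdf.save`. The helper names `shrink_box` and `crop_all_pages` are invented for this example.

```python
def shrink_box(box, left=0.0, bottom=0.0, right=0.0, top=0.0):
    """Return [x0, y0, x1, y1] with the given fraction trimmed off each side."""
    x0, y0, x1, y1 = (float(v) for v in box)
    width, height = x1 - x0, y1 - y0
    return [x0 + left * width, y0 + bottom * height,
            x1 - right * width, y1 - top * height]


def crop_all_pages(src, dst, **margins):
    """Apply the same relative crop to every page of src and save it to dst.

    Assumption: because the document is modified in place instead of being
    rebuilt page by page, bookmarks and links should survive the save.
    """
    import pikepdf  # third-party; imported lazily so shrink_box works without it

    with pikepdf.open(src) as pdf:
        for page in pdf.pages:
            page.cropbox = pikepdf.Array(shrink_box(page.mediabox, **margins))
        pdf.save(dst)
```

For example, `crop_all_pages("in.pdf", "out.pdf", left=0.1, right=0.1)` would trim 10% off both sides of every page. Splitting one page into several crop regions, as krop allows, would need more work than this.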
After looking a bit through PyPDF2, I was able to preserve links within a PDF with this change:
diff --git a/krop/mainwindow.py b/krop/mainwindow.py ...
I just tried this, but unfortunately it doesn't work for me: the links are not preserved.
What a great little app!
However, after processing a PDF with krop, the table of contents "metadata" seems to be deleted. Is there any way to retain it? (This matters for e-readers too: navigating to a specific part is especially cumbersome on slower and smaller devices if you can't just select a chapter.)