Closed sant527 closed 4 years ago
I found that with -gs some PDFs are not cropped at all, while others are cropped perfectly. Because pdftoppm uses a lot of space in tmp for long documents, the -gs option is really useful.
I am not able to crop the file ADI_41_43.pdf using -gs. The resulting file is ADI_41_43_cropped.pdf
$ pdf-crop-margins -v -gs -p4 100 0 100 100 ADI_41_43.pdf
Processing the PDF with pdfCropMargins (version 0.2.6)...
System type: Linux
The input document's filename is:
ADI_41_43.pdf
Using the default-generated output filename.
The output document's filename will be:
ADI_41_43_cropped.pdf
The absolute pre-crops to be applied to each margin, in units of bp, are:
[0.0, 0.0, 0.0, 0.0]
The percentages of margins to retain are:
[100.0, 0.0, 100.0, 100.0]
The absolute offsets to be applied to each margin, in units of bp, are:
[0.0, 0.0, 0.0, 0.0]
The uniform order statistics to apply to each margin, in units of bp, are:
[]
For the full page size, using values from the PDF box
specified by the intersection of these boxes: ['c']
Found Ghostscript program at: gs
The input document has 3 pages.
The document's metadata, if set:
The Author attribute set in the input document is:
None
The Creator attribute set in the input document is:
None
The Producer attribute set in the input document is:
PyPDF2
The Subject attribute set in the input document is:
None
The Title attribute set in the input document is:
None
All the pages of the document will be cropped.
Original full page sizes, in PDF format (lbrt):
1 rot = 0 RectangleObject([0, 0, 432, 1080])
2 rot = 0 RectangleObject([0, 0, 432, 1080])
3 rot = 0 RectangleObject([0, 0, 432, 1080])
Copied these items from the document catalog:
/Type
Skipped copy of these items from the document catalog:
/Pages
The document was not previously cropped by pdfCropMargins.
Writing out the PDF with the CropBox and MediaBox redefined.
Using Ghostscript to calculate the bounding boxes.
The bounding boxes are:
1 [0.0, 0.96, 432.000016, 1072.800041]
2 [0.0, 0.96, 432.000016, 1072.800041]
3 [0.0, 0.96, 432.000016, 1072.800041]
New full page sizes after cropping, in PDF format (lbrt):
1 RectangleObject([0, 0.96, 432, 1080])
2 RectangleObject([0, 0.96, 432, 1080])
3 RectangleObject([0, 0.96, 432, 1080])
Writing the cropped PDF file.
Finished this run of pdfCropMargins.
In contrast, I have another PDF file, MAD2_28_31.pdf, which is cropped as expected by the same command:
$ pdf-crop-margins -v -gs -p4 100 0 100 100 MAD2_28_31.pdf
Processing the PDF with pdfCropMargins (version 0.2.6)...
System type: Linux
The input document's filename is:
MAD2_28_31.pdf
Using the default-generated output filename.
The output document's filename will be:
MAD2_28_31_cropped.pdf
The absolute pre-crops to be applied to each margin, in units of bp, are:
[0.0, 0.0, 0.0, 0.0]
The percentages of margins to retain are:
[100.0, 0.0, 100.0, 100.0]
The absolute offsets to be applied to each margin, in units of bp, are:
[0.0, 0.0, 0.0, 0.0]
The uniform order statistics to apply to each margin, in units of bp, are:
[]
For the full page size, using values from the PDF box
specified by the intersection of these boxes: ['c']
Found Ghostscript program at: gs
The input document has 4 pages.
The document's metadata, if set:
The Author attribute set in the input document is:
None
The Creator attribute set in the input document is:
None
The Producer attribute set in the input document is:
PyPDF2
The Subject attribute set in the input document is:
None
The Title attribute set in the input document is:
None
All the pages of the document will be cropped.
Original full page sizes, in PDF format (lbrt):
1 rot = 0 RectangleObject([0, 0, 432, 1584])
2 rot = 0 RectangleObject([0, 0, 432, 1584])
3 rot = 0 RectangleObject([0, 0, 432, 1584])
4 rot = 0 RectangleObject([0, 0, 432, 1584])
Copied these items from the document catalog:
/Type
Skipped copy of these items from the document catalog:
/Pages
The document was not previously cropped by pdfCropMargins.
Writing out the PDF with the CropBox and MediaBox redefined.
Using Ghostscript to calculate the bounding boxes.
The bounding boxes are:
1 [0.0, 1393.920053, 432.000016, 1576.80006]
2 [0.0, 1375.200052, 432.000016, 1576.80006]
3 [0.0, 1375.200052, 432.000016, 1576.80006]
4 [0.0, 1375.200052, 432.000016, 1576.80006]
New full page sizes after cropping, in PDF format (lbrt):
1 RectangleObject([0, 1393.92005, 432, 1584])
2 RectangleObject([0, 1375.20005, 432, 1584])
3 RectangleObject([0, 1375.20005, 432, 1584])
4 RectangleObject([0, 1375.20005, 432, 1584])
Writing the cropped PDF file.
Finished this run of pdfCropMargins.
The original file MAD2_28_31.pdf
The cropped file MAD2_28_31_cropped.pdf
Why is it cropping one and not the other? Both files were made the same way, by converting a Word document to PDF.
The default is to use pdftoppm to render the pages to .ppm files and then compute the crops from those images. The -gsr option works just the same way, except that it uses Ghostscript to render the document to .ppm files rather than using pdftoppm. The --gsBbox option is equivalent to the -gs option and does not directly render to .ppm files at all. It calls Ghostscript to compute the bounding boxes directly and return the results (and does not work on scanned documents).
I'm not sure why some files from the same source would work with -gs and some would not. I'll look into it.
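For reference, the direct bounding-box computation described above can be approximated outside of pdfCropMargins. The sketch below is only an illustration, not the code pdfCropMargins itself runs: it assumes a `gs` executable on the PATH and uses Ghostscript's `bbox` device, which reports a `%%HiResBoundingBox:` line per page. The function names `parse_bboxes` and `ghostscript_bboxes` are made up for this example.

```python
import re
import subprocess

# Ghostscript's bbox device reports one such line per page, with the
# values in bp ordered as left, bottom, right, top.
HIRES_BBOX = re.compile(
    r"%%HiResBoundingBox:\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"
)


def parse_bboxes(gs_output):
    """Return one [left, bottom, right, top] list per page found in the text."""
    return [[float(v) for v in match.groups()]
            for match in HIRES_BBOX.finditer(gs_output)]


def ghostscript_bboxes(pdf_path, gs_executable="gs"):
    """Run Ghostscript's bbox device on a PDF and parse its report.

    The bbox report is normally written to stderr, so stderr is merged
    into stdout before parsing. `gs_executable` is an assumption; adjust
    it if Ghostscript has a different name on your system.
    """
    result = subprocess.run(
        [gs_executable, "-dSAFER", "-dNOPAUSE", "-dBATCH",
         "-sDEVICE=bbox", pdf_path],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        check=True,
    )
    return parse_bboxes(result.stdout)
```

Calling `ghostscript_bboxes("ADI_41_43.pdf")` on a machine with Ghostscript installed should make it possible to compare the raw Ghostscript report against the "The bounding boxes are:" section of the verbose output.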
Thank you. Since mine is not a scanned document, I prefer to use -gs, as it requires less space in tmp and less time. Kindly have a look at the files.
It's difficult to determine exactly what's happening, since with -gs Ghostscript is essentially being used as a black box to compute the bounding boxes. I don't know the internals of its algorithm. Ghostscript is apparently detecting some kind of PDF object near the bottoms of pages in the documents that aren't cropping correctly with -gs. This object isn't affecting the rendered image versions, though. I noticed that when I do a pre-crop of 6bp on the bottom of the document it crops as expected:
$ pdf-crop-margins -v -gs -p4 100 0 100 100 -ap4 0 6 0 0 ADI_41_43.pdf
You're also using a fairly old version of pdfCropMargins, but that doesn't seem to be causing this issue.
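The numbers in the two verbose logs above are consistent with simple per-edge arithmetic, which makes it easier to see why ADI_41_43.pdf looks uncropped. The sketch below is an assumption inferred from those logs, not pdfCropMargins' actual code: it ignores pre-crops and offsets (both zero in the logged runs), takes all boxes as [left, bottom, right, top] in bp, and the name `crop_box` is hypothetical.

```python
def crop_box(page_box, bbox, percent_retain):
    """Move each page edge toward the bounding box, keeping a percentage
    of the original margin.

    With 100 the page edge is kept as-is; with 0 the edge is moved all
    the way to the bounding box.
    """
    return [
        bbox_edge + (page_edge - bbox_edge) * pct / 100.0
        for page_edge, bbox_edge, pct in zip(page_box, bbox, percent_retain)
    ]


# Values copied from page 1 of each log (-p4 100 0 100 100).
percent = [100.0, 0.0, 100.0, 100.0]
adi = crop_box([0, 0, 432, 1080],
               [0.0, 0.96, 432.000016, 1072.800041], percent)
mad = crop_box([0, 0, 432, 1584],
               [0.0, 1393.920053, 432.000016, 1576.80006], percent)

print("ADI bottom edge moved up by", adi[1], "bp")
print("MAD2 bottom edge moved up by", mad[1], "bp")
```

Under these assumptions the bottom edge of ADI_41_43.pdf moves up by only 0.96 bp, because Ghostscript reports content almost at the very bottom of the page, whereas the bottom edge of MAD2_28_31.pdf moves up by about 1393.92 bp. Both runs are doing the same thing; the difference lies entirely in the bounding box Ghostscript returns.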
The -ap4 0 6 0 0 option worked (pre-cropping a little first). I hope it will not cut anything off if there is text within the bottom 6 bp, though.
I would like to know: will -gs (using Ghostscript for the bounding boxes) crop the same as the default without this option, if my document is not scanned but is Word text (no images) converted to PDF?