Open Srilakshman213 opened 1 year ago
In addition to the non-visual elements mentioned earlier, there are some other techniques you can use with Apache PDFBox to reduce the size of your PDF file without changing it visually:
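Re-save the document to drop unreferenced objects. PDDocument has no removeUnusedObjects() method in PDFBox 2.x; as far as I know, a full (non-incremental) save writes only the objects still reachable from the document trailer, so a plain load and save is the closest equivalent:
PDDocument document = PDDocument.load(new File("input.pdf"));
// A full save rewrites the file starting from the trailer, so orphaned objects are skipped.
document.save(new File("output.pdf"));
document.close();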
Re-encode embedded images with the JPEGFactory class. PDFBox has no LosslessImageOptimizer class; its image factories are LosslessFactory and JPEGFactory. JPEG is lossy, so pick a quality high enough that the result looks the same, and skip this step for line art or images that need exact pixels:
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.JPEGFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

PDDocument document = PDDocument.load(new File("input.pdf"));
for (PDPage page : document.getPages()) {
    PDResources resources = page.getResources();
    if (resources == null) {
        continue;
    }
    // Copy the names first so the resource dictionary is not modified while iterating.
    List<COSName> names = new ArrayList<>();
    resources.getXObjectNames().forEach(names::add);
    for (COSName name : names) {
        PDXObject xObject = resources.getXObject(name);
        if (xObject instanceof PDImageXObject) {
            BufferedImage bufferedImage = ((PDImageXObject) xObject).getImage();
            // 0.75f is an illustrative quality value, not a PDFBox recommendation.
            PDImageXObject jpeg = JPEGFactory.createFromImage(document, bufferedImage, 0.75f);
            // Replace the original image under the same resource name.
            resources.put(name, jpeg);
        }
    }
}
document.save(new File("output.pdf"));
document.close();
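JPEG quality is the main lever when images are re-encoded. The sketch below uses only the JDK's javax.imageio (no PDFBox) to show how the quality setting changes the encoded size; the class name JpegQualityDemo, the helper names, and the synthetic noisy image are hypothetical stand-ins, not part of any library.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.MemoryCacheImageOutputStream;

final class JpegQualityDemo {

    /** Encodes an RGB image as JPEG at the given quality (0.0 = smallest, 1.0 = best). */
    static byte[] encodeJpeg(BufferedImage image, float quality) {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("jpeg").next();
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (MemoryCacheImageOutputStream out = new MemoryCacheImageOutputStream(buffer)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(image, null, null), param);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            writer.dispose();
        }
        return buffer.toByteArray();
    }

    /** Builds a deterministic noisy RGB image (a stand-in for a scanned page). */
    static BufferedImage sampleImage(int width, int height) {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Random random = new Random(42);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                image.setRGB(x, y, random.nextInt(0x1000000));
            }
        }
        return image;
    }

    public static void main(String[] args) {
        BufferedImage image = sampleImage(256, 256);
        System.out.println("quality 0.9: " + encodeJpeg(image, 0.9f).length + " bytes");
        System.out.println("quality 0.3: " + encodeJpeg(image, 0.3f).length + " bytes");
    }
}
```

Running it on a few representative images from your own PDF is a cheap way to pick a quality value before re-encoding the whole document.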
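Remove embedded files (attachments). PDDocument has no removeEmbeddedFiles() or removeNamedDestination() method; attachments are listed in the EmbeddedFiles name tree of the catalog's name dictionary, which can be dropped directly:
PDDocument document = PDDocument.load(new File("input.pdf"));
PDDocumentNameDictionary names = document.getDocumentCatalog().getNames();
if (names != null) {
    // Remove the /EmbeddedFiles name tree so the attachments are no longer referenced.
    names.getCOSObject().removeItem(COSName.EMBEDDED_FILES);
}
document.save(new File("output.pdf"));
document.close();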
These techniques can help you reduce the size of your PDF file without changing it visually. You can use them individually or in combination depending on your requirements.
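Whether a given technique pays off depends on the file, so it helps to measure. The following is a minimal sketch using only the JDK; the class name SizeReport and the default file names input.pdf and output.pdf are assumptions carried over from the snippets above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class SizeReport {

    /** Returns the percentage of bytes saved; negative if the optimized file grew. */
    static double percentSaved(long originalBytes, long optimizedBytes) {
        if (originalBytes <= 0) {
            throw new IllegalArgumentException("originalBytes must be positive");
        }
        return 100.0 * (originalBytes - optimizedBytes) / originalBytes;
    }

    public static void main(String[] args) throws IOException {
        Path input = Paths.get(args.length > 0 ? args[0] : "input.pdf");
        Path output = Paths.get(args.length > 1 ? args[1] : "output.pdf");
        long before = Files.size(input);
        long after = Files.size(output);
        System.out.printf("%d -> %d bytes (%.1f%% saved)%n",
                before, after, percentSaved(before, after));
    }
}
```

Run it after each step when combining techniques, since some of them can make a file larger (for example, re-encoding an already well-compressed image).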
Here are some additional techniques you can use with Apache PDFBox to reduce the size of your PDF file. Unlike the ones above, several of these remove or alter content that a reader can see or interact with:
Remove annotations. PDFBox has no removeRedundantStreams(), PDResources.removeFont(), or PDPage.removeAnnotations() method; annotations can be removed by deleting each page's /Annots entry. This also removes links, comments, and form widgets, so it is not a purely non-visual change:
PDDocument document = PDDocument.load(new File("input.pdf"));
for (PDPage page : document.getPages()) {
    // Drop the page's annotation array.
    page.getCOSObject().removeItem(COSName.ANNOTS);
}
document.save(new File("output.pdf"));
document.close();
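Remove font resources. PDDocument has no removeFonts() method. The font dictionary can be deleted from each page's resources at the COS level, but any text that references those fonts will no longer render correctly, so this only makes sense for pages that draw no text:
PDDocument document = PDDocument.load(new File("input.pdf"));
for (PDPage page : document.getPages()) {
    PDResources resources = page.getResources();
    if (resources != null) {
        // Deletes the /Font entry; text operators that use these fonts will break.
        resources.getCOSObject().removeItem(COSName.FONT);
    }
}
document.save(new File("output.pdf"));
document.close();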
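Remove the document outline (bookmarks) with the setDocumentOutline(null) method:
PDDocument document = PDDocument.load(new File("input.pdf"));
// Passing null removes the outline entry from the catalog.
document.getDocumentCatalog().setDocumentOutline(null);
document.save(new File("output.pdf"));
document.close();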
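Compress content streams. PDFBox has no Compress class; page content streams that are stored without a filter can be rewritten with the Flate filter instead:
PDDocument document = PDDocument.load(new File("input.pdf"));
for (PDPage page : document.getPages()) {
    Iterator<PDStream> streams = page.getContentStreams();
    while (streams.hasNext()) {
        PDStream stream = streams.next();
        List<COSName> filters = stream.getFilters();
        if (filters == null || filters.isEmpty()) {
            // Read the raw bytes first, then rewrite the same stream with Flate compression.
            byte[] data = stream.toByteArray();
            try (OutputStream out = stream.createOutputStream(COSName.FLATE_DECODE)) {
                out.write(data);
            }
        }
    }
}
document.save(new File("output.pdf"));
document.close();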
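Stream compression in PDF usually means the Flate filter, which is the same zlib/deflate format the JDK implements in java.util.zip. As a rough illustration of why it helps, this sketch (JDK only; the class name FlateDemo and the synthetic content stream are hypothetical) compresses repetitive page-content operators and checks that they round-trip.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class FlateDemo {

    /** Compresses data in zlib format at the highest compression level. */
    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses zlib data; throws IllegalArgumentException on corrupt or truncated input. */
    static byte[] inflate(byte[] data) {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt input", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    /** Builds text resembling an uncompressed page content stream (made up for this demo). */
    static byte[] sampleContentStream() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("BT /F1 12 Tf 72 ").append(700 - i).append(" Td (Line ")
              .append(i).append(") Tj ET\n");
        }
        return sb.toString().getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] raw = sampleContentStream();
        byte[] packed = deflate(raw);
        System.out.println(raw.length + " bytes raw, " + packed.length + " bytes deflated");
    }
}
```

Content streams are mostly repeated operators and numbers, which is why they tend to compress well; streams that already carry a filter (such as JPEG image data) gain little from a second pass.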
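Flatten interactive forms with the flatten() method. getAcroForm() returns null when the document has no form, so check before calling it; note that flattening makes the fields non-editable:
PDDocument document = PDDocument.load(new File("input.pdf"));
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
if (acroForm != null) {
    // Merges field appearances into the page content and removes the form fields.
    acroForm.flatten();
}
document.save(new File("output.pdf"));
document.close();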
These techniques can help you further reduce the size of your PDF file. You can use them in combination with the techniques mentioned earlier to achieve the desired file size reduction.
PDFBox is a popular Java library for working with PDF files. You can use PDFBox to remove non-visual elements from a PDF file. Here are some examples of how to remove some of the non-visual elements using PDFBox:
These examples demonstrate how to remove some of the non-visual elements using PDFBox. You can modify them as per your requirements and remove other non-visual elements as well.