Closed necros2k7 closed 1 year ago
You can skip decompress, compress, dedup and clean, I think, since squeeze does all of that.
Are you able to provide an example file I can look at? I'm not familiar with pdfsizeopt, but I can look into it.
Actually no, I tested this chain and it squeezed out a few more bytes. Will upload a sample later. https://github.com/pts/pdfsizeopt
sample: tst.pdf.gz
info: eliminated 2 unused objs in 2 classes
info: compressed 3 streams, kept 3 of them uncompressed
info: saving PDF with 13 objs to: tst2.pdf
info: generated object stream of 789 bytes in 9 objects (25%)
info: generated 115232 bytes (67%)
On this file, since it only uses Standard 14 fonts (i.e. Times New Roman), you can use -remove-fonts to get it down to about 70k. The ISO standardisation people will get grumpy, but the reality is that every PDF viewer will always have the 14 standard fonts built in.
The rest of the file is then just the image. Our squeezer, being non-lossy, won't touch that.
How can I tell whether a PDF uses only Standard 14 fonts? As for images, they can be reduced further losslessly, e.g. by stripping their metadata and recompressing PNGs and JPGs. An option to strip embedded files during squeezing would also be very useful. The same goes for stripping Standard fonts. Also, where in the command chain above is the best place to put -remove-fonts? And is -remove-fonts on its own, without -squeeze, lossless for other objects (pictures)?
You can use cpdf -list-fonts. You would have to build a list of font names which correspond to the 14 standard fonts, and remember to strip subsetting prefixes.
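The check described above could be sketched roughly as follows. This is only an illustration, not part of cpdf: the helper names are hypothetical, and it assumes that each line of `cpdf -list-fonts` output is whitespace-separated with the PDF base font name in the fourth column (adjust `name_column` if your output differs). Subset prefixes are assumed to have the usual form of six uppercase letters followed by `+`.

```python
import re

# Base font names of the 14 standard PDF fonts.
STANDARD_14 = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}

# Subset prefix: six uppercase letters and a plus sign, e.g. "ABCDEF+Helvetica".
_SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def normalize_font_name(name: str) -> str:
    """Drop the leading '/' of a PDF name and any subset prefix."""
    return _SUBSET_PREFIX.sub("", name.strip().lstrip("/"))


def is_standard_14(name: str) -> bool:
    """True if the (possibly prefixed) font name is one of the Standard 14."""
    return normalize_font_name(name) in STANDARD_14


def uses_only_standard_14(list_fonts_output: str, name_column: int = 3) -> bool:
    """Check every font listed in the (assumed) -list-fonts output format."""
    names = []
    for line in list_fonts_output.splitlines():
        fields = line.split()
        if len(fields) > name_column:
            names.append(fields[name_column])
    # An empty listing is treated as "not confirmed" rather than True.
    return bool(names) and all(is_standard_14(n) for n in names)
```

The set only holds the canonical names. Aliases such as TimesNewRoman or Arial variants, which many viewers map onto the standard fonts, would have to be added to the list by hand if you want to treat them the same way.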
-remove-fonts just removes the actual font file from the PDF, leaving the PDF font metadata. It should be used before -squeeze, at any point in the order.
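As a rough sketch of that ordering, the two steps could be driven from a small script like the one below. It assumes cpdf is on the PATH and is invoked in the form `cpdf <operation> in.pdf -o out.pdf`; the function names and the intermediate file name are hypothetical.

```python
import subprocess


def build_commands(src: str, dst: str, tmp: str = "nofonts.pdf") -> list[list[str]]:
    """Return the cpdf invocations in order: remove fonts first, then squeeze."""
    return [
        ["cpdf", "-remove-fonts", src, "-o", tmp],  # strip embedded font files
        ["cpdf", "-squeeze", tmp, "-o", dst],       # then squeeze the result
    ]


def run(commands: list[list[str]]) -> None:
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run(build_commands("tst.pdf", "tst-small.pdf"))` would write the intermediate file to nofonts.pdf and the final result to tst-small.pdf.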
At some point in the future, cpdf will gain the ability to process images through external processes, but it doesn't have it yet. I made a feature request here: https://github.com/johnwhitington/cpdf-source/issues/244
Can we extract the graphics from a PDF losslessly, optimize them, and then reinsert them into the source PDF without losing any text data?
No, that sort of round-tripping is what I suggest in https://github.com/johnwhitington/cpdf-source/issues/244
Are these the right flags to get the smallest files losslessly?
decompress>compress>removeID>removeMeta>dedup>clean>squeeze?
Even after that, my output file was reduced further in size by the dead pdfsizeopt utility (using GS?): "info: eliminated 2 unused objs in 2 classes info: compressed 127 streams, kept 127 of them uncompressed"