Error on lossless compression

ken-huston commented 4 years ago

Hi,

With lossy compression I get fantastic results (a more than tenfold reduction in size for a PDF made from .jpg images). Although the reduction in quality is bearable, with so much reduction in size I wanted to try lossless compression and compare the results.

From other issues I read that this is done without the -s option, but if I do that, I get this error:

Processing "pages-000.jpg"...
source image: 708 x 1121 (8 bits) 0dpi x 0dpi, refcount = 1
thresholded image: 708 x 1121 (1 bits) 0dpi x 0dpi, refcount = 1
[binary data written to the terminal]
/usr/local/bin/pdf.py: symbol table output.sym not found!
Usage: /usr/local/bin/pdf.py [file_basename] > out.pdf

I do not know how to deal with it. Any help will be very much appreciated.

DingoDog commented 4 years ago

A solution was already found in 2016 by klivens: https://github.com/agl/jbig2enc/issues/24#issuecomment-204697193

I use a sort of one-liner that does the same task, but without requiring modifications of pdf.py

ken-huston commented 4 years ago

Thanks for the response Dingo, it works.

I'm trying to compress a PDF composed of JPG text images. I extracted them with pdfimages and used jbig2 compression. With the lossy option I can reduce the size of the PDF from 40 MB to 2.5 MB, and with the lossless option to 5.5 MB. But I see almost no difference in quality between the two outputs (both reduce the quality of the original PDF).

Am I doing something wrong, or are these results what is expected?

Thanks again.

joshuakraemer commented 1 year ago

> I use a sort of one-liner that does the same task, but without requiring modifications of pdf.py

@DingoDog, would you please be so kind as to share your one-line solution?

> I'm trying to compress a PDF composed of JPG text images. I extracted them with pdfimages and used jbig2 compression. With the lossy option I can reduce the size of the PDF from 40 MB to 2.5 MB, and with the lossless option to 5.5 MB. But I see almost no difference in quality between the two outputs (both reduce the quality of the original PDF).
>
> Am I doing something wrong, or are these results what is expected?

@ken-huston, lossy and lossless compression are expected to look similar, because lossy JBIG2 compression is good at preserving visual quality. You should be aware, though, that lossy compression can also lead to letter substitutions (see https://en.wikipedia.org/wiki/JBIG2#Disadvantages). Always check your results when using lossy compression.

I assume your sources are grayscale or color JPG files. You might be able to improve final quality and file size by using a different program to convert the JPG files to black-and-white image files first. I often use ImageMagick with Otsu binarization, e.g.:

magick in.jpg -auto-threshold OTSU out.pbm
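
For a whole set of extracted pages, the same command can be wrapped in a loop. This is only a rough sketch: the `pages-*.jpg` pattern is taken from the file name in the log above, the helper names `pbm_name` and `binarize_all` are made up here for illustration, and it assumes ImageMagick 7 (the `magick` command) is on the PATH.

```shell
#!/bin/sh
# Sketch: binarize every extracted page with Otsu thresholding.
# Assumes ImageMagick 7 provides the `magick` command.

# pbm_name (hypothetical helper): map "page.jpg" to "page.pbm".
pbm_name() {
    printf '%s\n' "${1%.jpg}.pbm"
}

# binarize_all (hypothetical helper): convert each file passed as an argument.
binarize_all() {
    for f in "$@"; do
        [ -e "$f" ] || continue    # skip the literal pattern if nothing matched
        magick "$f" -auto-threshold OTSU "$(pbm_name "$f")"
    done
}

# Run the conversion only when ImageMagick is actually installed.
if command -v magick >/dev/null 2>&1; then
    binarize_all pages-*.jpg
fi
```

The resulting .pbm files should then be usable as jbig2 input in place of the .jpg files, so that the thresholding is done by ImageMagick rather than by jbig2 itself.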

agl / jbig2enc

Error on lossless compression #69