fsiler / pdfsizeopt

Automatically exported from code.google.com/p/pdfsizeopt

Ghostscript fails to uncompress stream on Windows #58

Closed: GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. pdfsizeopt.py Pages1-7.pdf

What is the expected output? What do you see instead?
I expect it not to fail.  I realize that this may be a gs problem but I'm not 
sure what else to try.

This is the output I get.
C:\Users\x991808\Desktop\PDFSizeOpt>pdfsizeopt.py Pages1-7.pdf
info: This is pdfsizeopt.py rUNKNOWN size=306026.
info: loading PDF from: Pages1-7.pdf
info: loaded PDF of 257118 bytes
info: decompressing 36 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 3/Predictor 12>>
Traceback (most recent call last):
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 7594, in <module>
    main(sys.argv)
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 7564, in main
    ).Load(file_name)
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 3336, in Load
    data, do_ignore_generation_numbers=self.do_ignore_generation_numbers)
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 3655, in ParseUsingXref
    xref_ofs, match)
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 3488, in ParseUsingXrefStream
    w0, w1, w2, index, xref_data = trailer_obj.GetAndClearXrefStream()
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 1357, in GetAndClearXrefStream
    xref_tuple = self.GetXrefStream()
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 1343, in GetXrefStream
    xref_data = self.GetUncompressedStream()
  File "C:\Users\x991808\Desktop\PDFSizeOpt\pdfsizeopt.py", line 2218, in GetUncompressedStream
    gs_defilter_cmd)
AssertionError: Ghostscript decompression failed: gswin32c -dNODISPLAY -q -sINFN=pso.filter.tmp.bin -c '/i INFN(r)file<</CloseSource true /Intent 2/Filter /FlateDecode/DecodeParms <</Columns 3/Predictor 12>>>>/ReusableStreamDecode filter def /o(%stdout)(w)file def/s 4096 string def {i s readstring exch o exch writestring not{exit}if}loop o closefile quit'

What version of the product are you using? On what operating system?
The version is the one in source control.  Windows 7.

Please provide any additional information below.
Using ghostscript 8.53, tried 9.05 to no avail.
Using sam2p.exe 0.49, as well as the exe's in the image-decode-win32.zip on the 
sam2p site.
Using pngout.exe from 7/2/2011
Using python 2.7
Using Multivalent20060102.jar, tried Multivalent20091027.jar

Thanks,
Darren

Original issue reported on code.google.com by fdnc...@gmail.com on 21 Jun 2012 at 5:53

GoogleCodeExporter commented 9 years ago
Thank you for reporting this problem, and thank you for sending a detailed bug 
report.

Using the attached file Pages1-7.pdf I could identify and fix (in r194) an xref 
object parsing bug.

Based on the information provided I could diagnose and fix (in r195) a 
Windows-only bug (GetUncompressedStream was calling Ghostscript incorrectly).
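
The thread does not say what exactly was wrong with the call. One common Windows pitfall, offered here only as an assumption, is that cmd.exe does not treat single quotes as quoting characters, so a command string like the one in the traceback is split differently than on a Unix shell. Below is a minimal sketch of sidestepping shell quoting by passing an argument list to `subprocess`. The helper names `build_gs_defilter_args` and `run_gs_defilter` are hypothetical and are not pdfsizeopt's actual code; the Ghostscript flags and the PostScript program are copied from the traceback.

```python
import subprocess


def build_gs_defilter_args(gs_cmd, input_path, filter_spec):
    """Build an argv list for Ghostscript (hypothetical helper).

    filter_spec is the filter part of the PostScript dict, e.g.
    '/Filter /FlateDecode/DecodeParms <</Columns 3/Predictor 12>>'.
    """
    ps_program = (
        '/i INFN(r)file<</CloseSource true /Intent 2' + filter_spec +
        '>>/ReusableStreamDecode filter def '
        '/o(%stdout)(w)file def/s 4096 string def '
        '{i s readstring exch o exch writestring not{exit}if}loop '
        'o closefile quit')
    # Each list element reaches Ghostscript as exactly one argument.
    return [gs_cmd, '-dNODISPLAY', '-q', '-sINFN=' + input_path,
            '-c', ps_program]


def run_gs_defilter(args):
    """Run Ghostscript without a shell and return its stdout bytes."""
    # shell=False is the default, so no shell re-parses the arguments.
    proc = subprocess.Popen(args, stdout=subprocess.PIPE)
    data = proc.communicate()[0]
    if proc.returncode != 0:
        raise RuntimeError('Ghostscript decompression failed: %r' % (args,))
    return data
```

With an argument list, the same code path can be used on Linux (`gs`) and on Windows (`gswin32c`) without platform-specific quoting of the `-c` program.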

Please download the newest pdfsizeopt, and run the following command:

  pdfsizeopt.py --use-multivalent=no --do-optimize-images=no Pages1-7.pdf

For me it succeeds and prints this on Linux:

info: This is pdfsizeopt.py r195 size=309327.
info: loading PDF from: Pages1-7.pdf
info: loaded PDF of 257118 bytes
info: using Ghostscript gs: GPL Ghostscript 8.71 (2010-02-10)
info: decompressing 36 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 3/Predictor 12>>
info: decompressing 97 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 5/Predictor 12>>
info: found 43 obj offsets and 3 obj streams in xref stream
info: separated to 38 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 0 Type1C fonts loaded
info: eliminated 6 duplicate objs
info: eliminated 2 unused objs in 2 classes
info: saving PDF with 30 objs to: Pages1-7.pso.pdf
info: generated object stream of 560 bytes in 21 objects (12%)
info: generated 253953 bytes (99%)

If that command doesn't work for you, please reply (and include the full 
output).

If that command works for you, then you can remove the flags 
--use-multivalent=no and --do-optimize-images=no one by one. If removing a 
flag makes it fail, please open another issue about that.
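
The "decompressing ... /Filter/FlateDecode/DecodeParms <</Columns 3/Predictor 12>>" steps in the log can also be cross-checked without Ghostscript. The following is a minimal sketch, not what pdfsizeopt does (according to the log, it delegates this step to Ghostscript). It assumes 1 color component and 8 bits per component, which are the PDF defaults when /Colors and /BitsPerComponent are absent, and it handles only PNG row filter types 0 (None), 1 (Sub) and 2 (Up). The function name `flate_decode_png_predictor` is hypothetical.

```python
import zlib


def flate_decode_png_predictor(data, columns):
    """Inflate data, then undo the per-row PNG predictor (/Predictor >= 10).

    Each decoded row is 1 filter-type byte followed by `columns` data bytes.
    """
    raw = bytearray(zlib.decompress(data))
    row_size = columns + 1
    if len(raw) % row_size:
        raise ValueError('data length is not a multiple of the row size')
    prev = bytearray(columns)  # The row above the first row is all zeros.
    out = bytearray()
    for start in range(0, len(raw), row_size):
        filter_type = raw[start]
        row = raw[start + 1:start + row_size]
        if filter_type == 1:  # Sub: add the byte to the left.
            for i in range(1, columns):
                row[i] = (row[i] + row[i - 1]) & 255
        elif filter_type == 2:  # Up: add the byte from the previous row.
            for i in range(columns):
                row[i] = (row[i] + prev[i]) & 255
        elif filter_type != 0:
            raise ValueError('unsupported PNG filter type %d' % filter_type)
        out += row
        prev = row
    return bytes(out)
```

For the first stream in the log, the call would be `flate_decode_png_predictor(stream_data, 3)`, where `stream_data` stands for the 36 compressed bytes.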

Original comment by pts...@gmail.com on 25 Jun 2012 at 2:55

GoogleCodeExporter commented 9 years ago
Thanks for your help.

I get the same output you do when I don't use Multivalent and don't optimize 
images. However, with multivalent=yes and optimize-images=yes I get this error 
string:

"Error in findFileFormatStream: failed to read first 12 bytes of file"

and the program keeps running.  I'm assuming this is an error with jbig2.exe 
because the PDF file that is created has 7 pages, but I get an Acrobat error 
"Insufficient data for an image" and all pages are blank.  I see that error 
string in Leptonica, but the function looks pretty simple, so I'm not sure why 
it's failing.

Attached is the log and pdf file.

Original comment by fdnc...@gmail.com on 25 Jun 2012 at 5:48

GoogleCodeExporter commented 9 years ago
One last note.  I just finished compiling Adam Langley's jbig2 encoder with 
VS2010 on my system.  That got rid of the "Error in findFileFormatStream" 
problems, but the PDF still fails to open with the "Insufficient data for an 
image" error.  Not quite sure where to go from here.

Thanks,
Darren

Original comment by fdnc...@gmail.com on 25 Jun 2012 at 6:31

GoogleCodeExporter commented 9 years ago
Please open a new issue, attach the original PDF (again), the PDF generated by 
pdfsizeopt+jbig2, and the jbig2.exe you use. Don't forget to include the 
console output messages of pdfsizeopt. I'll start by trying to reproduce the 
problem on Linux.

Original comment by pts...@gmail.com on 25 Jun 2012 at 8:36