Thank you for the bug report and the patch.
I'm hesitant to accept the patch, because it makes pdfsizeopt too permissive,
and I don't want pdfsizeopt to accept certain kinds of incorrect PDFs. It
would help a lot if you could post an example PDF which you think pdfsizeopt
should accept.
Original comment by pts...@gmail.com
on 13 Oct 2012 at 12:06
Thanks for looking at the patch. The person who sent me the file saw my name
in some patches. I suggested that he send the file to you. In any case, I am
attaching a new patch that is more careful. In one of the places, instead of
allowing any duplicate, it permits only /ID. In the other places, instead of
continuing silently, it prints a warning to stderr similar to the message that
it used to raise.
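
As a rough illustration of the "warn instead of raise" behavior described above, here is a minimal Python sketch. It is not the code from the attached patch: the function name record_obj_offset, the strict parameter, and the choice to keep the first offset seen are all assumptions made for the example. Only the wording of the warning follows the log below.

```python
import sys


def record_obj_offset(offsets, obj_num, offset, strict=False):
    """Store the offset of obj_num in the dict offsets.

    Hypothetical helper, not part of pdfsizeopt. On a duplicate object
    number it raises ValueError when strict is true; otherwise it prints
    a warning to stderr and keeps the offset that was recorded first.
    Returns True if the offset was stored, False if it was a duplicate.
    """
    if obj_num in offsets:
        message = 'duplicate obj %d in xref stream' % obj_num
        if strict:
            raise ValueError(message)
        sys.stderr.write('warning: %s\n' % message)
        return False
    offsets[obj_num] = offset
    return True
```

With strict=False a second entry for the same object number is reported and skipped. With strict=True it aborts parsing, which corresponds to the behavior before the patch.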
I have a log below that shows the warnings. If you want, you can send me a
patch that prints more information, and I can run it and let you know what
happens.
The file has Creator "Adobe Acrobat 8.1 Combine Files", Producer "Acrobat
9.3.1", Optimized "no", PDF version "1.6".
Object 5 starts <</ArtBox[42.5197 42.5197 496.063 722.834]
Object 6 starts <</Filter/FlateDecode/Length 619>>stream
Object 3251 starts <</Length 3645/Subtype/XML/Type/Metadata>>stream endstream
Object 11017 starts
<</Author(Client1)/CreationDate(D:20120910153803+02'00')/Creator(Adobe Acrobat
8.1 Combine Files)
and another object has stream with /Info 11017 0 R.
info: This is pdfsizeopt.py rUNKNOWN size=315564.
info: using Java for Multivalent: /usr/bin/java
info: loading PDF from: 17MB.pdf
info: loaded PDF of 16900425 bytes
info: using Ghostscript gs: GPL Ghostscript 9.06 (2012-08-08)
info: decompressing 40 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 5/Predictor 12>>
info: decompressing 9536 bytes with Ghostscript /Filter/FlateDecode/DecodeParms <</Columns 6/Predictor 12>>
warning: duplicate obj 5 in xref stream
warning: duplicate obj 6 in xref stream
warning: duplicate obj 3251 in xref stream
warning: duplicate obj 11017 in xref stream
warning: duplicate /ID in xref streams
info: found 11039 obj offsets and 364 obj streams in xref stream
warning: missing offset for xref stream obj 11408
warning: missing xref obj stream 11406
warning: missing xref obj stream 11407
info: separated to 10676 objs + xref + trailer
info: found 0 Type1 fonts loaded
info: found 34 Type1C fonts loaded
info: writing Type1CParser (73664 font bytes) to: pso.conv.parse.tmp.ps
info: executing Type1CParser with Ghostscript: gs -q -dNOPAUSE -dBATCH -sDEVICE=nullpage -sDataFile=pso.conv.parsedata.tmp.ps -f pso.conv.parse.tmp.ps
Type1CParser: using interpreter GPL Ghostscript 906 20120808
Type1CParser: all OK
Original comment by william.bader@gmail.com
on 13 Oct 2012 at 2:26
Thank you very much for the modified and restricted patch.
Without an example PDF I don't have enough information to decide whether the
patch is an improvement in the general case. (It's definitely an improvement
for this specific PDF.) So if you can't attach an example PDF, I'm ready to
apply your patch, but with the new behavior enabled only by a command-line flag
(--do-permissive-obj-parsing) that is disabled by default. Would this work for
you?
Original comment by pts...@gmail.com
on 13 Oct 2012 at 11:46
It is a file that someone sent me. I do not need it to work, and I have asked
him to send the file to you. Since he apparently made the PDF with a recent
Adobe product, I suspect that other people will have the same problem. Maybe
it is better to wait until someone else who is willing to send a PDF has the
problem.
Original comment by william.bader@gmail.com
on 14 Oct 2012 at 2:24
I have permission to send you the PDF privately for the purpose of checking the
patches. Is that OK?
William
Original comment by william.bader@gmail.com
on 14 Oct 2012 at 7:13
Thank you very much for the detailed bug report, the follow-up information and
the several helpful patches.
Based on the provided example PDF I diagnosed the problem, identified several
bugs in the xref stream parsing code of pdfsizeopt, and fixed them in r220. Please
download the latest pdfsizeopt.py and check if it works correctly. (It works
for me.)
It turned out that the example PDF was correct, but pdfsizeopt was parsing it
incorrectly when both xref streams and /Prev references were involved. I've
read the relevant sections (3.4.5 and 3.4.7) of the PDF 1.7 reference again,
and modified pdfsizeopt so that now it works according to the specification.
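
To illustrate the rule involved (this is not the actual r220 change): when a PDF has been updated incrementally, each cross-reference section points to the previous one through /Prev, and an entry in a newer section takes precedence over an entry for the same object number in an older one. That is why an object number appearing in more than one section is legitimate. A minimal Python sketch, in which the function name merge_xref_sections and the layout of the sections dict are assumptions made for the example:

```python
def merge_xref_sections(sections, start):
    """Merge xref sections along the /Prev chain, newest section first.

    Hypothetical helper, not part of pdfsizeopt. sections maps the file
    offset of each xref section to a pair (entries, prev), where entries
    maps object numbers to xref entries and prev is the /Prev offset or
    None. start is the offset of the newest section.
    """
    merged = {}
    seen = set()
    offset = start
    while offset is not None:
        if offset in seen:  # Guard against a malformed, cyclic chain.
            raise ValueError('loop in /Prev chain at offset %d' % offset)
        seen.add(offset)
        entries, prev = sections[offset]
        for obj_num, entry in entries.items():
            # Keep the entry from the newest section that defines obj_num.
            merged.setdefault(obj_num, entry)
        offset = prev
    return merged
```

For example, if the newest section redefines object 5 and the older section also lists object 5, the merged table contains only the newer entry for object 5, and no error is raised.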
Original comment by pts...@gmail.com
on 14 Oct 2012 at 10:38
Thanks, r220 works for me.
Regards, William
Original comment by william.bader@gmail.com
on 15 Oct 2012 at 12:22
Original issue reported on code.google.com by
william.bader@gmail.com
on 12 Oct 2012 at 8:16