dylanroy / pdfsizeopt

Automatically exported from code.google.com/p/pdfsizeopt

Missing characters #2

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What command do you run to optimize the PDF?
user@ubuntu804server:~/pdfsizeopt$ ./pdfsizeopt.py --use-pngout=false --use-multivalent=false --use-jbig2=false Bishop-thesis.pdf

What does pdfsizeopt display when running the command above?
info: This is pdfsizeopt.py r91.
info: loading PDF from: Bishop-thesis.pdf
info: loaded PDF of 272392 bytes
info: separated to 141 objs
info: found 9 Type1 fonts loaded
info: writing Type1CConverter (227208 font bytes) to: pso.conv.tmp.ps
info: executing Type1CConverter with Ghostscript: gs -q -dNOPAUSE -dBATCH
-sDEVICE=pdfwrite -dPDFSETTINGS=/printer
-dColorConversionStrategy=/LeaveColorUnchanged
-sOutputFile=pso.conv.tmp.pdf -f pso.conv.tmp.ps
Type1CConverter: converting font /JESDUO+MinionPro-It to /Obj0000000121
Type1CConverter: converting font /FJYGHI+MinionPro-ItSubh to /Obj0000000123
Type1CConverter: converting font /QPGEBD+MinionPro-MediumCapt to /Obj0000000125
Type1CConverter: converting font /VMUNVT+MinionPro-Regular to /Obj0000000127
Type1CConverter: converting font /KUASCT+MinionPro-Semibold to /Obj0000000129
Type1CConverter: converting font /PAMMIA+MnSymbol12 to /Obj0000000131
Type1CConverter: converting font /FJMZYA+CMSS12 to /Obj0000000133
Type1CConverter: converting font /HSDSIM+EUSM10 to /Obj0000000135
Type1CConverter: converting font /ZVUSPS+NimbusSanL-Regu to /Obj0000000137
Type1CConverter: all OK
info: loading PDF from: pso.conv.tmp.pdf
info: loaded PDF of 25360 bytes
info: separated to 46 objs
info: found 9 fonts in GS output
info: optimized total Type1 font size 227324 to Type1C font size 13866 (6%)
info: optimized Type1 font XObject 129,128: new size=2811 (7%)
info: optimized Type1 font XObject 131,130: new size=1139 (5%)
info: optimized Type1 font XObject 133,132: new size=640 (9%)
info: optimized Type1 font XObject 135,134: new size=792 (31%)
info: optimized Type1 font XObject 137,136: new size=2286 (39%)
info: optimized Type1 font XObject 121,120: new size=1011 (2%)
info: optimized Type1 font XObject 123,122: new size=1835 (4%)
info: optimized Type1 font XObject 125,124: new size=913 (10%)
info: optimized Type1 font XObject 127,126: new size=4996 (12%)
info: found 13 Type1C fonts loaded
info: writing Type1CParser (16146 font bytes) to: pso.conv.parse.tmp.ps
info: executing Type1CParser with Ghostscript: gs -q -dNOPAUSE -dBATCH
-sDEVICE=nullpage -sDataFile=pso.conv.parsedata.tmp.ps -f pso.conv.parse.tmp.ps
Type1CParser: all OK
info: parsed 13 Type1C fonts
info: merged fonts ['/QSDUDL+MinionPro-It', '/JESDUO+MinionPro-It'],
reduced char count from 5  to 4 (80%)
info: merged fonts ['/PBRRGV+MinionPro-MediumCapt',
'/QPGEBD+MinionPro-MediumCapt'], reduced char count from 6  to 5 (83%)
info: writing Type1CGenerator (33202 font bytes) to: pso.conv.gen.tmp.ps
info: executing Type1CGenerator with Ghostscript: gs -q -dNOPAUSE -dBATCH
-sDEVICE=pdfwrite -dPDFSETTINGS=/printer
-dColorConversionStrategy=/LeaveColorUnchanged
-sOutputFile=pso.conv.gen.tmp.pdf -f pso.conv.gen.tmp.ps
Type1CGenerator: all OK
info: loading PDF from: pso.conv.gen.tmp.pdf
info: loaded PDF of 26168 bytes
info: separated to 60 objs
info: found 11 fonts loaded
info: optimized Type1C fonts to size 18231 (92%)
info: eliminated 8 duplicate objs
info: eliminated 2 unused objs in 2 classes
info: saving PDF with 130 objs to: Bishop-thesis.pso.pdf
info: generated 49492 bytes (18%)
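
As a reading aid for the log above: the percentages in the `info:` lines appear to be the new size as a share of the old size, rounded to the nearest whole percent. This is inferred from the numbers in the log, not from the pdfsizeopt source, and the helper name `percent_of` is hypothetical. A minimal Python sketch of that arithmetic:

```python
def percent_of(new_size, old_size):
    """Return new_size as a whole-number percentage of old_size.

    Assumption: rounding to the nearest integer, inferred from the log.
    """
    return int(round(100.0 * new_size / old_size))


if __name__ == "__main__":
    # Sizes copied from the log above.
    print(percent_of(49492, 272392))  # final PDF vs. loaded PDF
    print(percent_of(13866, 227324))  # Type1C font size vs. Type1 font size
```

With these inputs the results agree with the 18% and 6% printed in the log, so a figure such as (18%) reads as "the output is 18% of the input size", not "18% was saved".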

What's wrong with the optimized PDF?
There are missing characters. For example the \mathcal{D} and \mathsf{T} on
the first page.

What should be there in the optimized PDF instead?
The same characters as in the original pdf.

This was as far as I could easily reduce this testcase. 

Original issue reported on code.google.com by lev.bishop on 23 Oct 2009 at 7:22

GoogleCodeExporter commented 8 years ago
Thank you for the detailed bug report. Unfortunately, I could not reproduce the
bug: for me, Ghostscript 8.54 and Ghostscript 8.61 both produce a correct
\mathcal{D} and \mathsf{T} in Bishop-thesis.psom.pdf. See the attached output of
Ghostscript 8.54.

* Is the attached Bishop-thesis.psom.pdf correct?
* Please sync to r92, rerun the conversion, and copy-paste its output. That
revision reports the Ghostscript version number (Type1CConverter: using
interpreter ...).
* If your Ghostscript version number is less than 8.54, please upgrade (sudo
apt-get install gs-gpl) immediately. For best results, please upgrade to at
least 8.61.
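
The version check described in the last bullet can be sketched in Python. This is only an illustration, not part of pdfsizeopt: the function names `parse_gs_version`, `classify` and `installed_gs_version` are hypothetical, and it assumes that `gs --version` prints a dotted version string such as `8.61`.

```python
import subprocess

MIN_VERSION = (8, 54)          # below this, an upgrade is requested
RECOMMENDED_VERSION = (8, 61)  # "for best results"


def parse_gs_version(text):
    """Turn a string such as '8.61' into a comparable tuple such as (8, 61)."""
    return tuple(int(part) for part in text.strip().split("."))


def classify(version):
    """Map a version tuple to the advice given in the comment above."""
    if version < MIN_VERSION:
        return "upgrade required"
    if version < RECOMMENDED_VERSION:
        return "upgrade recommended"
    return "ok"


def installed_gs_version(gs_binary="gs"):
    """Ask the local Ghostscript for its version (assumes gs is on PATH)."""
    result = subprocess.run([gs_binary, "--version"],
                            capture_output=True, text=True, check=True)
    return parse_gs_version(result.stdout)


if __name__ == "__main__":
    # Pure example that does not need Ghostscript installed.
    print(classify(parse_gs_version("8.61")))
```

Comparing tuples of integers rather than floats or strings keeps the ordering correct for versions with more than two components.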

Original comment by pts...@gmail.com on 25 Oct 2009 at 9:04

GoogleCodeExporter commented 8 years ago
Thanks for investigating. 

The Bishop-thesis.psom.pdf that you attached is NOT correct.
I attach a screenshot of Bishop-thesis.psom.pdf viewed in Adobe Reader 9.2.0 on
Windows, side by side with the original file. (The same is true if I use
SumatraPDF (SVN version r1437) or Acrobat Professional 7.1.0.)

I am using Ghostscript 8.61 according to:
user@ubuntu804server:~/pdfsizeopt$ gs --version
8.61

Here is the output from the updated pdfsizeopt:
info: This is pdfsizeopt.py r92.
info: loading PDF from: Bishop-thesis.pdf
info: loaded PDF of 272392 bytes
info: separated to 141 objs
info: found 9 Type1 fonts loaded
info: writing Type1CConverter (227208 font bytes) to: pso.conv.tmp.ps
info: executing Type1CConverter with Ghostscript: gs -q -dNOPAUSE -dBATCH
-sDEVICE=pdfwrite -dPDFSETTINGS=/printer
-dColorConversionStrategy=/LeaveColorUnchanged -sOutputFile=pso.conv.tmp.pdf -f
pso.conv.tmp.ps
Type1CConverter: using interpreter GPL Ghostscript 861 20071121
Type1CConverter: converting font /JESDUO+MinionPro-It to /Obj0000000121
Type1CConverter: converting font /FJYGHI+MinionPro-ItSubh to /Obj0000000123
Type1CConverter: converting font /QPGEBD+MinionPro-MediumCapt to /Obj0000000125
Type1CConverter: converting font /VMUNVT+MinionPro-Regular to /Obj0000000127
Type1CConverter: converting font /KUASCT+MinionPro-Semibold to /Obj0000000129
Type1CConverter: converting font /PAMMIA+MnSymbol12 to /Obj0000000131
Type1CConverter: converting font /FJMZYA+CMSS12 to /Obj0000000133
Type1CConverter: converting font /HSDSIM+EUSM10 to /Obj0000000135
Type1CConverter: converting font /ZVUSPS+NimbusSanL-Regu to /Obj0000000137
Type1CConverter: all OK
info: loading PDF from: pso.conv.tmp.pdf
info: loaded PDF of 25360 bytes
info: separated to 46 objs
info: found 9 fonts in GS output
info: optimized total Type1 font size 227324 to Type1C font size 13866 (6%)
info: optimized Type1 font XObject 129,128: new size=2811 (7%)
info: optimized Type1 font XObject 131,130: new size=1139 (5%)
info: optimized Type1 font XObject 133,132: new size=640 (9%)
info: optimized Type1 font XObject 135,134: new size=792 (31%)
info: optimized Type1 font XObject 137,136: new size=2286 (39%)
info: optimized Type1 font XObject 121,120: new size=1011 (2%)
info: optimized Type1 font XObject 123,122: new size=1835 (4%)
info: optimized Type1 font XObject 125,124: new size=913 (10%)
info: optimized Type1 font XObject 127,126: new size=4996 (12%)
info: found 13 Type1C fonts loaded
info: writing Type1CParser (16146 font bytes) to: pso.conv.parse.tmp.ps
info: executing Type1CParser with Ghostscript: gs -q -dNOPAUSE -dBATCH
-sDEVICE=nullpage -sDataFile=pso.conv.parsedata.tmp.ps -f pso.conv.parse.tmp.ps
Type1CParser: using interpreter GPL Ghostscript 861 20071121
Type1CParser: all OK
info: parsed 13 Type1C fonts
info: merged fonts ['/QSDUDL+MinionPro-It', '/JESDUO+MinionPro-It'],
reduced char count from 5  to 4 (80%)
info: merged fonts ['/PBRRGV+MinionPro-MediumCapt',
'/QPGEBD+MinionPro-MediumCapt'], reduced char count from 6  to 5 (83%)
info: writing Type1CGenerator (33202 font bytes) to: pso.conv.gen.tmp.ps
info: executing Type1CGenerator with Ghostscript: gs -q -dNOPAUSE -dBATCH
-sDEVICE=pdfwrite -dPDFSETTINGS=/printer
-dColorConversionStrategy=/LeaveColorUnchanged
-sOutputFile=pso.conv.gen.tmp.pdf -f pso.conv.gen.tmp.ps
Type1CGenerator: using interpreter GPL Ghostscript 861 20071121
Type1CGenerator: all OK
info: loading PDF from: pso.conv.gen.tmp.pdf
info: loaded PDF of 26168 bytes
info: separated to 60 objs
info: found 11 fonts loaded
info: optimized Type1C fonts to size 18231 (92%)
info: eliminated 8 duplicate objs
info: eliminated 2 unused objs in 2 classes
info: saving PDF with 130 objs to: Bishop-thesis.pso.pdf
info: generated 49492 bytes (18%)

As a perhaps useful data point: if I try to open the file on Windows using
GSview, I get an error message:
GSview 4.9 2007-11-18
GPL Ghostscript 8.63 (2008-08-01)
Copyright (C) 2008 Artifex Software, Inc.  All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Scanning PDF file
   **** File has an unbalanced >> (close dictionary).
   **** Incorrect object count in object stream.
Error: /rangecheck in resolveobjectstream
Operand stack:
   PageCount   37768   4   6   --dict:7/15(L)--   92   --nostringval--   false  
--nostringval--   --dict:0/0(L)--   --dict:4/4(L)--   --dict:4/4(L)--  
--dict:2/2(L)--   --dict:4/4(L)--   --dict:5/5(L)--   --dict:1/1(L)--  
--dict:7/7(L)--   --dict:7/7(L)--   --dict:9/9(L)--   --dict:8/8(L)--  
--dict:12/12(L)--   --dict:11/11(L)--   --dict:11/11(L)--   --dict:4/4(L)--  
--dict:3/3(L)--   --dict:3/3(L)--   --dict:9/9(L)--   --nostringval--   Type   Font  
Subtype   Type1   BaseFont   FJMZYA+CMSS12   FontDescriptor   --nostringval--  
ToUnicode   --nostringval--   FirstChar   84   LastChar   84      Widths  
--nostringval--   --dict:9/9(L)--   --dict:12/12(L)--   --dict:2/2(L)--  
--dict:11/11(L)--   --dict:2/2(L)--   --dict:1/1(L)--   --dict:2/2(L)--  
--dict:1/1(L)--   --dict:2/2(L)--   --dict:1/1(L)--   --dict:2/2(L)--  
--dict:1/1(L)--   --dict:2/2(L)--   --dict:4/4(L)--   --dict:1/1(L)--  
--dict:1/1(L)--   --dict:6/6(L)--   --dict:4/4(L)--   --dict:1/1(L)--  
--dict:9/9(L)--   --dict:9/9(L)--   --dict:9/9(L)--   --dict:9/9(L)--  
--dict:9/9(L)--   --dict:9/9(L)--   --dict:9/9(L)--   --dict:9/9(L)--  
--nostringval--   --dict:5/5(L)--   --dict:1/1(L)--   --dict:1/1(L)--  
--dict:9/9(L)--   --dict:9/9(L)--   --dict:2/2(L)--   --dict:3/3(L)--  
--dict:8/8(L)--   --dict:9/9(L)--   --dict:9/9(L)--   --dict:8/8(L)--  
--dict:12/12(L)--   --dict:3/3(L)--   --dict:3/3(L)--   --dict:12/12(L)--  
--dict:3/3(L)--   --nostringval--   --nostringval--   --nostringval--  
--dict:2/2(L)--   --nostringval--   --dict:2/2(L)--   --nostringval--  
--nostringval--   --dict:2/2(L)--   --nostringval--   --dict:2/2(L)--  
--nostringval--   --nostringval--   --dict:2/2(L)--   --nostringval--  
--dict:2/2(L)--   --nostringval--   --nostringval--   --nostringval--  
--nostringval--   --nostringval--   --dict:11/11(L)--   --dict:11/11(L)--  
--dict:11/11(L)--   --dict:11/11(L)--   --dict:2/2(L)--   --dict:2/2(L)--  
--dict:2/2(L)--   --dict:1/1(L)--
Execution stack:
   %interp_exit   .runexec2   --nostringval--   --nostringval--   --nostringval--   2
  %stopped_push   --nostringval--   --nostringval--   false   1   %stopped_push  
1905   1   3   %oparray_pop   1904   1   3   %oparray_pop   1888   1   3  
%oparray_pop   1771   1   3   %oparray_pop   --nostringval--   %errorexec_pop  
.runexec2   --nostringval--   --nostringval--   --nostringval--   2   %stopped_push  
--nostringval--   --nostringval--   --nostringval--   --nostringval--  
--nostringval--   --nostringval--   --nostringval--
Dictionary stack:
   --dict:1158/1684(ro)(G)--   --dict:1/20(G)--   --dict:112/200(L)--  
--dict:106/127(ro)(G)--   --dict:275/300(ro)(G)--   --dict:20/25(L)--
Current allocation mode is local
Warning: EPS file must not use /quit
Warning: EPS file must not use /serverdict
Warning: EPS file must not use /serverdict

Original comment by lev.bishop on 25 Oct 2009 at 9:39

GoogleCodeExporter commented 8 years ago
Thank you for sending this detailed and specific bug report. I can confirm that
Bishop-thesis.psom.pdf is not correct. More specifically, \mathcal{D} and
\mathsf{T} show up correctly when viewed in xpdf and Evince, but they are
missing in Ghostscript 8.54, Ghostscript 8.61 and Acrobat Reader. This looks
similar to a bug I recently tried to fix: the Type1C font generated by
Ghostscript had a different Encoding than the original font.

As a temporary fix, please run the optimization as ``pdfsizeopt.py
--do-unify-fonts=false''. This prevented the bug from happening for me; see the
attached files. If it doesn't help you, please attach both the original and the
generated PDF, and please copy-paste pdfsizeopt's output.

info: loading PDF from: Bishop-thesis.pdf
info: loaded PDF of 272392 bytes
info: separated to 141 objs
info: found 9 Type1 fonts loaded
info: writing Type1CConverter (227208 font bytes) to: pso.conv.tmp.ps
info: executing Type1CConverter with Ghostscript: gs -q -dNOPAUSE -dBATCH
-sDEVICE=pdfwrite -dPDFSETTINGS=/printer
-dColorConversionStrategy=/LeaveColorUnchanged -sOutputFile=pso.conv.tmp.pdf -f
pso.conv.tmp.ps
Type1CConverter: using interpreter GPL Ghostscript 854 20060517
Type1CConverter: converting font /JESDUO+MinionPro-It to /Obj0000000121
Type1CConverter: converting font /FJYGHI+MinionPro-ItSubh to /Obj0000000123
Type1CConverter: converting font /QPGEBD+MinionPro-MediumCapt to /Obj0000000125
Type1CConverter: converting font /VMUNVT+MinionPro-Regular to /Obj0000000127
Type1CConverter: converting font /KUASCT+MinionPro-Semibold to /Obj0000000129
Type1CConverter: converting font /PAMMIA+MnSymbol12 to /Obj0000000131
Type1CConverter: converting font /FJMZYA+CMSS12 to /Obj0000000133
Type1CConverter: converting font /HSDSIM+EUSM10 to /Obj0000000135
Type1CConverter: converting font /ZVUSPS+NimbusSanL-Regu to /Obj0000000137
Type1CConverter: all OK
info: loading PDF from: pso.conv.tmp.pdf
info: loaded PDF of 25351 bytes
info: separated to 46 objs
info: found 9 fonts in GS output
info: optimized total Type1 font size 227324 to Type1C font size 13870 (6%)
info: optimized Type1 font XObject 129,128: new size=2812 (7%)
info: optimized Type1 font XObject 131,130: new size=1144 (5%)
info: optimized Type1 font XObject 133,132: new size=640 (9%)
info: optimized Type1 font XObject 135,134: new size=792 (31%)
info: optimized Type1 font XObject 137,136: new size=2286 (39%)
info: optimized Type1 font XObject 121,120: new size=1011 (2%)
info: optimized Type1 font XObject 123,122: new size=1835 (4%)
info: optimized Type1 font XObject 125,124: new size=913 (10%)
info: optimized Type1 font XObject 127,126: new size=4994 (12%)
info: eliminated 6 duplicate objs
info: writing Multivalent input PDF: pso.conv.mi.tmp.pdf
info: saving PDF with 134 objs to: pso.conv.mi.tmp.pdf
info: generated 51111 bytes (19%)
info: executing Multivalent to optimize PDF: java -cp
/home/pts/prg/pdfsizeopt/trunk/Multivalent.jar tool.pdf.Compress 
pso.conv.mi.tmp.pdf
file:/mnt/mandel/warez/tmp/pso.conv.mi.tmp.pdf, 51111 bytes
PDF 1.4, producer=pdfTeX, creator=pdfTeX
additional compression may be possible with:
         -compact
=> new length = 39032, saved 23%, elapsed time = 0 sec
info: Multivalent generated pso.conv.mi.tmp-o.pdf of 39054 bytes (76%)
info: compressed xref stream from 532 to 441 bytes (83%)
info: optimized to 38949 bytes after Multivalent (100%)
info: saving PDF to: Bishop-thesis.psom.pdf
info: generated 38949 bytes (14%)
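
For scripted runs, the temporary fix above can be expressed as a small Python helper that assembles the command line. This is a sketch only: the helper name `build_workaround_command` and its parameters are hypothetical, and the only pdfsizeopt flags used are ones that already appear in this thread.

```python
def build_workaround_command(pdf_path, script="./pdfsizeopt.py", extra_flags=()):
    """Return the argv list for a pdfsizeopt run with font unification disabled.

    extra_flags may carry other flags quoted in this thread,
    e.g. "--use-multivalent=false".
    """
    return [script, "--do-unify-fonts=false"] + list(extra_flags) + [pdf_path]


if __name__ == "__main__":
    cmd = build_workaround_command("Bishop-thesis.pdf")
    # The list can be passed to subprocess.call(cmd) to run the tool.
    print(" ".join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when the PDF file name contains spaces.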

Original comment by pts...@gmail.com on 26 Oct 2009 at 7:27

GoogleCodeExporter commented 8 years ago
I confirm that the glyphs are not missing if I use --do-unify-fonts=false. Thanks.

Original comment by lev.bishop on 26 Oct 2009 at 8:14