J-F-Liu / lopdf

A Rust library for PDF document manipulation.
MIT License
1.65k stars, 177 forks

lopdf 0.28.0 breaks something. #180

Closed: genusistimelord closed this issue 2 years ago

genusistimelord commented 2 years ago

So I was updating and trying out 0.28.0 of lopdf, the newest release, and noticed that when I merge a large batch of PDFs that I used to merge fine with 0.27, the result is no longer a workable PDF. I don't know exactly what caused the issue, but I can look at the changes to see what it could be. This is a major problem if it can no longer merge Adobe PDF files together.

I also saw a noticeable size difference between the PDFs; I will diff them to see what might have changed. Old: 27,448 KB, new: 27,433 KB.

genusistimelord commented 2 years ago

OK, I have figured it out.

The PDF is missing the xref table, which should be near the end of the file:

xref
0 1518
0000000000 65535 f 
.........................more here truncating for readability
0028076214 00000 n 
trailer
<</Root 8 0 R/Size 1518>>
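
For readers unfamiliar with the layout quoted above: each line of a classic cross-reference table is a fixed 20-byte record (10-digit byte offset, 5-digit generation number, `n` for in use or `f` for free, then a two-byte line ending). The following is a minimal sketch of that formatting. The function name `xref_entry` is made up for illustration and is not lopdf's API.

```rust
/// Formats one entry of a classic cross-reference table.
/// Every entry is exactly 20 bytes: a 10-digit byte offset, a space,
/// a 5-digit generation number, a space, the entry type (`n` = in use,
/// `f` = free), and a two-byte end-of-line (here: space + line feed).
/// Hypothetical helper for illustration only, not part of lopdf.
fn xref_entry(offset: u64, generation: u16, in_use: bool) -> String {
    let kind = if in_use { 'n' } else { 'f' };
    format!("{:010} {:05} {} \n", offset, generation, kind)
}
```

Called with the values from the excerpt, `xref_entry(28076214, 0, true)` reproduces the `0028076214 00000 n ` line, and `xref_entry(0, 65535, false)` reproduces the free-list head for object 0.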

The Root declaration is also wrong in the new file.

In the new file it is:

1518 0 obj
<</Root 8 0 R/Size 1519/W[1 4 2]/Index[1 9 11 13 25 14 40 19 60 19 80 13 94 28 123 13 137 14 152 19 172 19 192 13 206 22 229 1 233 1 235 2 238 4 243 27 271 57 329 18 348 17 366 18 385 45 431 18 450 29 480 21 502 64 567 18 586 43 630 18 649 43 693 18 712 45 758 18 777 43 821 18 840 31 872 18 891 31 923 18 942 31 974 18 993 17 1011 18 1030 17 1048 18 1067 22 1090 18 1109 22 1132 18 1151 36 1188 18 1207 17 1225 18 1244 31 1276 18 1295 57 1353 18 1372 22 1395 124]/Length 10199>>stream
(binary cross-reference stream data, 10199 bytes per /Length, omitted for readability)
endstream 
endobj
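
The object above is a cross-reference stream rather than a table. Its `/W[1 4 2]` entry says each record is 1 + 4 + 2 = 7 bytes (type, second field, third field, all big-endian), so the 10199 bytes given by `/Length` hold 10199 / 7 = 1457 records. The sketch below shows how such records could be decoded. It assumes the stream data is already unfiltered (the dictionary above has no `/Filter`), and the helper names are illustrative, not lopdf's.

```rust
/// One decoded cross-reference stream record: (type, field 2, field 3).
/// Type 0 = free object, type 1 = in use (field 2 is the byte offset,
/// field 3 the generation), type 2 = stored in an object stream
/// (field 2 is the object stream's number, field 3 the index inside it).
type XrefRecord = (u64, u64, u64);

/// Reads a big-endian unsigned integer; an empty slice yields 0.
fn read_be(bytes: &[u8]) -> u64 {
    bytes.iter().fold(0, |acc, &b| (acc << 8) | u64::from(b))
}

/// Splits unfiltered stream data into records using the /W field widths.
/// Illustrative helper only; trailing bytes that do not fill a record are ignored.
fn decode_entries(data: &[u8], w: [usize; 3]) -> Vec<XrefRecord> {
    let record_len = w[0] + w[1] + w[2];
    if record_len == 0 {
        return Vec::new();
    }
    data.chunks_exact(record_len)
        .map(|rec| {
            // A zero-width type field means the type defaults to 1 (in use).
            let kind = if w[0] == 0 { 1 } else { read_be(&rec[..w[0]]) };
            let field2 = read_be(&rec[w[0]..w[0] + w[1]]);
            let field3 = read_be(&rec[w[0] + w[1]..]);
            (kind, field2, field3)
        })
        .collect()
}
```

For example, the 14 bytes `[1, 0,0,0,15, 0,0, 2, 0,0,0,8, 0,3]` with widths `[1, 4, 2]` decode to an in-use object at offset 15 and a compressed object at index 3 of object stream 8.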

the old is just

<</Root 8 0 R/Size 1518>>

after the xref data.

So the break occurs in commit a3f531b, a PR by @ralpha.

@J-F-Liu I would yank the 0.28.0 release from crates.io until this issue is fixed.

This causes Adobe PDF reader and Evince to refuse to load the PDF, displaying an error message that it is broken because required data is missing. It works fine in PDF readers that have no features at all.

ralpha commented 2 years ago

@genusistimelord could you provide me with some PDF files that I could use for testing?

Not all PDFs need a Cross Reference Table; they can also have a Cross Reference Stream, and it looks like that is what is being used here.

Before the commit, lopdf could only write a Cross Reference Table. In the PR I added the possibility of writing a Cross Reference Stream. A loaded PDF keeps the way it was encoded: a Table if it was a Table and a Stream if it was a Stream. The update also set the default (when using Document::new()) to the Stream; maybe we want to change that default. https://github.com/J-F-Liu/lopdf/blob/850b150461245cbf7c8dd780b31c76837769a0f5/src/document.rs#L57 (The Stream is newer, available since PDF 1.5 in 2003, and is a more compact way of storing the info that also supports some other features.)

You can set the type you want to use using:

document.reference_table.cross_reference_type = XrefType::CrossReferenceTable;
// or 
document.reference_table.cross_reference_type = XrefType::CrossReferenceStream;

But I don't think this is the real issue here; I think something else went wrong. Maybe the problem is that you are linearizing the PDF while the object ids are not sequential. I don't know how many files you are merging together, but each gap in the object ids is exactly 1 (except for one, file number 14, which claims to have 3 objects but has 1 object in it). If you happen to be merging 60 files together using the merge.rs example code, removing the + 1 on this line may partly solve the problem: https://github.com/J-F-Liu/lopdf/blob/850b150461245cbf7c8dd780b31c76837769a0f5/examples/merge.rs#L75 Although I think that the file that claims 3 objects, but actually has 1, will still give you the same error.
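
The gaps mentioned here can be read off the `/Index` array quoted earlier in the thread, which is a flat list of (first object id, count) pairs. A small sketch, with the hypothetical helper name `index_gaps`, that computes how many ids are skipped between consecutive subsections:

```rust
/// Returns the number of object ids skipped between consecutive
/// subsections of an /Index array, given as flat (first id, count) pairs.
/// Illustrative helper, not part of lopdf. A trailing unpaired value is ignored.
fn index_gaps(index: &[u32]) -> Vec<u32> {
    let pairs: Vec<(u32, u32)> = index
        .chunks_exact(2)
        .map(|p| (p[0], p[1]))
        .collect();
    pairs
        .windows(2)
        // Gap = next subsection's first id minus (this first id + this count).
        .map(|w| w[1].0.saturating_sub(w[0].0 + w[0].1))
        .collect()
}
```

Using values from the posted array, `index_gaps(&[1, 9, 11, 13, 25, 14])` gives `[1, 1]`, and `index_gaps(&[206, 22, 229, 1, 233, 1, 235, 2])` gives `[1, 3, 1]`: the single gap of 3 follows the subsection `229 1`.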

But I think we need to add a way for linearized files to not skip over missing object ids. (By the way, this is wanted behavior in incremental PDFs, hence the reason it does this right now.)

If you could provide me with some files, I could make sure this is the issue and fix it.

genusistimelord commented 2 years ago

These are PDF files generated with SSRS. The plus one should not be an issue, since he removed one from the count here: https://github.com/J-F-Liu/lopdf/blob/master/src/reader.rs#L170 That means max_id == last_id, so you need to add 1 in order to renumber the ids for proper merging.
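
The arithmetic under discussion can be sketched independently of lopdf. The function below is hypothetical: it assigns the first object id for each document being merged, given only how many objects each one contains, with an `extra` offset standing in for a possibly redundant + 1. Whether lopdf's `max_id` is the last used id or one past it is exactly the open question in this thread, so the sketch only shows the consequences of each case.

```rust
/// Assigns the first object id for each document in a merge.
/// `object_counts[i]` is the number of objects in document i, and
/// `extra` is an additional offset added after every document
/// (0 gives contiguous ids, 1 leaves one unused id between documents).
/// Hypothetical helper for illustration, not the merge.rs code.
fn first_ids(object_counts: &[u32], extra: u32) -> Vec<u32> {
    let mut next_id = 1; // object 0 is reserved as the head of the free list
    let mut starts = Vec::with_capacity(object_counts.len());
    for &count in object_counts {
        starts.push(next_id);
        // The last id used by this document is next_id + count - 1.
        next_id += count + extra;
    }
    starts
}
```

With counts 9, 13 and 14, `first_ids(&[9, 13, 14], 0)` yields `[1, 10, 23]`, while `first_ids(&[9, 13, 14], 1)` yields `[1, 11, 25]`, which matches the first three subsection starts in the posted `/Index` array.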

Here are the files I tested to confirm that they do break when merged with the current setup: PDF.zip

Also, the way I currently merge the PDFs is in this PR, which I canceled because I thought it was an issue I caused; I did not know it was an issue with a previous PR.

https://github.com/J-F-Liu/lopdf/pull/179/files#diff-4a0b312a600584eb42b9515b8af14b5d6cb3f0d2d17bb4b48e87f53089fe9fb8R5

genusistimelord commented 2 years ago

@ralpha I have tested setting document.reference_table.cross_reference_type = XrefType::CrossReferenceTable; though I had to make a change to make xref public so I could set the option. We should also add a document version check for when it is set to XrefType::CrossReferenceStream, making sure people can only use it with PDF version 1.5 and greater.
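
One possible shape for the version check suggested here, as a rough sketch only. The `XrefType` enum is redefined locally so the snippet compiles on its own, the version is assumed to be a "major.minor" string such as "1.5", and falling back to a table (rather than returning an error) is just one design option; none of this is lopdf's actual behavior.

```rust
/// Local stand-in for lopdf's `XrefType`, redefined so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq)]
enum XrefType {
    CrossReferenceTable,
    CrossReferenceStream,
}

/// Parses a "major.minor" PDF version string such as "1.5".
fn parse_version(version: &str) -> Option<(u32, u32)> {
    let (major, minor) = version.trim().split_once('.')?;
    Some((major.parse().ok()?, minor.parse().ok()?))
}

/// Returns the xref type to write: a stream only when it was requested
/// and the version is at least 1.5; otherwise a classic table.
/// Hypothetical policy for illustration; unparsable versions fall back to a table.
fn effective_xref_type(requested: XrefType, version: &str) -> XrefType {
    match (requested, parse_version(version)) {
        (XrefType::CrossReferenceStream, Some(v)) if v >= (1, 5) => {
            XrefType::CrossReferenceStream
        }
        _ => XrefType::CrossReferenceTable,
    }
}
```

Under these assumptions, requesting a stream for a "1.4" document yields a table, while "1.5" and later keep the stream.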

@J-F-Liu Thank you for yanking the current crate for now. That will prevent further issues until @ralpha can get the stream side fixed. Also, please take a look at my PR changes when you get a chance, as I make xref public and have updated the match statement in expand and the merge.rs example.

ralpha commented 2 years ago

Currently don't have time to fix it this week, but will take a look soon.

@genusistimelord when you changed the Xref table make sure that in:

The objects are then listed below it in both cases, either in stream or table.