PhilterPaper / Perl-PDF-Builder

Extended version of the popular PDF::API2 Perl-based PDF library for creating, reading, and modifying PDF documents
https://www.catskilltech.com/FreeSW/product/PDF%2DBuilder/title/PDF%3A%3ABuilder/freeSW_full

[RT 134957] Problem with IndexedColor containing "\n" sequence #157

Closed PhilterPaper closed 3 years ago

PhilterPaper commented 3 years ago

From: stu-github@spacehopper.org
To: bug-PDF-API2@rt.cpan.org
Date: Tue, 06 Apr 2021 17:13:00 -0400
Subject: Problem with IndexedColor containing "\n" sequence

I have a problem when PDF::API2 processes a file which includes an IndexedColor image where the /Indexed line includes \n:

00000000  5b 20 2f 49 6e 64 65 78  65 64 20 2f 44 65 76 69  |[ /Indexed /Devi|
00000010  63 65 47 72 61 79 20 32  35 35 20 28 00 ff fd f3  |ceGray 255 (....|
00000020  f9 fb fa f4 f6 fe f1 f5  f2 fc f7 f8 ee ea e9 ed  |................|
00000030  e6 d1 cc d4 db bf 84 53  3f 38 25 2c 3e 44 45 3d  |.......S?8%,>DE=|
00000040  2f 2a 32 3a 6c a0 aa c7  d2 b7 c8 d8 be c1 de ec  |/*2:l...........|
00000050  f0 ef e2 86 68 55 33 0e  12 5c 66 07 5c 6e 01 5c  |....hU3..\f.\n.\|
00000060  74 06 04 5c 62 02 35 10  30 70 6a 71 78 93 cb c5  |t..\b.5.0pjqx...|
00000070  5f 19 62 6f 73 6d 5d b2  e4 bc e0 7c bb bd 85 65  |_.bosm]....|...e|
00000080  75 67 43 11 1a 4b 21 0f  8d a3 50 16 17 48 b6 eb  |ugC..K!...P..H..|
00000090  b8 61 e3 ce cf d9 36 40  2d 5c 5c 60 cd ac 94 88  |.a....6@-\\`....|
000000a0  5c 72 14 41 82 e5 96 5c  28 c3 97 57 83 ba 22 a7  |\r.A...\(..W..".|
000000b0  98 d6 52 05 b0 64 a6 03  b3 24 15 e8 59 27 31 c6  |..R..d...$..Y'1.|
000000c0  df 3c 5b 8f 4c 9b d3 2b  a2 18 b5 7a 1c a4 a1 da  |.<[.L..+...z....|
000000d0  91 8e 34 8c 9d 72 7e dd  9a 89 a5 37 ab 1e 8b 7b  |..4..r~....7...{|
000000e0  9c 5e d7 74 1b 1d 1f 26  47 b9 4a c4 d5 42 0b b4  |.^.t...&G.J..B..|
000000f0  6e 95 4f 5a 99 13 87 c0  69 b1 92 e1 dc e7 39 af  |n.OZ....i.....9.|
00000100  4e 9e d0 ca a8 5c 29 54  9f 3b 8a 77 a9 7d 7f 79  |N....\)T.;.w.}.y|
00000110  6b 81 20 4d 63 90 ad ae  76 58 c2 80 51 2e 46 c9  |k. Mc...vX..Q.F.|
00000120  23 56 66 49 29 20 0a                              |#VfI) .|

After processing with PDF::API2 the output file looks like this:

62 0 obj [ /Indexed /DeviceGray 255 <00FFFDF3F9FBFAF4F6FEF1F5F2FCF7F8EEEAE9EDE6D1CCD4DBBF84533F38252C3E44453D2F2A323A6CA0AAC7D2B7C8D8BEC1DEECF0EFE2866855330E120C07

010906040802351030706A717893CBC55F19626F736D5DB2E4BCE07CBBBD8565756743111A4B210F8DA350161748B6EBB861E3CECFD936402D5C60CDAC94880D144182E59628C3975783BA22A798D65205B064A603B32415E8592731C6DF3C5B8F4C9BD32BA218B57A1CA4A1DA918E348C9D727EDD9A89A537AB1E8B7B9C5ED7741B1D1F2647B94AC4D5420BB46E954F5A991387C069B192E1DCE739AF4E9ED0CAA829549F3B8A77A97D7F796B81204D6390ADAE7658C280512E46C923566649> ] endobj

That is, the \n was not converted to 0A as it should have been; a literal newline was output instead. The string fed to as_pdf in PDF/API2/Basic/PDF/String.pm is correct, so the problem is introduced in the output translation to hex. This diff fixes it for me, though I'm not 100% sure it's correct:

Index: lib/PDF/API2/Basic/PDF/String.pm
--- lib/PDF/API2/Basic/PDF/String.pm.orig
+++ lib/PDF/API2/Basic/PDF/String.pm
@@ -192,7 +192,7 @@ sub as_pdf {
     }
     else {
         if ($str =~ m/[^\n\r\t\b\f\040-\176\200-\377]/oi) {
-            $str =~ s/(.)/sprintf('%02X', ord($1))/oge;
+            $str =~ s/(.|\n)/sprintf('%02X', ord($1))/oge;
             return "<$str>";
         }
         else {
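The reason the patch helps is that in a Perl regex `.` does not match a newline unless the `/s` modifier is given, so the original substitution hex-encodes every byte except 0x0A. The following is a minimal sketch of that behavior, not the library's code: the three helper subs and their names are hypothetical, and the test bytes are the 07 0A 01 run from the palette dump above.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only; they mimic the hex branch of as_pdf.

# Original approach: '.' skips "\n", so the newline is left untouched.
sub hex_without_newline_match {
    my ($str) = @_;
    $str =~ s/(.)/sprintf('%02X', ord($1))/ge;
    return "<$str>";
}

# Approach from the diff: the alternation also matches "\n".
sub hex_with_alternation {
    my ($str) = @_;
    $str =~ s/(.|\n)/sprintf('%02X', ord($1))/ge;
    return "<$str>";
}

# Alternative: the /s modifier lets '.' match any character, including "\n".
sub hex_with_s_flag {
    my ($str) = @_;
    $str =~ s/(.)/sprintf('%02X', ord($1))/gse;
    return "<$str>";
}

my $bytes = "\x07\x0A\x01";    # decoded form of 07 5c 6e 01 in the dump

print hex_without_newline_match($bytes), "\n";    # "<07", a raw newline, then "01>"
print hex_with_alternation($bytes), "\n";         # <070A01>
print hex_with_s_flag($bytes), "\n";              # <070A01>
```

Either the alternation from the diff or the `/s` modifier should give the same result for byte strings; which one a maintainer prefers is a style choice.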

I'm very sorry but I can't share the actual file I'm using.

PhilterPaper commented 3 years ago

Tue Apr 06 18:27:26 2021 PMPERRY@cpan.org - Correspondence added

Do you know if this happens only on a particular operating system (e.g., on Unix/Linux but not MS-DOS/Windows)? What sort of image file are you starting with: GIF, PNG, etc.? Does the image file itself contain a literal backslash and n (x5C x6E), or does that appear only after some level of processing? This sounds like it might be a problem with the original data being read in non-binary (ASCII) mode.

PhilterPaper commented 3 years ago

Steve says he patched this in PDF::API2, but when I applied the patches to String.pm in PDF::Builder, it broke the new test added to string.t. I have reported this.

PhilterPaper commented 3 years ago

It turned out to be an overlooked PDF::API2 reference in the new t-tests. It seems to work fine now, so closing.